CppCast - Asynchronous Programming

Episode Date: April 22, 2015

Rob and Jason are joined by Hartmut Kaiser to talk about asynchronous programming and the HPX framework. Hartmut Kaiser is an Adjunct Professor of Computer Science at Louisiana State University. At the same time, he holds the position of a senior scientist at the Center for Computation and Technology at LSU. He received his doctorate from the Technical University of Chemnitz (Germany) in 1988. He is probably best known through his involvement in open source software projects, mainly as the author of several C++ libraries he has contributed to Boost, which are in use by thousands of developers worldwide. He is a voting member of the ISO C++ Standards Committee, and his current research is focused on leading the STE||AR group at CCT working on the practical design and implementation of the ParalleX execution model and related programming methods. In addition, he architected and developed the core library modules of SAGA for C++, a Simple API for Grid Applications.

News
CLion 1.1 roadmap & ACCU 2015
Boost 1.58, a short overview
Module proposal for C++ now at rev 3

Hartmut Kaiser

Links
Asynchronous Computation in C++
The STE||AR Group
HPX on GitHub

Transcript
Starting point is 00:00:00 This episode of CppCast is sponsored by JetBrains, maker of excellent C++ developer tools. Listen in for a special discount code later this episode. And by CppCon, the annual week-long face-to-face gathering for the entire C++ community. The call for CppCon 2015 submissions is now open with a deadline of May 22nd. Episode number 9 of CppCast with guest Hartmut Kaiser recorded April 2nd, 2015. In this episode, we talk about C++ modules and some new Boost libraries. Then we'll interview Hartmut Kaiser from LSU about the future of asynchronous programming. Hartmut tells us about HPX, a general-purpose C++ runtime for parallel and distributed applications of any scale. Welcome to episode 9 of CppCast, the only podcast for C++ developers, by C++ developers. I'm your host,
Starting point is 00:01:26 rob irving and i want to start off this episode by thanking manuel sanchez for joining me on episode eight manuel and i talked about b code and all the great work they're doing over working on the c++ dependency manager last episode i talked about how i'd like to have a co-host for CBPCast. And this episode, after talking with Jason Turner, he's decided to come on and join me as a co-host. How are you doing, Jason? I'm doing great. How are you doing, Rob? Doing very well. So, Jason, you joined me on episode two, which was great.
Starting point is 00:02:01 We talked about some of the projects you're working on. Is it ChaiScript or ChaiScript? ChaiScript, like the T, ChaiT. Right, right. So I'm really happy that you're here to join me as a co-host of CBPCast, and hopefully we'll do a lot of great episodes together. I am too. Yeah.
Starting point is 00:02:19 So at the start of every episode, I like to read a piece of feedback. This week I got an email from Kaylin. Kaylin writes in, thanks for starting CppCast. It has been great to see the range of topics you bring up. My impression of the show so far is that the topics on C++ have really been for the hobby programmer or the small-scale application developer. Where I work at Altera, our C++ code base is over 40 million lines, which makes all facets of development from build, IDE, third-party library dependencies very different. For example, I, too, was very interested in CLI when it first came out and tried the EAP,
Starting point is 00:02:55 but loading our code base into the tool crashed it almost immediately. For those of us who work in C++ for a living on large code bases, I think it would be interesting to have a discussion on issues in that area. Caelan, I absolutely agree. I think we have had some talks that might be worth listening if you haven't already. I think David Senkel talked a bit about,
Starting point is 00:03:15 back in episode 3, about large codebases with CMake. So that might be worth listening if you haven't already. But I definitely agree and I would like to have more topics. I'm definitely not aiming this show purely at hobbyist developers. I want to talk to C++ developers of all backgrounds. Jason, is this something you work in, like really large codebases? The largest I believe I've worked on is around 600,000 lines of code.
Starting point is 00:03:40 That's two orders of magnitude, I think, bigger than I've worked on. It would be great to talk to someone who's had those special challenges about what they've encountered. Yeah, I work about the same size code base, about 800,000 lines of code. I can't confirm who I have planned for a guest next week yet. I'm still trying to get a confirmation from him, but I have been talking to a developer for a AAA game development shop, and I think he would probably have that sort of background where they're working on millions of lines of code. Yeah, so joining me today, joining me and Jason today, is Dr. Hartmut Kaiser, who is an adjunct professor of computer science at Louisiana State University.
Starting point is 00:04:28 At the same time, he holds the position of a senior scientist at the Center for Computation and Technology at LSU. He received his doctorate from the Technical University of Chemnitz, Germany in 1988, and he's probably best known through his involvement in open source software projects, mainly as the author of several C++ libraries he has contributed to Boost, which are in use by thousands of developers worldwide. He's a voting member of the ISO C++ Standards Committee, and his current research is focused on leading the stellar group at CCT working on the practical design and implementation of the Parallax execution model
Starting point is 00:05:02 and related programming methods. In addition, he architected and developed the core library modules of Saga for C++, a simple API for grid applications. Dr. Hartmut, welcome to the show. Hey, Rob. Hey, Jason. Nice to meet you. And you really don't have to say Dr. Hartmut.
Starting point is 00:05:27 You know, that makes me feel very old, so let's not do that. Okay. Not a problem. Do you have background, by the way? Going back to that feedback, do you have any background with really large codebases? Well, the largest I worked on was a spatial information system for Windows. It was about two and a5 million lines of code. So that's not quite the 40 million your other listener or
Starting point is 00:05:55 was talking about. Yeah, I really can't imagine what the scope of something that large would be. I'll definitely have to try to get some guest on who can talk about that type of experience. So before we get into the interview tonight, I wanted to talk about a few pieces of news. The first item is the Sea Lion 1.1 roadmap. And basically, Sea Lion released 1.0 a couple weeks ago, and they're just laying out their roadmap of features that they're going to try to release by the end of this summer. Jason, did you take a look at this one yet?
Starting point is 00:06:30 I did, yes. But I've only worked with CLion briefly, so I didn't have a lot of context for it. Yeah, though the one thing I saw that was interesting in there is they actually addressed that they're aware they have some issues with very large codebases,
Starting point is 00:06:46 and it's something they're going to continue working on, testing the IDE against large codebases. They're also working on some debugger improvements to include LDB integration, code style and formatting changes, and something with auto-import, which sounds pretty cool. So it's just nice to see that the product is out in 1.0 now, and they're obviously committed to making lots of improvements to it, which is great. The next article is Boost 1.58, which just got released. Now, Harmud, I know you have some involvement in Boost. Did you take a look at this?
Starting point is 00:07:25 Well, I try to very closely follow Boost, obviously, because I'm in the steering committee of Boost, and I contributed quite a bit of code to it. And I'm also a consumer of Boost, because the library we're developing is heavily depending on Boost. So we always have to follow it and with high interest follow Boost. And I'm very glad to see that Boost kind of has managed to come out of that hiatus
Starting point is 00:07:52 which occurred after the switch from SVN to Git. And now it's back on track and we will see more regular releases, hopefully. Yeah, that's great. The two libraries that came out with this new version are Endian, which provides facilities to manipulate the endianness of integers and user-defined types, and Sort, which is a generic library that has a better sorting algorithm
Starting point is 00:08:21 aimed at containers that contain large amounts of elements, like over 1,000 elements. So they both sound like very useful libraries. So go check them out. And there's an article in the show notes if you want to read more information. The next thing I have is the module proposal for C++ is now at revision 3. And this is actually just a very large PDF of the entire C++ proposal for modules. And I read through the introduction,
Starting point is 00:08:52 and it's just really interesting to imagine C++, I guess it's going to be C++ 17, where this might come out, where there might be these new modules and there might be less dependencies on headers. It's really hard for me to imagine what C++ is like without headers. Did you guys take a look at this or are aware of the module proposal? I haven't looked very closely, but I anticipate that very much
Starting point is 00:09:20 because as on any larger C++ code base, at some point you face the problem of header hull, right, where you run into cyclic include dependencies and this kind of fun stuff. And we really hope that the module proposal will help us resolving these issues because that's one of the really hard problems which are very difficult to solve. How about you, Jason? Did you read through this at all? I tried to. I got about halfway through it before... You did better than me then. I had a hard time following along exactly,
Starting point is 00:09:59 but I was very curious to figure out how it would solve the problems of a header-only library or something like Boost that has so many template instantiations. And I know they discussed that in the document, but I was having a difficult time following it, honestly. Okay, well, moving on to HeartMoot. You had a great talk at CppCon 2014 this past year on the subject of asynchronous programming in C++. Could you define asynchronous computing?
Starting point is 00:10:35 Asynchronous computing for me is essentially, in the simplest case, spawning off some work without immediately waiting for the work to finish. So some people call it asynchronous work. It may produce a result or a value or it may not. It might just signal its completion. And then you either wait for the asynchronous work at some later point to get back the result, or you attach a continuation or a function to that piece of work, which then automatically is run once the work is done.
Starting point is 00:11:13 While this sounds very much like parallelism, right, it's absolutely not directly related. It's kind of orthogonal to it. Because it runs just as well in single-threaded environments, but it runs just as well in environments with an arbitrary number up to, I don't know, billions or billions of threads. No problem. But the interesting thing for me, why I'm trying to work with it or try to investigate what you can do with asynchronous computing
Starting point is 00:11:44 is that we discovered that you can do kind of auto-paralyzed code. And for those guys who saw that talk, you mentioned there are some examples in there how you can actually do that, how you can convert straight code into something which doesn't look like parallel code but is run fully parallelized and utilizes the machine as much as possible. So very interesting, very cool stuff. Sounds very interesting. You talked about multithreading for a minute there. I think a lot of developers are familiar with multithreading
Starting point is 00:12:17 and some of the complexities associated with it. Do you think we're getting to a point where we don't have to worry about creating new threads anymore? That's a very good question. And as always, the answer is it depends. I gave a talk in December at Meeting CPP where I kind of juxtapositioned plain thread with the goto statement. Fifty years ago, people were really afraid to get rid of goto, and Dextra wrote his famous goto statements considered harmful paper. And the reason they didn't want to get rid of goto
Starting point is 00:13:01 is because they were afraid that this would impose performance penalties if they use higher level features like structured programming or function calls or kind of abstract data types or things like that and interesting enough i believe we have today the very same situation that people don't want to let go of plain threads because they believe that plain threads is the only way for them to get the maximum performance out of the machine. And so I named that talk plain threads considered harmful or plain threads are the go-to of today's computing. Because I really believe, just to answer your question, that if we create higher level facilities, in our case C++, and we are already along the way in that direction, then I think we will get to a point where we can do multithreaded programming or parallel programming,
Starting point is 00:14:07 let me call it parallel programming, in a much safer way. The same way as we restrict ourselves voluntarily not to use GoTo, we will come to a point where we will restrict ourselves voluntarily not to use threads directly. But that will not preclude us from actually using threads when we want or when we really need to, but we will have higher-level facilities which will enable the average programmer to write applications which run millions of threads just easily. Yeah, so I hope that answers your question in some way.
Starting point is 00:14:50 I'm curious. You were careful to make the distinction between parallel and multi-threaded programming. I was wondering if you could elaborate on that a little bit. Right. Well, I deliberately usually make the distinction between concurrency and parallelism. Those are relying on multi-threaded work. When I say concurrency, I mean two threads or more threads working concurrently at the same point in time on the same piece of data. So that's what's causing all these issues we have today.
Starting point is 00:15:22 And that's why I have that white hair, right? Because I do that for a living. On the other hand, parallelism doesn't necessarily mean that the threads run at the same time. The threads are decoupled from each other. They have no direct interdependency where they have to touch the same data at the same time. And the functional programming guys have shown us that you can create race-free programs by definition if you restrict yourself to a certain way of doing things like value semantics, trying to isolate the threads in a way that they don't create global side effects, right? You give all arguments you need to calculate some things to the thread and you get the
Starting point is 00:16:14 value back. And by doing that, that thread can't interfere with any other threads, but it will still do the work, right? And this kind of thing. So building facilities and techniques and restrict ourselves to a certain style of programming will help us along the way to essentially turn all the concurrency where the problems are into parallelism. And everybody can do parallelism
Starting point is 00:16:39 because if you don't worry that the threat is doing any harm to any global data structure or to any other data maintained by another threat, well, you don't care, right? Because that threat just runs, it does its thing, and that's it. So you don't have to synchronize, you don't have to make sure that you create race conditions and this fun stuff. Right. That makes sense. Yeah. Let's talk about futures for a moment, which were introduced in C++11. Can you go over futures for someone who hasn't been listening or has been paying attention to them?
Starting point is 00:17:14 Yes, sure. Well, I'm very happy that C++ now has futures because futures have been invented, you won't believe it, in 1975. Wow. The first paper about futures was published at that point. So a future is an object which represents a result which has not been computed yet. That sounds kind of shrouded, so let me explain it a bit. In C++11, the library has a facility which is called async. It's a function which launches a new thread,
Starting point is 00:17:55 and you give to async your function you want to invoke on that thread together with a set of arguments. You want to be passed to that function which is invoked on a new thread. So far, so good. Nothing interesting. But async gives you back a future object. And that future object represents the result of the function
Starting point is 00:18:16 which will eventually be calculated when the thread runs to completion. So async returns immediately immediately gives you the future, so you can continue doing things on the current thread. And at the point when you actually need the result of that other function you spawned off, then you call.get on that future object you got. And either the result has been computed already
Starting point is 00:18:42 because the other thread ran to completion very quickly and has delivered the result back to the future object. So you just continue, you get the result, nothing happens. And if the result has not been returned yet, so the future has not become ready, then the thread suspends inside GET. So from the standpoint of the user of the future, all you see is calling.get. And whenever.get returns,
Starting point is 00:19:12 you get the result of the computation which has been performed on the other thread. So it's a very, very convenient synchronization mechanism between two threads. And that makes it so interesting because it's a higher level facility which hides the notion of plane threats from you completely. And it allows you – and the future's it's a monad which which follows all those uh rules we know or might not know from haskell right and that means that we can do more things with it
Starting point is 00:19:54 which which are very very powerful and we might discuss those later on today and what's a monad for someone who's not aware uh let's not talk about too much going into right now okay yeah well uh if somebody's interested wasn't what a monad is just google it and you will find hundreds of introductions to that because um you know monad is a is a problematic thing because either you get it or you don't get it and at the point when when you get it, you can't imagine that there was a point in time that you didn't understand it. So everybody who got what a monad is writes some introduction,
Starting point is 00:20:31 what is a monad on the web, because he's so proud that he finally understood it. So you'll find a lot of information about that. Okay. Yeah. Is there any work going on in the Standards Committee to improve futures? Definitely. And that's very important important because I believe the futures as they are specified in C++11
Starting point is 00:20:51 are not as powerful as they could be, to say it politely. Not to say they're almost useless. For two reasons. First, futures in the standard document C++11 are kind of standalone things. There are no facilities which allow you to combine futures, right? So if you have two futures, you can't wait for both of them at the same time. Or you can't compose them sequentially, that you can say, okay, when this future is ready, actually launch, please launch that task, attach a continuation to it, and this kind of thing.
Starting point is 00:21:35 And that's the work which is currently going on. There is a technical specification in the works, which is about to be finalized, which adds several facilities to the future object around describing additional facilities for composition, for parallel composition and for sequential composition. The other work which is currently being done in the standardization committee is related to building executors. And executors is kind of an encapsulation or a generalization of thread pools. You can think of it that way.
Starting point is 00:22:22 And I really hope that the executed proposal, which is being discussed, will allow us to create futures in a more flexible way based on different execution policies, not only on standard threats like the standard currently allows us to do, but that you can use futures in special execution environments or for special schedulers or for this kind of thing. So very interesting things are going on in the standardization committee, and I'm very happy that this is happening. And two weeks from now, I believe, the next committee meeting is in Lenexa in Kansas, and I hope that we will make some progress to move that along. Are you going to be attending that meeting?
Starting point is 00:23:08 I will be there for a couple of days, yes. Okay. Is it possible to describe what function or what future composition might look like, or would that be too difficult to just describe over the years? No, I can certainly do that. Essentially, the technical specification looks into two different ways to compose futures. The first is sequential composition. As I already
Starting point is 00:23:35 said, if you have a future, you will be able to attach a continuation to that future, a function which will be automatically scheduled and executed once the future becomes ready, without you getting involved, right? And the other composition would be parallel composition, where you can say, okay, I have two futures, or three futures, or n futures, please give me another future which represents all of those, which will future which represents all of those, which will become ready when all of those have become ready, or if one of those has become ready.
Starting point is 00:24:14 So that feature is called when all or when any. And the interesting thing here is, again, that when all and when any both give you another future. And that future represents all input futures and the sequential composition is a member function on future itself which is called then dot then and dot then you give another function which will be triggered whenever the future becomes ready but again the interesting thing is that dot then gives you a yet another future which represents a result of the continuation. Okay? Okay. So the continuation could have a different return type, say, than the original future.
Starting point is 00:24:53 But in the parallel, they would presumably all have to have the same return type? Not necessarily. There are two different versions of it. There's a polymorphic version, which returns a tuple of those futures. And there's a version which takes two iterators and two vector of futures, for instance. And that gives you a data type back which represents all the futures
Starting point is 00:25:23 with the same return type. Do you know what languages might be inspiring the direction that futures are going in? I know I'm familiar with C Sharp, Async and Await and some of this sounds very familiar from what C Sharp is doing. I don't want to instigate a religious war here.
Starting point is 00:25:42 But the paper in 75 was clearly based on Java. Okay. So the Java guys were the first ones who thought about that, as far as I know.
Starting point is 00:25:53 There might have been others, but that's what I know. I want to interrupt this discussion for just a minute to talk about this special offer that JetBrains has made for CppCast listeners. JetBrains makes some awesome tools
Starting point is 00:26:04 for C++ developers in any environment. There is the ReSharper C++ plugin for Visual Studio developers, AppCode if you're working on iOS or OSX apps, or their new cross-platform C++ IDE, C-Line, which runs on Linux, Windows, and OSX. JetBrains is offering a coupon code which can be used to get a personal license to
Starting point is 00:26:25 any of these tools for 25% off. The code is cppcastjetbrainscpptool. All one word, just enter that code during checkout and where it prompts you for the discount code. Again, that's cppcastjetbrainscpptool, which will get you 25% off any JetBrains C++ tools, CLion, AppCode, or ReSharper C++. I was told this coupon will only be good for one month, so don't let it go to waste. So let's talk about the HPX library. What are the goals of HPX, and what is HPX to get started with? Well, okay, so HPX is a library we are currently working on here at LSU
Starting point is 00:27:08 and we have many collaborators. It's an open source project. We have many collaborators all over the world, in Europe, in South America, here in the mainland of the U.S. It's a general-purpose parallel runtime system for distributed applications of any scale. So that's quite a mouthful. Essentially, it's a library which sits in between your application and the operating system
Starting point is 00:27:46 and provides your application with higher level facilities with higher level services related to parallelization and related to improving scalability of the application and that runtime system is designed so that you can run it on one machine or you can run your application on a single machine or on your Android phone, but you can also run your application on the very large supercomputers on thousands of nodes with millions of cores. That's why of any scale. So the goal of HPX is to, as I said, provide higher-level facilities to application writers, ensuring or enabling them to write applications which take advantage of the full power of the machine. I mean, my laptop has eight cores, and every decent desktop nowadays has 16 cores, or even more. And I believe that my desktop two years from now will have, I don't know,
Starting point is 00:28:56 250 or 500 compute elements in there. So that parallelism is not going away, right? So, we have to find ways to deal with it. And if you have tried to write multithreaded programming or parallel programs, then you know what I'm talking about. It's a hard business today. It's almost impossible to write code with more than 10 or 100 threads at the same time, because that concurrency is essentially what's killing us, right? So we have to find facilities which allow us to turn concurrency into parallelism.
Starting point is 00:29:37 And that's where the futures come in. And we kind of realize that the future concept is just beautiful, because it allows us to kind of abstract away all of that. One more note before I shut up. HPX, and the beauty of HPX, I believe, is that it essentially exposes the same API, the same interface as you have in your C++11 standard library. All what we did, we extended that a bit, A, to improve things to work better on the local machine, but we extended it also so that you can use your async and your futures and data flows on a cluster,
Starting point is 00:30:29 on a machine which consists out of hundreds of thousands of nodes which are interconnected with a network. So I think that's a quintessence of what HPX actually is. So on that last point you made, so if you've been using standard futures and async, and you want to gain the benefits of HPX, you could just do kind of a find-replace and immediately gain some improved performance, you're saying?
Starting point is 00:30:55 Yes, well, you have to change the namespace from std colon colon to hpx colon colon. But from that point on, it should actually recompile. So what we try to do, we try to really conform to the standard as closely as possible. And luckily enough, the standard doesn't specify an implementation. It just tells you what the interface has to be. So you can do whatever you want in the end. And we implemented a very highly efficient
Starting point is 00:31:22 threading model, which is essentially based on a user level threading model or some coroutine style model, which allows us to bring down the overheads of thread creation into the sub-microsecond range. So we can create a thread in 400 nanoseconds. We run things on that thread, and then the thread dies. And that means, in the end, a thread is not a scarce resource anymore. In normal threads,
Starting point is 00:31:55 when you talk about kernel threads, you are very cautious to create more than 2,800 threads, right? Because then the machine might not do what it's supposed to do, and you might run into issues. But in our case I can run applications with millions of threads on my laptop just fine, no problem whatsoever. And that gives you a lot of performance benefit because you... Well, the overhead of creating a kernel thread is in the millisecond range, right?
Starting point is 00:32:29 So we brought it down to three to four magnitudes. Sorry, I didn't look when I was first looking at HPX, but what platforms do you support? Oh, you said Android. You said almost anything, huh? Well, it's C++, so we support almost anything. We have it ported and use it on Windows, Mac, Linux, Android, BlueJingQ. Just yesterday, one of the collaborators in Germany told me, hey, by the way, it runs on the paralleler board, which was kind of cool. So 32-bit ARM. And that's possible because it's C++, right?
Starting point is 00:33:17 So it's 99.99% C++ and 20 lines of assembly code, which is nicely abstracted away, which is doing the context switching of this, right? And that's it. Everything else just compiles if you have a decent C++11 compiler. So you don't rely on any operating system-specific calls? Well, underneath, we certainly rely on POSIX-style things, because we have to create a kernel thread to run something. Otherwise, we couldn't do it.
Starting point is 00:33:40 So we do that with BoostThread currently, or with Stidthread. We rely on file operations if you want to read and write things. So the basic operating system functionality is still used, but it's underneath HPX, right? So it's kind of hidden. And we use the operating system as a service provider for us, essentially. Interesting. There's a mention of features in data flow in the description of HPX.
Starting point is 00:34:07 What's a data flow exactly? Let's go back to async. Async is that function which launches a function on a new thread and gives you a future. So what happens when you pass another future as an argument to async in order to pass it along to the function?
Starting point is 00:34:32 Then the function would have to make sure that the future becomes ready because it can use the value which is encapsulated in that future. So what Dataflow does, it will guarantee that the function will be spawned only when all futures, which are arguments, have become ready. It will delay the scheduling of the function until all argument futures have become ready. Okay? So it's essentially the same as async. The only difference is that it delays the function scheduling in time until all arguments, which are futures, have become ready. That's the only difference. You might think, hey, that's a minor semantic difference.
Starting point is 00:35:17 Why do you do that? Why can't the function just wait for the futures, right? Well, there are two answers. First, we quickly discovered that you want to avoid to wait for a future. You want to avoid to call.get on any future because a.get imposes a context switch on you, right? Because a threat might suspend if the value is not ready. Well, that is not a problem because the HPX would just run other work while that particular threat is waiting.
Starting point is 00:35:50 But you have to keep the stack of that threat alive, so you kind of use up resources. And if you have a lot of suspension operations in flight, you have a lot of stacks in the system lying around, and that just eats up your memory, the virtual memory, right? And the second reason is that it's just inconvenient if your function has to synchronize on the futures itself, right? If you pass a function and say, hey, I know that this function will be invoked only when all futures have become ready. So the function doesn't have to worry about that. And in the end, these minor semantic change essentially changes and is kind of the essence of that auto-parallelization I was mentioning.
Starting point is 00:36:40 So just imagine you have two, well, without showing code, it's very difficult to explain. But what you essentially can do is that data flow makes the data dependencies in your algorithm explicit. Because you know the function will be executed only at the point when all input arguments have been calculated. So the data dependencies of that function are expressed by the data flow itself. boils down to running threats or multiple operations in parallel in a way so that all inherent data dependencies of your algorithm are observed. You don't want to start executing some operation before all input arguments have been computed. When you have to do A plus B, you better don't start calculating A plus B
Starting point is 00:37:42 if A and B have not been calculated yet. In our minds, with that sequential or imperative programming, we are kind of used to the fact that, well, the next line will be executed when the previous line has been executed, right? So we have that implicitly in there. our parallelizing compilers today do, they look at the code they see and they try to reconstruct the dependencies of the variables you have in the code, right? So the compiler has to try very, very hard to reconstruct these data dependencies
Starting point is 00:38:18 just to figure out what can be run in what sequence. But if we make these data dependencies explicit in our algorithms, in our code, we can auto-paralyze without even having a parallelizing compiler. Because if we know which things have to be executed in what sequence, we can schedule them in that sequence easily. And the futures do that, because the future represents a data dependency and the future becomes ready only when the function has finished computing, which delivers that result. And when the future becomes
Starting point is 00:38:57 ready, it either triggers a continuation, which has been attached by dot then, so it triggers the next operation, or you have three futures you put together with one all, and you trigger another operation when these three futures have become ready. And Dataflow is a very, very nice high-level facility which enables us to write these auto-paralyzing algorithms. And if you just watch that talk or these two talks I gave last year in CPPCon and in Berlin, both are on YouTube.
Starting point is 00:39:27 I have a couple of examples in there which demonstrate that in a very nice way, I believe. Okay. Sounds complex, but if you look at it, it opens your eyes and you say, wow, really? And it's highly addictive, by the way. When you start coding that way, you suddenly can't let it go anymore. So HPX is currently in version 0.9.10. I'm just wondering, if you're a C++ developer interested in HPX,
Starting point is 00:39:58 should we be waiting for version 1, or is it pretty stable to start using now? That's a tough question. Well, API-wise, as I said, we align ourselves very strongly with the standard. On the other hand, the extensions we created in order to support things like data flow or to support that distributed operation are new stuff, which we kind of develop, try it out in our applications, and then we say, nah, not quite. So I'd say 0.9.10 is a stable release.
Starting point is 00:40:46 We believe that it's very stable in terms of usability. We have shown that we outperform a large amount of existing parallelization technologies like OpenMP or MPI in a distributed case or TBB or things like the PPL algorithms. So we can outperform them by using these techniques I was describing very nicely. And we have shown that we can run up to 20, no, 2,000 nodes with 48,000 cores. So that's quite sizable in terms of scalability. So it's stable in the sense that it does the job it's advertised, it's being advertised. What I can't guarantee is that things won't change a bit here and there. But I believe that's kind of normal for people who are kind of adventurous and say, yeah, that's cool technology.
Starting point is 00:41:56 I want to use that. I don't think they will not object if we try to improve things and document that properly and tell them at a time. And since we have quite some user community already, I think we developed some style of working with a community so that we will at least document breaking changes properly and make sure that the users can port their code to the next version if they want. Right.
Starting point is 00:42:26 How difficult is it to bring HPX into an application? Well, as you said, if you have a standards-conforming application, you essentially go over it and change the namespace. That might not give you the result you want in the first run. It will give you a running application, but you might not be able to take full advantage of HPX if you do that. On the other hand, as I said, in order to take full advantage of the techniques we developed, you have to redo
Starting point is 00:43:05 your applications a bit. You are a C++ developer and Jason, you are as well. You know that when you optimize your code, then the most performance gain you usually get from changing the algorithm.
Starting point is 00:43:23 Right? So changing just some micro-optimizations, just optimizing that function gives you only so much. The real performance gain you get when you change the algorithm, the underlying implementation of the thing you're looking at. And that's true here as well. If you want to take advantage of these techniques, which, by the way way we call futurization
Starting point is 00:43:46 in the metaphorical sense in the direct sense then you have to touch your application but we'll see there are other things in the standardization committee currently under discussion that await you were mentioning. Await is a language facility which will support synchronization
Starting point is 00:44:12 in these very fine-grained task-based parallel systems. And I believe that will simplify coding with HPX a lot. Okay. I'm a little bit curious. I feel like I have a good grasp for what the advantages are on a single system. You said you have much lower latency for starting new threads and you can do a lot more
Starting point is 00:44:34 things concurrently. What's the picture look like in a distributed system? I would assume there has to be more overhead in starting a future, but what you start a future and now you don't know the system will decide which computer it runs on is it that high level of an interface yes essentially what we what we built was hpx as a global address space in across all nodes you use
Starting point is 00:44:59 so all machines the application is running on sit on top of a single address space. It's essentially a virtual 128-bit address space. And when you create objects in that address space, you get that 128-bit address, and you don't care anymore on which nodes that particular object lives because you can't even move it around to a different node, and your program will not be the wiser. So you either can specify where to create things, or you can create something which we call a distribution policy, which allows you to make more intelligent decisions where to place things,
Starting point is 00:45:40 where to execute a certain piece of work and this kind of things. The main gain we get from parallelizing when we use HPX in a distributed context is from the ability to better overlap communication with computation. The existing systems like MPI kind of impose a lockstep mode on your application where each node is doing the same thing, and then they communicate at the same time. And then they do something, and then they communicate again. So that means you create these global barriers in the execution of your code. And if only one of the nodes takes 10 times longer than everybody else,
Starting point is 00:46:22 then everybody else has to wait for that slowest guy to come to that barrier, which imposes a lot of overhead. If you turn that into kind of this is very more fine grained task based parallelism, where you synchronize not all nodes, but only those objects which actually participate in a particular computation, then you have a much finer-grained parallelism, much finer synchronization, so parts of the computation can go ahead where other parts of the computation can lag behind because they have more work. And that allows you to do work, to do actual computation while a network communication is currently underway. Essentially, you can think of it in a simple case as a remote procedure invocation mechanism. You send off the message, and while the message is in flight, you actually do other things.
Starting point is 00:47:22 And you wait for the return message to come back at some later point. And that return message is represented by a future. So you can synchronize on that return message. So it's like a remote procedure call, which gives you a future representing the result of that remote operation. And that allows you much, much more flexibility and better utilization of the machines than we can do that today, especially in larger scale systems where these global barriers are just killing you. You talked about the HPX community for a moment there. Do you take pull requests or is it just being worked on from the students at LSU? Absolutely, we take LSU? Absolutely.
Starting point is 00:48:07 We take pull requests. Absolutely. HPX is a fairly mature open source project. It's about 500,000 lines of code. We have
Starting point is 00:48:20 about 10 people actively contributing every month. About 70 people have been contributing since HPX was born. So there's quite some activity going on. If somebody wants to look into that, by all means, try to find us on our IRC channel. It's on Freenode. I think you will publish a website of HBX where
Starting point is 00:48:50 everybody can find the links needed. The other thing I wanted to mention is that our group has been selected as a Google Summer of Code mentoring organization this year again. Last year, Google was funding three students to work on HPX.
Starting point is 00:49:10 This year, Google funds five students. So we are very, very proud because that puts HPX kind of into the realm of the 130 largest and well-known open source projects, Google funds and the Google Summer of Code projects. So we're very happy about that. So by all means, if somebody wants to contribute or just get involved or just use it, please get in contact. We have a very, very active IRC channel
Starting point is 00:49:38 where you can ask any questions and you will find always support. Because there are people all around the world on that IRC channel, so there isn't a time in the day when you don't have anybody answering questions. Somebody is always there. That's great. So thank you so much for your time, Hartmut. So where can people find you online if they want to see more information about you or more information about HPX?
Starting point is 00:50:04 Well, I hope you will publish the links on your web page. We have a GitHub repository. If you look for HPX on GitHub, you will find it. It's the first hit. So that's the easiest way to find us. Myself,
Starting point is 00:50:21 I have a web hosted by CCT, the Center of Computation Technology here at LSU. Or just send me an email and get in direct contact with me. No problem. Just Google for Hartmut Kaiser C++. It's the first hit. You'll find it. No problem. Okay.
Starting point is 00:50:42 Thank you so much. Thanks, Rob. Thanks, Jason. It was a pleasure talking to problem. Okay. Thank you so much. Thanks, Rob. Thanks, Jason. It was a pleasure talking to you. Okay. That was a great interview, wasn't it, Jason? Yeah, that was good. I had not been exposed to HPX before. Neither have I. It falls into line with a lot of stuff that I've looked at in the past. Yeah, it's definitely something I'm going to have to look into and see if it's something I could use at my day job. So I want to talk just for a minute about how we are both going to have a pretty busy schedule over the next few weeks.
Starting point is 00:51:15 I'm going out to build, and you're going to C++ now, right? Or C++ now. C++, yes, yes. So I don't think we're going to have a show next week unless I'm able to get us a guest for Monday or Tuesday before I fly out to San Francisco. But I just wanted to let everyone listening know that the two of us are going to be traveling to these conferences,
Starting point is 00:51:37 and I'm actually going to be wearing some CppCast shirts that I made for myself. So if you see me at any of the c++ sessions at build feel free to come up to me i'd love to hear what you think about the show i'm also going to be trying to talk to all the c++ session presenters at build to see if we can get some guests out of the visual c++ team from microsoft and i hope you can do the same, right, Jason? Yeah, I will definitely ask around. I don't have any CBPCast shirts to be
Starting point is 00:52:10 wearing. I'll have to get you one. You might see me in a ChaiScript shirt, too, if I get the chance. Sure. Okay, well, that's it for an episode. Thank you for joining me, Jason. I think it went really well. Thank you. Thanks so much for listening as we chat about C++.
Starting point is 00:52:27 I'd love to hear what you think of the podcast. Please let me know if we're discussing the stuff you're interested in, or if you have a suggestion for a topic, I'd love to hear that also. You can email all your thoughts to feedback at cppcast.com. I'd also appreciate if you can follow CppCast on Twitter and like CppCast on Facebook. And of course, you can find all that info and the show notes on the podcast website at cppcast.com. Theme music for this episode is provided by podcastthemes.com.
