CppCast - Parallel Computing Strategies
Episode Date: March 17, 2016

Rob and Jason are joined by Dori Exterman to discuss parallel computing strategies and Incredibuild.

An expert software developer and product strategist, Dori Exterman has 20 years of experience in the software development industry. As Chief Technical Officer of IncrediBuild, he directs the company's product strategy and is responsible for product vision, implementation, and technical partnerships. Before joining IncrediBuild, Dori held a variety of technical and product development roles at software companies, with a focus on architecture, performance and advanced technologies. He is an expert and frequent speaker on technological advancement in development tools specializing in Embarcadero (formerly Borland) environments, and manages the Israeli development forum for these tools.

News
Herb Sutter Trip Report
Testing GCC in the wild
JF Bastien Trip Report - Happy with C++17

Dori Exterman
Dori Exterman

Links
Considerations for choosing the parallel computing strategy - Dori Exterman - Meeting C++ 2015
Incredibuild
Transcript
This episode of CppCast is sponsored by Undo Software.
Debugging C++ is hard, which is why Undo Software's technology
has proven to reduce debugging time by up to two-thirds.
Memory corruptions, resource leaks, race conditions, and logic errors
can now be fixed quickly and easily.
So visit undo-software.com to find out how its next-generation
debugging technology can help you find and fix your bugs in minutes, not weeks.
CppCast is also sponsored by CppCon, the annual week-long face-to-face gathering for the entire C++ community.
Get your ticket now during early bird registration until July 1st.
I wanted to put in a quick disclaimer about this episode.
Dori Exterman, who's going to be our guest today, was a great guest.
But there were some audio quality issues with this episode.
Dori comes from Israel.
He's on the other side of the world from me and Jason.
So I think the internet just wasn't performing well or Skype wasn't performing well.
And there's some audio hiccups in Dori's conversation with us.
So I just wanted to make you aware of that.
I did try to take some of it out in editing, but there's only so much I can do. So please enjoy the episode.
Episode 49 of CppCast with guest Dori Exterman, recorded March 16th, 2016. In this episode, we go over more of the trip reports from the C++17 standards meeting.
Then we talk to Dori Exterman from IncrediBuild.
Dori tells us about the pros and cons of multi-threading versus multi-process
and more. Welcome to episode 49 of CppCast, the only podcast for C++ developers by C++ developers.
I'm your host, Rob Irving, joined by my co-host, Jason Turner.
Jason, how are you doing today?
I'm doing all right, Rob. How about you?
Doing pretty good. No real news to report from me.
So let's just jump into a piece of feedback.
This week we got an email from Demeter, I believe is the name.
And he writes in, I finally managed to listen to all episodes.
You're doing a great job.
It's nice to learn new and surprising stuff about C++.
I especially liked the episode with Dmitri Nesteruk.
He was just talking and talking, never slowed for a second, and everything he said was super interesting.
I bet he could go for hours and no one will get bored.
He then has a couple suggestions for guests.
Jeff Preshing from Ubisoft, John Carmack, Paul Pedriana from EA to talk about EA STL,
and, of course, some of the big ISO folks like Herb Sutter, Bjarne Stroustrup, STL, or other compiler implementers.
Those are definitely some people who are on our radar to get on the show.
So we'll definitely keep trying to get some of them on.
Yeah.
Yeah.
We'd love to hear your thoughts about the show as well.
You can always reach out to us on Facebook or Twitter, and you can email us at feedback at cppcast.com.
Joining us today is Dori Exterman.
Dori is an expert software developer and product strategist.
Dori has 20 years of experience in the software development industry.
As CTO of Incredibuild, he directs the company's product strategy and is responsible for product
vision, implementation, and technical partnerships.
Before joining Incredibuild, Dori held a variety of technical and product development roles
at software companies with a focus on architecture, performance, and advanced technologies.
He's an expert and frequent speaker on technological advancements in development tools, specializing
in Embarcadero (formerly Borland) environments, and manages the Israeli development forum for these tools.
Dori, welcome to the show.
Hi, guys.
Thanks very much for inviting me.
Thanks for joining us.
Yeah, so we have a couple of news items to discuss
before we start talking to you about multi-core, multi-process,
and some of the stuff you're doing at Incredibuild.
The first one is we're still getting a lot of news
from the C++17 ISO committee meeting.
And Herb Sutter wrote a pretty excellent and very lengthy trip report that was a good read.
I really like this graph that he puts in towards the end showing the progress of the various TSs and describing them kind of as feature branches
that will eventually get merged back into the trunk. And it does kind of highlight how things
like modules and concepts are pretty new and we shouldn't be too upset that they didn't make it
into C++17 yet. And he also writes how they're actually thinking that they could switch from a three-year cycle to a two-year cycle.
So it might be C++19 next instead of C++20.
What were your thoughts on this, Jason?
Well, there's a few things in the actual report I'd like to call out.
But looking at the graph that you just mentioned, which I had not yet looked at closely, I didn't really appreciate it.
It looks like it took C++98 about eight years
to standardize and C++11 also about eight or nine years to standardize. I mean,
now they're talking about a two-year cycle. Holy cow, that's four times the improvement.
Well, I guess each version of C++ does give us efficiency gains that help us
write faster code. So maybe it's working that way for the standards committee also.
Well, he kind of points that out too. And that's why C++ 11 was such a major release because they
had nine years to work on it and perfect it. And that's why they're not going to be able to have maybe as major a release with the new model.
But with these two-year, three-year cycles, they'll have more kind of medium releases, I believe he's saying.
Right.
Dori, what were your thoughts on this?
So actually I see a lot of companies, you can see, especially I can see it in Microsoft,
moving towards more agile kind of development.
So it allows them to get more releases more frequently.
And I think that especially in commercial companies, keeping the competitive edge,
you know, you can have a feature ready.
And unless you actually release it, this feature is not available for your customers.
So we can see that everywhere in the industry.
People are moving towards the shorter release cycles.
And practices such as continuous integration, DevOps, and agile development
actually allow you to do that, with more unit testing and integration tests.
All these kinds of infrastructure allow companies,
and companies do consider and do understand the benefit,
to move towards shorter release cycles.
So I think that that's something very good that's going on in the entire industry
and not only in C++ and in C++17.
But yeah, I'm very happy to see
that they are moving towards this kind of agile,
continuous releases as well.
Yeah, I definitely agree.
It's just because of the nature
of the committee meetings and everything,
because it's kind of a side project
for everyone who does it.
It doesn't really feel agile.
Two-year releases doesn't feel agile.
But it definitely is compared to, you know,
the 11-year or nine years' worth of work that went into C++11.
Well, and really, if there was a release like every year,
it would be just too hard to know which features you're allowed to use.
Yeah, you can't do that.
So there's a couple things, actually, in his trip report I don't believe we mentioned last time. One is, yes, they approved some features that were not yet voted in, but they are on track. One of those is constexpr if.
Okay.
So it's kind of a static if.
Personally, I really don't think the syntax looks very nice.
Because, I mean, well, the constexpr if, fine.
constexpr space if,
and then it evaluates something at compile time.
But if you have to have an else block, it's constexpr underscore else, and I feel like that's a bit odd.
Yeah, it's a disconnect in the syntax there. Anything else you want to call out before I move on?
Um, I thought that was the main thing, but it does look interesting.
I think everyone should take a look at that if they're interested in the standards development at all.
Yeah.
So this next article is another one from the Red Hat blog
talking about testing GCC,
and they have a pretty interesting method of testing the compiler
by running it against thousands of packages that are actually
in the wild and making sure everything compiles. I thought this was pretty interesting. Right,
Jason? Yeah, they basically rebuild Red Hat about once a year with the newest compiler to see what
will happen. Is it just Red Hat or is it like open source projects as well? Reading between the lines, I assumed it was RPM packages.
So packaged open source projects that can build on Red Hat
was what I believe it is.
Right.
But it's at 17,000 packages now.
Yeah, that's a lot of packages.
What were you going to say, Dori?
It's actually
amazing, and we can see that
everywhere, because
as the years go by, people have
more code base
and more tests that are running.
And actually, I think
that I'll refer to it further on
in our discussion, because it's
very relevant. In
IncrediBuild, we're doing something
which is very similar inside IncrediBuild as well
due to the fact that we accelerate build tools.
We have a lot of open source projects
that before each release, we check our release
and try to build with IncrediBuild
all these open source projects.
So we have kind of the same thing as they have with GCC.
And we really see that growing and growing in terms of size
and the compute capacity
that is required in order to do that in a reasonable amount of time.
So I think that that's very important for our further discussion.
Right. And this last article is another
trip report from
JF Bastien from Google, who we've had on the
show before. And
this one is a good read
if you were angry or disappointed
from the results of
C++17. He's kind of
talking the internet community down
a little bit, which is probably necessary for
a lot of people. Not really too much to add to that, right, Jason? I just noticed that he quotes
ISO Trump++. I don't know if you've seen that Twitter. I didn't notice that. It's right in the
first section. He says, as ISO Trump++ says, let's make C++ operator greater again.
That Twitter account is absolutely ridiculous and quite a sign of the times today, if you haven't checked it out.
Yeah, I did notice this Twitter account the other day.
It's an interesting merger of American politics and C++.
Yes.
Yeah.
Is there anything you wanted to talk about with this one, Dori?
No, I think I'll pass on this one.
Okay.
Well, let's start talking to you about this talk you gave last year
at Meeting C++ on parallel computing strategies.
Can you give us an overview on your thoughts of multi-threaded versus multi-process?
Yeah, so I think the first thing is the motivation for creating this presentation.
As the CTO of IncrediBuild, I frequently communicate with development managers that would like their application to perform better, to use distributed computing
or cloud computing in order to gain, you know, better performance and to be able
to use more resources and not only the resources of the local machine.
But during these conversations, I've seen many times that people chose the wrong kind of parallel computing strategy for their software.
And they didn't have the means, you know, to understand which strategy is best for them. So following this, you know, I had a lot of these kind of
discussions and I wanted to give this to our customers and, you know, the community
and try to find some kind of comprehensive list that architects
could go over when choosing the architecture for parallel computing
that suits their needs.
Surprisingly, I couldn't find one on the Internet,
so I took everything that I knew about parallel computing through my experience and things that I found on the web
and tried to assemble some kind of guide,
some bullet points that allow you to determine
which kind of parallel computing strategy better fits your needs.
And I did it in order to contribute this to the community.
And I think that that's kind of important because, from my experience,
when people are just starting their development, they are not thinking about the future of their product and the kind of CPU consumption capacity that they will have in the future.
And once their product succeeds, usually the problem gets
bigger, and then they need to spend more time in solving it.
Whether we're speaking about solvers, or CAD, or compilations, or testing, or various
other scenarios, they reach a kind of bottleneck in their performance, and when they want
to scale, they sometimes see that they are limited by
the way that they implemented their parallel computing.
So, what I did in this session is
to gather guidelines
that allow you to ask yourself a set of questions that will allow you to determine
which kind of parallel computing strategy better suits your problem.
So that was the motivation. And besides giving this presentation at Meeting C++,
I recently did a similar presentation in St. Petersburg
in a C++ conference as well.
I also get a lot of feedback from developers
that continue and contribute to this list
of bullets
for developers
and architects to consider before they
choose whether to use
multi-threading or multi-processing
in their
software.
So can you give...
I'm sorry, go ahead. I was just going to ask if...
Yeah, can you give us an overview
of what the pros and cons of each option are?
Yeah, I can.
I think it will be a little bit short.
Usually this session is an hour long.
Oh, yeah, absolutely.
Just the highlights.
Yeah, high level.
Just the highlights.
So let's start with
multi-threading. I think the major pro is
the easy and fast way to communicate and share data between threads.
So that's very easy. They all have the same process space and they
can easily use variables, classes, and any other
kind of structure to share data and to communicate, and you don't need to have anything special in
order to achieve that. It's also easy to communicate with the parent process, the process that invokes
these threads. One of the areas in which multi-threading is very beneficial, and I can
see that with large customers that have, for example, small servers, is large data sets that
cannot be divided into subsets. So if you have a very large data set, and you want to get all that data
into your memory, and all the threads are looking into random areas in your data set,
then you are actually not able to divide it into subsets,
and you will not want to load this data set for each process,
because it will consume a lot of your memory.
So that's another area in which multithreading is relevant.
For example, you can see that in artificial
intelligence, in weather forecasting, in genetic algorithms, and in areas like that. So that's
something that multithreading is better for. And also multithreading is supported by many libraries
that allow you to have thread-safe access to data structures, so you can access these data structures
from multiple threads
without needing to be concerned about race conditions
or locking or things like that.
So that makes it quite easy to develop.
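To make that concrete, here is a minimal, illustrative C++ sketch (not from the episode): one large data set is loaded once, several threads read random positions in it, and a mutex protects the shared result. The sizes and thread counts are arbitrary.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdio>
#include <mutex>
#include <random>
#include <thread>
#include <vector>

int main() {
    // One large data set, loaded once and shared (read-only) by every thread.
    std::vector<double> data(10'000'000, 1.0);

    double total = 0.0;
    std::mutex total_mutex;  // protects `total` against data races

    auto worker = [&](unsigned seed) {
        std::mt19937 rng(seed);
        std::uniform_int_distribution<std::size_t> pick(0, data.size() - 1);
        double local = 0.0;
        for (int i = 0; i < 1'000'000; ++i)
            local += data[pick(rng)];          // random reads into the shared set
        std::lock_guard<std::mutex> lock(total_mutex);
        total += local;                        // brief critical section
    };

    unsigned n = std::max(1u, std::thread::hardware_concurrency());
    std::vector<std::thread> threads;
    for (unsigned t = 0; t < n; ++t)
        threads.emplace_back(worker, t);
    for (auto& th : threads)
        th.join();

    std::printf("total = %f\n", total);
}
```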
And it's easy to start.
Usually, when people see a scenario in their application
where they have multiple elements
that they want to execute in parallel,
they go for multi-threading.
And actually, that's also where the whole problem starts
because we as developers,
we know that once we started,
usually we don't go back.
So people are starting with multi-threading, and then usually the entire application is
multithreaded, and that's where they start having this problem that continues with the
system, if that's not the right strategy for them.
As for the cons of multithreading, there are quite a few.
So one of them is if one of your threads crashes,
it will crash your entire process, your entire application,
which is not something that you want to have.
And we know that bugs happen, and we sometimes don't catch all of them.
And multithreaded is something which is risky.
If you want your application to always run and not crash,
that's something that's hard to recover from.
Another thing is that multi-threaded applications are hard to debug.
If you have race conditions or locking issues or memory stuff like this,
I'm sure that a lot of your audience already, you know,
had a lot of time-consuming bugs similar to this.
So these are things that are very, very difficult to solve.
And also, you need to consider your developers.
If you are a team leader and you have a developer
which is not that experienced with multi-threaded development,
he will usually be able to achieve the goals that you set for him.
But the way that he develops the code can really affect your application in the future.
And, you know, maintaining a complex multi-threaded application
can be more costly than actually developing it
in the first place, especially if you don't have the environment and you need to try and solve
these problems remotely on your users' sites or things like that. So these are quite complex
scenarios. If you have shared data
between your threads, you need to maintain, and usually it's difficult to
maintain, the synchronization blocks and make sure that you don't have race condition
stuff. Another thing is that having too many threads can result in having
too much time spent in context switching between the threads
instead of using the CPU for actual calculations.
And I think the last thing is that multithreading is less relevant for scalability.
Usually with multithreading,
you'll only be able to use your multicore machine.
You will not be able to easily distribute these threads either to other machines
or the public cloud, which is quite a limitation. So if you have a lot of tasks in the future,
usually, you know, application developers start with small problems. They have
four threads, six threads, eight threads to calculate the scenario.
But when you start having thousands of threads, that's where things become a bottleneck in terms
of performance. And that's where scalability is something that you'd like to have. With
multi-threaded, scalability is very, very difficult to attain. Okay, so what about some of the pros
and cons of multi-process? Okay, great.
So I think that they are kind of a mirror of one another.
So the first thing that is very important to understand
is if you have a multiprocess application,
it doesn't mean you don't have a multithreaded application.
So you can have multiple processes,
but each process also contains multiple threads,
and that's something which is, you know,
that's the same thing that we have in IncrediBuild,
for example.
One of the main benefits is if one of your processes crashes,
it won't crash your entire application,
especially if you know how to recover from it.
For example, in IncrediBuild, we distribute processes,
not our processes, but our customers' processes to remote machines,
and a process can fail.
The machine can be disconnected.
Things can happen.
And due to the fact that we know that these kinds of things can happen,
and we don't actually control that, because these are not our processes,
they are our customers' processes,
we have a mechanism that allows us to recover from these kinds of crashes.
So multiprocess allows you to recover from this kind of process crash, which is something
that can be common, especially if you have a large deployment product.
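As a rough illustration of that recovery pattern, here is a minimal sketch assuming a POSIX system; the task logic and the simulated failure are invented, and IncrediBuild's actual mechanism is of course more involved. The point is only that a failed child process costs a retry, not the whole application.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <cstdio>

// Run one unit of work in a child process; return true if it exited cleanly.
static bool run_task(int task_id, int attempt) {
    pid_t pid = fork();
    if (pid < 0)
        return false;                                     // could not even start the child
    if (pid == 0) {                                       // child: do the work, then exit
        // ... real work would go here (e.g. exec an external tool) ...
        _exit(task_id % 5 == 0 && attempt == 0 ? 1 : 0);  // simulate an occasional first-try failure
    }
    int status = 0;
    waitpid(pid, &status, 0);                             // parent: wait and inspect the exit
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}

int main() {
    for (int task = 0; task < 20; ++task) {
        int attempt = 0;
        // A crashed or failed child only costs a retry; the parent keeps running.
        while (!run_task(task, attempt)) {
            std::fprintf(stderr, "task %d failed, retrying\n", task);
            ++attempt;
        }
    }
}
```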
It's much easier to debug.
It's much easier to debug a single process than to debug a process that has dozens or
hundreds of threads, especially if you know how to
encapsulate your process in an optimized manner.
You'll usually have fewer locking issues, but it
actually depends on the
infrastructure and API that you're using, and it's scalable. So if you started your application
and you have eight processes and you have an eight-core machine, that's fine, but if you
then need to execute 800 processes, you can use clusters, you can use distributed computing, you can use cloud computing
to gain better performance and provide your customers with better performance.
In terms of business, you know, people today are looking for SaaS services and the ability
to upsell to their customers some kind of service.
And scalability, and especially scalability to the cloud,
is something which is very interesting in terms of business aspects.
As for the cons, everything that I say here is really dependent
on the kind of application that you're writing.
But generally, the cons of multiprocess are that if you want to have communication between your processes,
you need to do some custom development.
You need to use some kind of communication protocol, whether it's TCP/IP, UDP, broadcasting,
anything that exists today, in order to communicate between these processes.
And this can introduce both complexity of development
and also some kind of latency in the communication between the processes.
One of the ways to solve it, by the way, is to have some kind of shared memory,
which is an element that allows you to share
a memory segment between processes
in order to minimize the latencies.
Then you have a kind of performance which is similar to multi-threading,
but in order to develop it,
you require some professional guys
on your team,
because that's not quite easy to develop and maintain in a complex system.
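For the shared-memory route he describes, here is a minimal POSIX sketch (illustrative only; the segment name and size are made up, error handling is omitted, and on some systems you may need to link with -lrt):

```cpp
#include <fcntl.h>      // O_CREAT, O_RDWR
#include <sys/mman.h>   // shm_open, mmap, munmap, shm_unlink
#include <sys/wait.h>   // wait
#include <unistd.h>     // ftruncate, fork, _exit
#include <cstddef>
#include <cstdio>
#include <cstring>

int main() {
    const char* name = "/demo_segment";          // hypothetical segment name
    const std::size_t size = 4096;

    // Create a named shared-memory object and map it into this process.
    int fd = shm_open(name, O_CREAT | O_RDWR, 0600);
    ftruncate(fd, static_cast<off_t>(size));
    char* region = static_cast<char*>(
        mmap(nullptr, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0));

    if (fork() == 0) {                           // child writes into the shared region
        std::strcpy(region, "result computed by child");
        _exit(0);
    }
    wait(nullptr);                               // parent waits, then reads the same bytes
    std::printf("parent sees: %s\n", region);

    munmap(region, size);
    shm_unlink(name);                            // remove the named segment
}
```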
And the last thing, I think, is that you have a smaller set of libraries
that support multi-process development in terms of, you know, data structures
or, as I said, shared memory.
You don't have these kinds of solutions
as you have with multi-threading.
So that's more or less, in a nutshell,
the very high-level highlights.
You know, your listeners can take a look at the entire session, in which
I delve into different kinds of applications and
look at which strategy they took.
For example, if it's CAD, usually it will use
multi-threading, unless the problems are very large, and then you have some
CAD/CAM
software which developed their application in a multi-process manner because they knew that they
would want to scale in the future. If you have AI, weather forecasting, or genetic algorithms,
in which you want to have a lot of synchronization and communication between the multiple elements,
then usually multi-threading is suitable.
If you're speaking about rendering and compilation and tests, things that are actually not dependent and are quite atomic,
then usually you'll benefit more from having multi-process technology.
I don't see much value in having a multi-threaded application executing elements which are not
communicating between one another and don't use the same kind of data.
So it really depends on the application that you develop. And I think it's very important to understand.
And each of these highlights is delved into in the presentation that I gave, but I don't think
we have the time for all of it now. So those are just the highlights. Yeah, we will have a link to
the YouTube version of that presentation in the show notes,
if our listeners want more information on that, too.
Great.
Sorry, go ahead, Jason.
I was going to say, you mentioned IncrediBuild now. Do you want to go ahead and give us an overview as to what your product, IncrediBuild, is?
Yeah, I'd be happy to.
I think that just one more thing
before we delve into IncrediBuild
is in the presentation,
I just took two build systems,
Make versus Maven.
Yes.
The reason that I took them
is because they actually have,
they try to solve a very similar problem.
They are compiling code.
They are very common.
A lot of people are using them.
And they both have a lot of tasks.
These tasks are atomic.
They are compilations.
They usually don't communicate with one another.
They have very small data sets. And due to the fact that large projects
have a lot of tasks to compile,
scalability and performance can be an issue here.
Sorry.
So in this presentation,
I just took a look at Maven versus Make.
And I looked at how they both started early on.
So, you know, in the past,
where you didn't have a lot of multi-core machines,
and also the code base was quite smaller
than it is today.
And what I've seen is that in Maven you are kind of restricted to your
workstation or your build server. So if you have an 8-core workstation or
32-core build server, that's the amount of processing, the amount of
computation, that you'll have for your entire build. And it used to work,
especially since Maven is
usually compiling Java, and Java
is shorter
in terms of compilation than C++.
But still, you know, people
today have a larger and larger
code bases. And
I saw on the internet that
people are asking for
multiprocess support from Maven
in order for them to be able to use a distributed solution,
similar to IncrediBuild or distcc or others,
in order to accelerate their compilation times.
But due to the fact that Maven was constructed in a multithreaded way,
a third-party tool cannot intervene and cannot distribute these threads.
A distribution mechanism cannot distribute a thread.
The smallest atomic unit that you can distribute
is a task, is a process.
And that's quite a limitation.
And the other thing is that you cannot intervene.
For example, with IncrediBuild,
we have the ability in Visual Studio
to know and determine the optimized way
to execute tasks in parallel.
So we can intervene with the way
that processes are being executed.
And if we see that, although Visual Studio, for example,
will run a few processes in a sequential manner,
IncrediBuild can intervene, due to the fact that these are processes,
and actually force Visual Studio to execute them in parallel.
And that will result in optimized utilization of your own resources,
your own machine resources, even without distribution.
With Maven, this is something that you can't have
because you cannot determine and
you cannot intervene with the way that Maven executes its threads. So, with Maven, you are
limited to the amount of resources and CPUs and memory that you have in your machine. And with
Make or Visual Studio, for example, if you're using IncrediBuild, just to compile Qt, for example.
So Qt will take you something like 16 minutes to compile on an eight-core machine. But if you're using something like IncrediBuild to accelerate it and use additional cores and distribute
the compilation tasks, you can have a build which takes one and a half minutes instead of 16 minutes, and that will use 130 cores across your local network.
So that's a huge gain for developers
and for your product as well, in terms of time to market and other things.
So I think that makes the case, yeah.
I wanted to interrupt this discussion for just a moment
to bring you a word from our sponsors.
You have an extensive test suite, right? You're using TDD, continuous integration,
and other best practices. But do all your tests pass all the time? Getting to the bottom of
intermittent and obscure test failures is crucial if you want to get the full value from these
practices. And Undo Software's live recorder technology allows you to easily fix the bugs
that don't otherwise get fixed.
Capture a recording of failing tests in your test suites and debug them offline so that you can collaborate with your development teams and customers.
Get your software out of development and into production much more quickly, and be confident that it is of higher quality.
Visit undo-software.com to see how they can help you find out exactly what your software really did, as opposed to what you expected it to do.
And fix your bugs in minutes, not weeks.
Okay, so let's get an overview of what exactly IncrediBuild is.
So, IncrediBuild is a process distribution platform.
It's a generic platform.
It allows you to seamlessly distribute processes across machines in your
network or the public cloud.
Effectively, it
transforms every machine, every
developer machine or build server
to become a virtual supercomputer that can
scale and use thousands of cores
and gigs of memory
instead of only the local resources that it has. So if you have a developer machine with eight
cores and you now want to compile Photoshop, for example, which has thousands of source files,
instead of just using your own resources, IncrediBuild allows you to use all the resources, the idle CPU cycles,
whether of
dedicated machines or machines that
people are actually working on
and to use only the idle CPU cycles
of those machines in order to
accelerate the performance
of your compilation.
We have unique process virtualization technology
embedded in IncrediBuild.
This technology effectively allows us
to distribute the process to a remote machine
without needing to have the environment
that you have in your initiating machine
on the remote machine.
For example, if I'd like to distribute a GCC task to a remote machine
using this unique technology that IncrediBuild has,
you don't need to have GCC installed on the remote machine.
You don't need to have the GCC libraries or any of your source code on the remote machine.
Essentially, what IncrediBuild does, it will bring on demand everything that the process needs
in order to work on this remote machine,
and will put it in a special sandbox
that IncrediBuild keeps on every machine.
Same thing goes for any output that will be generated
by the distributed process.
So any output that this process creates, for example,
an object file
that GCC creates,
will be brought back to the initiating
machine. So from the user's
perspective, it's really as though
he has thousands of cores and gigs
of memory on his local machine.
So
with IncrediBuild, you can use both virtual machines,
you can scale to the cloud, you can use dedicated servers, but on top of that, you can use idle CPU
cycles of machines that people are actually using. So in large organizations, you have a lot of
machines from sales, from marketing, from BizDev and others.
And these machines are usually pretty idle.
You have an 8-core machine which only consumes 10% of its CPU cycle.
So you have 7 cores that are just sitting there.
And if you have 100 of these kind of machines,
it aggregates to 700 cores that now every developer can use to
accelerate any kind of compute-intensive process that they have. This process virtualization technology
also allows us to keep this platform generic. So we've started with accelerating Visual Studio,
but essentially we've enhanced this platform
to support any kind of multi-process execution. So IncrediBuild is
used today to accelerate any kind of multi-process execution. And in
fact I don't even know all the scenarios that our customers are using our
platform for.
I can speak with customers and suddenly find out that people are using IncrediBuild,
and these are real scenarios,
to accelerate C++ compilation.
That's the usual case scenario,
but they are using the same thing to do tests,
unit tests, integration tests, code analysis, asset builds, whether it's rendering,
shading, encoding, or compression.
IncrediBuild is also being used to execute
derivative calculations and run end-of-day calculations for the stock market.
Essentially, everything that is multi-process can use this kind of
platform to distribute its tasks and gain hundreds or thousands of CPUs instead of only the local resources.
So from a practical perspective, what does it look like when the user wants to actually execute something, execute a build or whatever through IncrediBuild?
First, he needs to set up the environment. In order to do
that, he will need to install
a very small footprint
IncrediBuild agent on each
of the machines that he wants
to be part of the
infrastructure.
And then you have
all these machines connected, and each of these machines
can use idle CPU cycles from all of the other machines that are connected to the same platform.
It's very similar to a grid solution, a grid compute solution. And once he has that
environment set up, he can use the environment in order to accelerate any kind of multi-process execution that
it has, in order to execute, for example,
a make command.
The only thing that the developer
needs to do is to execute his
original make command,
and just write incredibuild
slash command before his original make
command, and that's it.
Incredibuild will kick in his make
command, and will distribute all the GCC tasks
to remote machines across the network.
The only thing that the user will need to do
with its make command
is to increase the minus j value
for his make command.
So the minus j flag for make
actually tells make how many CPUs or how many jobs to execute in parallel,
the maximum number of jobs, compilations or others, to execute in parallel.
And make goes through the dependencies that you have in your script.
And by the dependencies, it determines the amount of tasks that can be executed in parallel.
So when you are using only your local machine,
you will usually execute make minus j8
if you have eight cores.
With IncrediBuild, you will execute make minus j800,
actually telling make that you have 800 cores
on your machine.
Make will then try and execute as many as 800 jobs in parallel,
and IncrediBuild will take over this queue of tasks,
the tasks being executed by Make,
and will distribute them across machines in the network.
So the developer's experience is as if he has 800 cores on his laptop,
which is quite cool.
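As a rough local analogue of the make -j behavior described above (a sketch only, not IncrediBuild's distribution mechanism; the source file names and the fixed g++ command are invented, and a real build would also respect dependencies), the snippet below keeps up to N compiler processes in flight and starts a new one whenever a child finishes:

```cpp
#include <sys/wait.h>
#include <unistd.h>
#include <cstddef>
#include <string>
#include <vector>

int main() {
    // Hypothetical, independent compilation tasks (no dependencies between them).
    std::vector<std::string> sources = {"a.cpp", "b.cpp", "c.cpp", "d.cpp", "e.cpp"};
    const int max_jobs = 4;                       // the local equivalent of `make -j4`
    int running = 0;
    std::size_t next = 0;

    while (next < sources.size() || running > 0) {
        // Launch children until we hit the job limit or run out of tasks.
        while (running < max_jobs && next < sources.size()) {
            if (fork() == 0) {
                execlp("g++", "g++", "-c", sources[next].c_str(), (char*)nullptr);
                _exit(127);                       // exec failed
            }
            ++next;
            ++running;
        }
        wait(nullptr);                            // one child finished; free a slot
        --running;
    }
}
```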
Is the process virtualization a newer feature of IncrediBuild?
Because I actually did work with it a little bit at my previous job,
and I was under the impression that I was working with Visual Studio,
and I thought you had to have Visual Studio installed in a machine
that IncrediBuild used as an agent.
So you might not necessarily want to install a Visual Studio license
on someone's machine in the marketing department just to use it as an agent, because
that could get to be quite expensive.
Is that a newer feature?
No, it's not a new feature.
We actually had it from the beginning,
and it was a misconception of
a lot of people.
You're not the only one.
If you want to use
IncrediBuild to accelerate your Visual
Studio builds, you don't need to have anything installed on the remote machines.
You don't need to have Visual Studio installed there.
You don't need any DLL.
You don't need a compiler.
You don't need a source code.
You don't need anything. It can be a machine brand new from the store or a virtual machine,
a clean virtual machine.
The only thing you need to have on that machine is an IncrediBuild agent,
and that's it.
We'll take care of bringing everything that the process needs in order to properly execute on the remote machine
while the process is being executed.
That's quite nice technology, but it requires a different discussion
in order to understand the way that it works under the hood.
Sure.
So you've mentioned so far Linux and Windows.
What all platforms do you guys support?
These are the platforms.
In Windows, you can interoperate
between any Windows operating system.
And we have the same kind of technology on Linux.
So you can mix between an Ubuntu initiator
and a CentOS or Red Hat helper.
But these are the two platforms,
general platforms that IncrediBuild supports.
So no macOS support at the moment?
Not yet.
We have customers asking for that,
and it's currently
not yet supported, but
we are
speaking about it
with customers.
So I believe that we might see
that in the future.
So you've mentioned that it can work with
any multiprocess
thing.
Yep.
So that kind of implies, looking back to the earlier part of our discussion,
that it can't work with Maven.
Is that correct?
Yes, that's correct.
That's exactly one of the cons of multithreading. It does not allow third-party tools
to intervene with the way
that its tasks are being executed.
So IncrediBuild can accelerate
any kind of multi-process execution.
If we are only speaking about build tools,
for build tools, for example,
IncrediBuild supports today any build tool
that has some kind of multi-process execution, some kind of minus-j flag. For example, we have Visual Studio
and MSBuild and Jam and WAF and Ninja and Make and GMake and CMake, and any kind of build
system that supports multiprocessing. But, you know, our customers are using that not only for
build, as they have IncrediBuild set up, the infrastructure is set up in their environments,
and their build dropped from one hour to five minutes, for example, and that's a common scenario.
They start to see the other things they have as part of their development that now take more time.
Before it was one hour compilation and 20 minutes of testing, for example.
But now it's five minutes compilation and 20 minutes testing.
So they use the same infrastructure to accelerate any kind of multiprocess execution
they have as part of their build.
And if you are speaking about large customers,
you spoke about EA at the beginning.
So EA is a large customer of IncrediBuild, for example,
and they're using IncrediBuild to accelerate various...
We have a lot of game studios.
I think most of the major game studios are using IncrediBuild
because you can see in the game studios,
they have a lot of things, besides compilation,
that take a long time to compute.
In game studios, you have asset build.
You need to render images, you need to render shaders,
you need to calculate a lot of artificial intelligence scenarios,
you have texture compression,
you have encoding,
and you have all the other things
that regular applications have,
such as packaging and code analysis
and integration tests and unit tests
and things like that.
So IncrediBuild allows you to use the same infrastructure
to accelerate any kind of multi-process execution
that is part of your build and not only compilations.
A free version of IncrediBuild was recently announced
that comes with Visual Studio.
Could you tell us a little bit about that and what you get with that free version?
Yeah, so we've made this partnership with Microsoft, with the Visual Studio guys, and you can now download IncrediBuild and work with IncrediBuild from within Visual Studio; you get a free version of IncrediBuild.
This version allows you to get a few agents
to be used in a distributed manner for
a limited time, for 30 days.
But even after
this time ends,
you can still use IncrediBuild on your local
machine. And I think
that one of the
questions that comes after that is,
okay, but IncrediBuild is all about distribution. How can IncrediBuild help if I'll only use
IncrediBuild on my local machine? So the other thing that IncrediBuild has is the ability to
optimize your parallel execution. If you'll see a scenario in Visual Studio
in which you have Project A,
and Project A depends on Project...
Let's say Project B depends on Project A,
the scenario you'll see is that
Visual Studio will compile all the files in Project A
and then link Project A,
and then compile all the files in Project B and then link Project B. But usually there are no dependencies between
the files being compiled in project B and any of the files being compiled or linked in project A.
So IncrediBuild knows how to determine the actual parallelization that you can have.
And with IncrediBuild, even when you work on standalone,
and this standalone version is free for life for eight cores,
you will see IncrediBuild executing all the compilation tasks
for project A and project B in parallel.
So while you link project A,
you'll still be able to compile files from
project B, and then only link project B. And with this predictive
execution mechanism, as we call it,
we can see scenarios in which IncrediBuild, using only your local cores,
can accelerate your build performance
anywhere between 10% and 200%, and the average is usually
something around 25-30% faster builds using only your local resources. So it's quite meaningful.
Besides that, IncrediBuild also offers a bunch of other things
that accompany the main purpose of the product,
which is the performance.
We are generally aiming for productivity,
and one of the things that IncrediBuild has
is a graphical visualization tool
that allows you to see all the tasks
that you are executing,
either when you compile
or do any kind of other execution
within IncrediBuild.
This shows you,
and it really requires an image,
trying to explain a graphical representation
with audio is quite limited,
but it allows you to see every task
as a bar that goes over time,
and it can show you all the cores that are being used, and for each core you can see all
the tasks, as bars, that this core is executing. And if a specific compilation
task fails, you will see this bar as red, and if it succeeds, you will see it as green.
So now if you have a failing bar, a failing compilation task,
you simply need to double-click the red bar in the IncrediBuild graphical representation,
and it will jump straight to the line in the output that shows the failure,
and you can also double-click that and go straight away to your source code.
And you can have that for other build tools as well, or any other execution that you're running within IncrediBuild.
So instead of going through a very, very long output,
which is only textual,
you can use a graphical representation
to jump straight away to the error.
Another thing that you can have with this
build monitor is to see all the bottlenecks
and the bugs you have in your build.
So if I have a 16-core
machine, for example,
I want to see all these 16 cores
working, and if I can suddenly
see in the graphical representation
that only a single core is
working, I want to take a look at
that area and see whether there is a dependency, because it means that other tasks are dependent on this
task and they are waiting for it to finish in order to kick in. So I want to look at this part
and see whether I can further parallelize it and accelerate the performance of this area.
And that's something which is very, very difficult to do if you only have a textual output
to determine which kinds of tasks are the bottlenecks.
It's quite an impossible task to do.
And we have other features as well.
We allow you to stop on errors,
and we have batch builds that allow you to build multiple configurations in parallel,
and there are many other features that are part of the product that come for free,
even in the standalone version, that allow you to get better productivity.
So there is a very nice gain, even if you are not using the entire distribution mechanism of IncrediBuild for Visual Studio specifically.
So for the distributed versions, it seems like you're mostly aiming towards large organizations or large teams.
Do you have licensing options that make sense for, like me, I've got three idle computers sitting here in my house right now? Yeah, so we have a bundled solution for IncrediBuild,
which allows you to have a package of five agents, or eight,
that you can use.
So if you have three machines, that's something that you can use.
There are various options for students or academic institutions and things like that.
And actually, with IncrediBuild, one of the interesting things that recently occurred is all this hype on cloud computing. So even if you have only a single machine and you are an indie developer
and you build your code
and it takes a long time to do that
and you want to release your code
or you have a peak time that you want to do more,
build more, test more,
you can actually use the public cloud.
You can have IncrediBuild,
you can try and grab some kind of rental solution from us
for a specific duration of time, and you can use the public cloud as well. You don't need to
have all these resources at your site. And the public cloud performs
very well, especially if you have a good bandwidth, such as in the US or Europe or
Japan, and you can use these machines on demand
and you can scale
that's quite interesting
and we see more and more customers
even in large organizations
when they want to do more
before releases for example
they usually test more and run more builds
and do more stuff and run more
asset builds and they can
just grab more machines in the cloud
and scale to the cloud as well,
not only their local infrastructure.
So the answer is yes.
One of the things that I have recently seen,
which is more and more common,
is that even if you have a very small program,
a small application,
people are more and more frequently
using open source libraries.
And with these open source libraries,
even if they are making minor changes,
they need to recompile the entire library.
So things that used to take a very short time in the past
take a meaningful time in the present,
and in the future, I believe that it will only be worse.
And also, these open source libraries,
you know, they are getting bigger and bigger,
and they take more and more time to compile,
and this is a pain for a lot of people.
Okay.
There's been a lot of news lately
about dependency managers with C++.
We don't really have a great solution,
but various solutions are coming out
and trying to stake their claim.
There's biicode, and now there's Conan.io, I believe.
It might be getting a little bit popular.
Does Incredibuild work well with some of these dependency managers?
So one of the things that we did straight from the beginning
is that we didn't want to handle dependencies
due to the fact that we wanted to remain generic
and to be able to
support any kind of build system
or any kind of execution,
we are actually not handling
dependencies. We allow your build
tool to handle the dependencies
for us. So if
we are, for example, running make,
make will handle the dependencies.
You will simply pass a large
minus j value,
and make will make sure that it only executes in parallel
things that can execute in parallel.
And the only thing that IncrediBuild will do
is it will distribute these tasks to remote machines,
remote machines at work.
So even if you have a new build system
or a new dependency manager,
from IncrediBuild's perspective,
it doesn't matter.
Things will work out of the box for you.
Okay, that's good to know.
I think that's all I have. Jason, do you have any more questions?
That's all I have.
Okay, Dori, it's been great to have you
on the show. Where can people find you
online and go to find
more information about IncrediBuild?
So, you can go
to IncrediBuild's website,
and you can also find me on LinkedIn,
as Dori Exterman.
And if you have anything that you'd want to write to me specifically,
I'd be happy to hear from you.
Okay.
Thanks so much for your time today, Dori.
Thanks a lot.
It was a pleasure.
Thank you.
Have a good day.
You too.
Thanks so much for listening as we chat
about C++. I'd love to
hear what you think of the podcast. Please let me
know if we're discussing the stuff you're interested in
or if you have a suggestion for a topic. I'd love
to hear that also. You can email
all your thoughts to feedback at
cppcast.com. I'd also
appreciate if you can follow CppCast on
Twitter and like CppCast on
Facebook. And of course, you can find all that info and the show notes on the podcast website at cppcast.com.