CppCast - Parallel Computing Strategies
Episode Date: March 17, 2016

Rob and Jason are joined by Dori Exterman to discuss parallel computing strategies and Incredibuild.

An expert software developer and product strategist, Dori Exterman has 20 years of experience in the software development industry. As Chief Technical Officer of IncrediBuild, he directs the company's product strategy and is responsible for product vision, implementation, and technical partnerships. Before joining IncrediBuild, Dori held a variety of technical and product development roles at software companies, with a focus on architecture, performance and advanced technologies. He is an expert and frequent speaker on technological advancement in development tools specializing in Embarcadero (formerly Borland) environments, and manages the Israeli development forum for these tools.

News
Herb Sutter Trip Report
Testing GCC in the wild
JF Bastien Trip Report - Happy with C++17

Dori Exterman
Dori Exterman

Links
Considerations for choosing the parallel computing strategy - Dori Exterman - Meeting C++ 2015
Incredibuild
Transcript
This episode of CppCast is sponsored by Undo Software.
Debugging C++ is hard, which is why Undo Software's technology
has proven to reduce debugging time by up to two-thirds.
Memory corruptions, resource leaks, race conditions, and logic errors
can now be fixed quickly and easily.
So visit undo-software.com to find out how its next-generation
debugging technology can help you find and fix your bugs in minutes, not weeks.
CppCast is also sponsored by CppCon, the annual week-long face-to-face gathering for the entire C++ community.
Get your ticket now during early bird registration until July 1st.
I wanted to put in a quick disclaimer about this episode.
Dori Exterman, who's going to be our guest today, was a great guest.
But there were some audio quality issues with this episode.
Dori comes from Israel.
He's on the other side of the world from me and Jason.
So I think the internet just wasn't performing well or Skype wasn't performing well.
And there's some audio hiccups in Dori's conversation with us.
So I just wanted to make you aware of that.
I did try to take some of it out in editing, but there's only so much I can do. So please enjoy the episode.
Episode 49 of CppCast with guest Dori Exterman, recorded March 16th, 2016. In this episode, we go over more of the trip reports from the C++17 standards meeting.
Then we talk to Dori Exterman from IncrediBuild.
Dori tells us about the pros and cons of multi-threading versus multi-process
and more. Welcome to episode 49 of CppCast, the only podcast for C++ developers by C++ developers.
I'm your host, Rob Irving, joined by my co-host, Jason Turner.
Jason, how are you doing today?
I'm doing all right, Rob. How about you?
Doing pretty good. No real news to report from me.
So let's just jump into a piece of feedback.
This week we got an email from Demeter, I believe is the name.
And he writes in, I finally managed to listen to all episodes.
You're doing a great job.
It's nice to learn new and surprising stuff about C++.
I especially liked the episode with Dmitri Nesteruk.
He was just talking and talking, never slowed for a second, and everything he said was super interesting.
I bet he could go for hours and no one will get bored.
He then has a couple suggestions for guests.
Jeff Preshing from Ubisoft, John Carmack, Paul Pedriana from EA to talk about EA STL,
and, of course, some of the big ISO folks like Herb Sutter, Bjarne Stroustrup, STL, or other compiler implementers.
Those are definitely some people who are on our radar to get on the show.
So we'll definitely keep trying to get some of them on.
Yeah.
Yeah.
We'd love to hear your thoughts about the show as well.
You can always reach out to us on Facebook or Twitter, and you can email us at feedback at cppcast.com.
Joining us today is Dori Exterman.
Dori is an expert software developer and product strategist.
Dori has 20 years of experience in the software development industry.
As CTO of Incredibuild, he directs the company's product strategy and is responsible for product
vision, implementation, and technical partnerships.
Before joining Incredibuild, Dori held a variety of technical and product development roles
at software companies with a focus on architecture, performance, and advanced technologies.
He's an expert and frequent speaker on technological advancements in development tools, specializing
in Embarcadero (formerly Borland) environments, and manages the Israeli development forum for these tools.
Dori, welcome to the show.
Hi, guys.
Thanks very much for inviting me.
Thanks for joining us.
Yeah, so we have a couple of news items to discuss
before we start talking to you about multi-core, multi-process,
and some of the stuff you're doing at Incredibuild.
The first one is we're still getting a lot of news
from the C++17 ISO committee meeting.
And Herb Sutter wrote a pretty excellent and very lengthy trip report that was a good read.
I really like this graph that he puts in towards the end showing the progress of the various TSs and describing them kind of as feature branches
that will eventually get merged back into the trunk. And it does kind of highlight how things
like modules and concepts are pretty new and we shouldn't be too upset that they didn't make it
into C++17 yet. And he also writes how they're actually thinking that they could switch from a three-year cycle to a two-year cycle.
So it might be C++19 next instead of C++20.
What were your thoughts on this, Jason?
Well, there's a few things in the actual report I'd like to call out.
But looking at the graph that you just mentioned, which I had not yet looked at closely, I didn't really appreciate it.
It looks like it took C++98 about eight years
to standardize and C++11 also about eight or nine years to standardize. I mean,
now they're talking about a two-year cycle. Holy cow, that's four times the improvement.
Well, I guess each version of C++ does give us efficiency gains that help us
write faster code. So maybe it's working that way for the standards committee also.
Well, he kind of points that out too. And that's why C++ 11 was such a major release because they
had nine years to work on it and perfect it. And that's why they're not going to be able to have maybe as major a release with the new model.
But with these two-year, three-year cycles, they'll have more kind of medium releases, I believe he's saying.
Right.
Dori, what were your thoughts on this?
So actually I see a lot of companies, you can see, especially I can see it in Microsoft,
moving towards more agile kind of development.
So it allows them to get more releases more frequently.
And I think that especially in commercial companies, keeping the competitive edge,
you know, you can have a feature ready.
And unless you actually release it, this feature is not available for your customers.
So we can see that everywhere in the industry.
People are moving towards the shorter release cycles.
And practices such as continuous integration, DevOps, and agile development
actually allow you to do that, with more unit testing and integration tests.
All these kinds of infrastructure allow companies,
and companies do consider and do understand the benefit,
to move towards shorter release cycles.
So I think that that's something very good that's going on in the entire industry
and not only in C++ and in C++17.
But yeah, I'm very happy to see
that they are moving towards this kind of agile,
continuous releases as well.
Yeah, I definitely agree.
It's just because of the nature
of the committee meetings and everything,
because it's kind of a side project
for everyone who does it.
It doesn't really feel agile.
Two-year releases doesn't feel agile.
But it definitely is compared to, you know,
the 11-year or nine years' worth of work that went into C++11.
Well, and really, if there was a release like every year,
it would be just too hard to know which features you're allowed to use.
Yeah, you can't do that.
So there's a couple things, actually, in his trip report I don't believe we mentioned last time. One is, yes, they approved some features that were not yet voted in, but they are on track. One of those is constexpr if.
Okay.
So it's kind of a static if.
Personally, I really don't think the syntax looks very nice.
Because, I mean, well, the constexpr if, fine.
constexpr space if,
and then it evaluates something at compile time.
But if you have to have an else block, it's constexpr underscore else, and I feel like that's a bit odd.
Yeah, it's a disconnect in the syntax there. Anything else you want to call out before I move on?
Um, I thought that was the main thing, but it does look interesting.
I think everyone should take a look at that if they're interested in the standards development at all.
Yeah.
So this next article is another one from the Red Hat blog
talking about testing GCC,
and they have a pretty interesting method of testing the compiler
by running it against thousands of packages that are actually
in the wild and making sure everything compiles. I thought this was pretty interesting. Right,
Jason? Yeah, they basically rebuild Red Hat about once a year with the newest compiler to see what
will happen. Is it just Red Hat or is it like open source projects as well? Reading between the lines, I assumed it was RPM packages.
So packaged open source projects that can build on Red Hat
was what I believe it is.
Right.
But it's at 17,000 packages now.
Yeah, that's a lot of packages.
What were you going to say, Dori?
It's actually
amazing, and we can see that
everywhere, because
as the years go by, people have
more code base
and more tests that are running.
And actually, I think
that I'll refer to it further on
in our discussion, because it's
very relevant. In
IncrediBuild, we're doing something
which is very similar inside IncrediBuild as well
due to the fact that we accelerate build tools.
We have a lot of open source projects
that before each release, we check our release
and try to build with IncrediBuild
all these open source projects.
So we have kind of the same thing as they have with GCC.
And we really see that growing and growing in terms of size
and the compute capacity
that is required in order to do that in a reasonable amount of time.
So I think that that's very important for our further discussion.
Right. And this last article is another
trip report from
JF Bastien from Google, who we've had on the
show before. And
this one is a good read
if you were angry or disappointed
from the results of
C++17. He's kind of
talking the internet community down
a little bit, which is probably necessary for
a lot of people. Not really too much to add to that, right, Jason? I just noticed that he quotes
ISO Trump++. I don't know if you've seen that Twitter. I didn't notice that. It's right in the
first section. He says, as ISO Trump++ says, let's make C++ operator greater again.
That Twitter account is absolutely ridiculous and quite a sign of the times today, if you haven't checked it out.
Yeah, I did notice this Twitter account the other day.
It's an interesting merger of American politics and C++.
Yes.
Yeah.
Is there anything you wanted to talk about with this one, Dori?
No, I think I'll pass on this one.
Okay.
Well, let's start talking to you about this talk you gave last year
at Meeting C++ on parallel computing strategies.
Can you give us an overview on your thoughts of multi-threaded versus multi-process?
Yeah, so I think the first thing is the motivation for creating this presentation.
As the CTO of IncrediBuild, I frequently communicate with development managers that would like their application to perform better, to use distributed computing
or cloud computing in order to gain, you know, better performance and to be able
to use more resources and not only the resources of the local machine.
But during these conversations, I've seen many times that people chose the wrong kind of parallel computing strategy for their software.
And they didn't have the means, you know, to understand which strategy is best for them. So following this, you know, I had a lot of these kind of
discussions and I wanted to give this to our customers and, you know, the community
and try to find some kind of comprehensive list that architects
could go over when choosing the architecture for parallel computing
that suits their needs.
Surprisingly, I couldn't find one on the Internet,
so I took everything that I knew about parallel computing through my experience and things that I found on the web
and tried to assemble some kind of guide,
some bullet points that allow you to determine
which kind of parallel computing strategy better fits your needs.
And I did it in order to contribute this to the community.
And I think that that's kind of important because, from my experience,
when people are just starting their development, they are not thinking about the future of their product and the kind of CPU consumption capacity that they will have in the future.
And once their product succeeds, usually the problem gets
bigger, and then they need to spend more time in solving it.
Whether we're speaking about solvers, or CAD, or compilations, or testing, or various
other scenarios, they reach a kind of bottleneck in their performance, and when they want
to scale, they sometimes see that they are limited by
the way that they implemented their parallel computing.
So, what I did in this session is
to gather guidelines
that allow you to ask yourself a set of questions that will allow you to determine
which kind of parallel computing strategy better suits your problem.
So that was the motivation. And besides giving this presentation at Meeting C++,
I recently did a similar presentation in St. Petersburg
in a C++ conference as well.
I also get a lot of feedback from developers
that continue and contribute to this list
of bullets
for developers
and architects to consider before they
choose whether to use
multi-threading or multi-processing
in their
software.
So can you give...
I'm sorry, go ahead. I was just going to ask if...
Yeah, can you give us an overview
of what the pros and cons of each option are?
Yeah, I can.
I think it will be a little bit short.
Usually this session is an hour long.
Oh, yeah, absolutely.
Just the highlights.
Yeah, high level.
Just the highlights.
So let's start with
multi-threading. I think the major pro is
the easy and fast way to communicate and share data between threads.
So that's very easy. They all have the same process space and they
can easily use variables, classes, and any other
kind of structure to share data and to communicate, and you don't need to have anything special in
order to achieve that. It's also easy to communicate with the parent process, the process that invokes
these threads. One of the areas in which multi-threading is very beneficial, and I can
see that with large customers that have, for example, small servers, is large data sets that
cannot be divided into subsets. So if you have a very large data set, and you want to get all that data
into your memory, and all the threads are looking into random areas in your data set,
then you are actually not able to divide it into subsets,
and you will not want to load this data set for each process,
because it will consume a lot of your memory.
So that's another area in which multithreading is relevant.
For example, you can see that in artificial
intelligence, in weather forecasting, in genetic algorithms, and in areas like that. So that's
something that multithreading is better for. And also multithreading is supported by many libraries
that allow you to have thread-safe access to data structures, so you can access these data structures
from multiple threads
without needing to be concerned about race conditions
or locking or things like that.
So that makes it quite easy to develop.
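To make that concrete, here is a minimal, illustrative C++ sketch (not from the episode): one large data set is loaded once, several threads read random positions in it, and a mutex protects the shared result. The sizes and thread counts are arbitrary.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdio>
#include <mutex>
#include <random>
#include <thread>
#include <vector>

int main() {
    // One large data set, loaded once and shared (read-only) by every thread.
    std::vector<double> data(10'000'000, 1.0);

    double total = 0.0;
    std::mutex total_mutex;  // protects `total` against data races

    auto worker = [&](unsigned seed) {
        std::mt19937 rng(seed);
        std::uniform_int_distribution<std::size_t> pick(0, data.size() - 1);
        double local = 0.0;
        for (int i = 0; i < 1'000'000; ++i)
            local += data[pick(rng)];          // random reads into the shared set
        std::lock_guard<std::mutex> lock(total_mutex);
        total += local;                        // brief critical section
    };

    unsigned n = std::max(1u, std::thread::hardware_concurrency());
    std::vector<std::thread> threads;
    for (unsigned t = 0; t < n; ++t)
        threads.emplace_back(worker, t);
    for (auto& th : threads)
        th.join();

    std::printf("total = %f\n", total);
}
```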
And it's easy to start.
Usually, when people see a scenario in their application
where they have multiple elements
that they want to execute in parallel,
they go for multi-threading.
And actually, that's also where the whole problem starts
because we as developers,
we know that once we started,
usually we don't go back.
So people are starting with multi-threading, and then usually the entire application is
multithreaded, and that's where they start having this problem that continues with the
system, if that's not the right strategy for them.
As for the cons of multithreading, there are quite a few.
So one of them is if one of your threads crashes,
it will crash your entire process, your entire application,
which is not something that you want to have.
And we know that bugs happen, and we sometimes don't catch all of them.
And multithreaded is something which is risky.
If you want your application to always run and not crash,
that's something that's hard to recover from.
Another thing is that multi-threaded applications are hard to debug.
If you have race conditions or locking issues or memory stuff like this,
I'm sure that a lot of your audience already, you know,
had a lot of time-consuming bugs similar to this.
So these are things that are very, very difficult to solve.
And also, you need to consider your developers.
If you are a team leader and you have a developer
which is not that experienced with multi-threaded development,
he will usually be able to achieve the goals that you set for him.
But the way that he develops the code can really affect your application in the future.
And, you know, maintaining a complex multi-threaded application
can be more costly than actually developing it
in the first place, especially if you don't have the environment and you need to try and solve
these problems remotely on your users' sites or things like that. So these are quite complex
scenarios. If you have shared data
between your threads, you need to maintain, and usually it's difficult to
maintain, the synchronization blocks and make sure that you don't have race condition
stuff. Another thing is that having too many threads can result in having
too much time spent in context switching between the threads
instead of using the CPU for actual calculations.
And I think the last thing is that multithreading is less relevant for scalability.
Usually with multithreading,
you'll only be able to use your multicore machine.
You will not be able to easily distribute these threads either to other machines
or the public cloud, which is quite a limitation. So if you have a lot of tasks in the future,
usually, you know, application developers start with small problems. They have
four threads, six threads, eight threads to calculate the scenario.
But when you start having thousands of threads, that's where things become a bottleneck in terms
of performance. And that's where scalability is something that you'd like to have. With
multi-threaded, scalability is very, very difficult to attain. Okay, so what about some of the pros
and cons of multi-process? Okay, great.
So I think that they are kind of a mirror of one another.
So the first thing that is very important to understand
is if you have a multiprocess application,
it doesn't mean you don't have a multithreaded application.
So you can have multiple processes,
but each process also contains multiple threads,
and that's something which is, you know,
that's the same thing that we have in IncrediBuild,
for example.
One of the main benefits is if one of your processes crashes,
it won't crash your entire application,
especially if you know how to recover from it.
For example, in IncrediBuild, we distribute processes,
not our processes, but our customers' processes to remote machines,
and a process can fail.
The machine can be disconnected.
Things can happen.
And due to the fact that we know that these kinds of things can happen,
and we don't actually control that, because these are not our processes,
they are our customers' processes,
we have a mechanism that allows us to recover from these kinds of crashes.
So multiprocess allows you to recover from this kind of process crash, which is something
that can be common, especially if you have a large deployment product.
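As a rough illustration of that recovery pattern, here is a minimal sketch assuming a POSIX system; the task logic and the simulated failure are invented, and IncrediBuild's actual mechanism is of course more involved. The point is only that a failed child process costs a retry, not the whole application.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <cstdio>

// Run one unit of work in a child process; return true if it exited cleanly.
static bool run_task(int task_id, int attempt) {
    pid_t pid = fork();
    if (pid < 0)
        return false;                                     // could not even start the child
    if (pid == 0) {                                       // child: do the work, then exit
        // ... real work would go here (e.g. exec an external tool) ...
        _exit(task_id % 5 == 0 && attempt == 0 ? 1 : 0);  // simulate an occasional first-try failure
    }
    int status = 0;
    waitpid(pid, &status, 0);                             // parent: wait and inspect the exit
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}

int main() {
    for (int task = 0; task < 20; ++task) {
        int attempt = 0;
        // A crashed or failed child only costs a retry; the parent keeps running.
        while (!run_task(task, attempt)) {
            std::fprintf(stderr, "task %d failed, retrying\n", task);
            ++attempt;
        }
    }
}
```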
It's much easier to debug.
It's much easier to debug a single process than to debug a process that has dozens or
hundreds of threads, especially if you know how to
encapsulate your process in an optimized manner.
You'll usually have fewer locking issues, but it
actually depends on the
infrastructure and API that you're using, and it's scalable. So if you started your application
and you have eight processes and you have an eight-core machine, that's fine, but if you
then need to execute 800 processes, you can use clusters, you can use distributed computing, you can use cloud computing
to gain better performance and provide your customers with better performance.
In terms of business, you know, people today are looking for SaaS services and the ability
to upsell to their customers some kind of service.
And scalability, and especially scalability to the cloud,
is something which is very interesting in terms of business aspects.
As for the cons, everything that I say here is really dependent
on the kind of application that you're writing.
But generally, the cons of multiprocess are that if you want to have communication between your processes,
you need to do some custom development.
You need to use some kind of communication protocol, whether it's TCP/IP, UDP, broadcasting,
anything that exists today, in order to communicate between these processes.
And this can introduce both complexity of development
and also some kind of latency in the communication between the processes.
One of the ways to solve it, by the way, is to have some kind of shared memory,
which is an element that allows you to share
a memory segment between processes
in order to minimize the latencies.
Then you have a kind of performance which is similar to multi-threading,
but in order to develop it,
you require some professional guys
on your team,
because that's not quite easy to develop and maintain in a complex system.
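For the shared-memory route he describes, here is a minimal POSIX sketch (illustrative only; the segment name and size are made up, error handling is omitted, and on some systems you may need to link with -lrt):

```cpp
#include <fcntl.h>      // O_CREAT, O_RDWR
#include <sys/mman.h>   // shm_open, mmap, munmap, shm_unlink
#include <sys/wait.h>   // wait
#include <unistd.h>     // ftruncate, fork, _exit
#include <cstddef>
#include <cstdio>
#include <cstring>

int main() {
    const char* name = "/demo_segment";          // hypothetical segment name
    const std::size_t size = 4096;

    // Create a named shared-memory object and map it into this process.
    int fd = shm_open(name, O_CREAT | O_RDWR, 0600);
    ftruncate(fd, static_cast<off_t>(size));
    char* region = static_cast<char*>(
        mmap(nullptr, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0));

    if (fork() == 0) {                           // child writes into the shared region
        std::strcpy(region, "result computed by child");
        _exit(0);
    }
    wait(nullptr);                               // parent waits, then reads the same bytes
    std::printf("parent sees: %s\n", region);

    munmap(region, size);
    shm_unlink(name);                            // remove the named segment
}
```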
And the last thing, I think, is that you have a smaller set of libraries
that support multi-process development in terms of, you know, data structures
or, as I said, shared memory.
You don't have these kinds of solutions
as you have with multi-threading.
So that's more or less, in a nutshell,
the very high-level highlights.
You know, your listeners can take a look at the entire session, in which
I delve into different kinds of applications and
look at which strategy they took.
For example, if it's CAD, usually it will use
multi-threading, unless the problems are very large, and then you have some
CAD/CAM
software which developed their application in a multi-process manner because they knew that they
would want to scale in the future. If you have AI, weather forecasting, or genetic algorithms,
in which you want to have a lot of synchronization and communication between the multiple elements,
then usually multi-threading is suitable.
If you're speaking about rendering and compilation and tests, things that are actually not dependent and are quite atomic,
then usually you'll benefit more from having multi-process technology.
I don't see much value in having a multi-threaded application executing elements which are not
communicating between one another and don't use the same kind of data.
So it really depends on the application that you develop. And I think it's very important to understand.
And each of these highlights is delved into in the presentation that I gave, but I don't think
we have the time for all of it now. So those are just the highlights. Yeah, we will have a link to
the YouTube version of that presentation in the show notes,
if our listeners want more information on that, too.
Great.
Sorry, go ahead, Jason.
I was going to say, you mentioned IncrediBuild now. Do you want to go ahead and give us an overview as to what your product, IncrediBuild, is?
Yeah, I'd be happy to.
I think that just one more thing
before we delve into IncrediBuild
is in the presentation,
I just took two build systems,
Make versus Maven.
Yes.
The reason that I took them
is because they actually have,
they try to solve a very similar problem.
They are compiling code.
They are very common.
A lot of people are using them.
And they both have a lot of tasks.
These tasks are atomic.
They are compilations.
They usually don't communicate with one another.
They have very small data sets. And due to the fact that large projects
have a lot of tasks to compile,
scalability and performance can be an issue here.
Sorry.
So in this presentation,
I just took a look at Maven versus Make.
And I looked at how they both started early on.
So, you know, in the past,
where you didn't have a lot of multi-core machines,
and also the code base was quite smaller
than it is today.
And what I've seen is that in Maven you are kind of restricted to your
workstation or your build server. So if you have an 8-core workstation or
32-core build server, that's the amount of processing, the amount of
computation, that you'll have for your entire build. And it used to work,
especially since Maven is
usually compiling Java, and Java
is shorter
in terms of compilation than C++.
But still, you know, people
today have a larger and larger
code bases. And
I saw on the internet that
people are asking for
multiprocess support from Maven
in order for them to be able to use a distributed solution,
similar to IncrediBuild or distcc or others,
in order to accelerate their compilation times.
But due to the fact that Maven was constructed in a multithreaded way,
a third-party tool cannot intervene and cannot distribute these threads.
A distribution mechanism cannot distribute a thread.
The smallest atomic unit that you can distribute
is a task, is a process.
And that's quite a limitation.
And the other thing is that you cannot intervene.
For example, with IncrediBuild,
we have the ability in Visual Studio
to know and determine the optimized way
to execute tasks in parallel.
So we can intervene with the way
that processes are being executed.
And if we see that, although Visual Studio, for example,
will run a few processes in a sequential manner,
IncrediBuild can intervene, due to the fact that these are processes,
and actually force Visual Studio to execute them in parallel.
And that will result in optimized utilization of your own resources,
your own machine resources, even without distribution.
With Maven, this is something that you can't have
because you cannot determine and
you cannot intervene with the way that Maven executes its threads. So, with Maven, you are
limited to the amount of resources and CPUs and memory that you have in your machine. And with
Make or Visual Studio, for example, if you're using IncrediBuild, just to compile Qt, for example.
So Qt will take you something like 16 minutes to compile on an eight-core machine. But if you're using something like IncrediBuild to accelerate it and use additional cores and distribute
the compilation tasks, you can have a build which takes one and a half minutes instead of 16 minutes, and that will use 130 cores across your local network.
So that's a huge gain for developers
and for your product as well, in terms of time to market and other things.
So I think that makes the case, yeah.
I wanted to interrupt this discussion for just a moment
to bring you a word from our sponsors.
You have an extensive test suite, right? You're using TDD, continuous integration,
and other best practices. But do all your tests pass all the time? Getting to the bottom of
intermittent and obscure test failures is crucial if you want to get the full value from these
practices. And Undo Software's live recorder technology allows you to easily fix the bugs
that don't otherwise get fixed.
Capture a recording of failing tests in your test suites and debug them offline so that you can collaborate with your development teams and customers.
Get your software out of development and into production much more quickly, and be confident that it is of higher quality.
Visit undo-software.com to see how they can help you find out exactly what your software really did, as opposed to what you expected it to do.
And fix your bugs in minutes, not weeks.
Okay, so let's get an overview of what exactly IncrediBuild is.
So, IncrediBuild is a process distribution platform.
It's a generic platform.
It allows you to seamlessly distribute processes across machines in your
network or the public cloud.
Effectively, it
transforms every machine, every
developer machine or build server
to become a virtual supercomputer that can
scale and use thousands of cores
and gigs of memory
instead of only the local resources that it has. So if you have a developer machine with eight
cores and you now want to compile Photoshop, for example, which has thousands of source files,
instead of just using your own resources, IncrediBuild allows you to use all the resources, the idle CPU cycles,
whether of
dedicated machines or machines that
people are actually working on
and to use only the idle CPU cycles
of those machines in order to
accelerate the performance
of your compilation.
We have unique process virtualization technology
embedded in IncrediBuild.
This technology effectively allows us
to distribute the process to a remote machine
without needing to have the environment
that you have in your initiating machine
on the remote machine.
For example, if I'd like to distribute a GCC task to a remote machine
using this unique technology that IncrediBuild has,
you don't need to have GCC installed on the remote machine.
You don't need to have the GCC libraries or any of your source code on the remote machine.
Essentially, what IncrediBuild does, it will bring on demand everything that the process needs
in order to work on this remote machine,
and will put it in a special sandbox
that IncrediBuild keeps on every machine.
Same thing goes for any output that will be generated
by the distributed process.
So any output that this process creates, for example,
an object file
that GCC creates,
will be brought back to the initiating
machine. So from the user's
perspective, it's really as though
he has thousands of cores and gigs
of memory on his local machine.
So
with IncrediBuild, you can use both virtual machines,
you can scale to the cloud, you can use dedicated servers, but on top of that, you can use idle CPU
cycles of machines that people are actually using. So in large organizations, you have a lot of
machines from sales, from marketing, from BizDev and others.
And these machines are usually pretty idle.
You have an 8-core machine which only consumes 10% of its CPU cycle.
So you have 7 cores that are just sitting there.
And if you have 100 of these kind of machines,
it aggregates to 700 cores that now every developer can use to
accelerate any kind of compute-intensive process that they have. This process virtualization technology
also allows us to keep this platform generic. So we've started with accelerating Visual Studio,
but essentially we've enhanced this platform
to support any kind of multi-process execution. So IncrediBuild is
used today to accelerate any kind of multi-process execution. And in
fact I don't even know all the scenarios that our customers are using our
platform for.
I can speak with customers and suddenly find out that people are using IncrediBuild,
and these are real scenarios,
to accelerate C++ compilation.
That's the usual case scenario,
but they are using the same thing to do tests,
unit tests, integration tests, code analysis, asset builds, whether it's rendering,
shading, encoding, or compression.
IncrediBuild is also being used to execute
derivative calculations and run end-of-day calculations for the stock market.
Essentially, everything that is multi-process can use this kind of
platform to distribute its tasks and gain hundreds or thousands of CPUs instead of only the local resources.
So from a practical perspective, what does it look like when the user wants to actually execute something, execute a build or whatever through IncrediBuild?
First, he needs to set up the environment. In order to do
that, he will need to install
a very small footprint
IncrediBuild agent on each
of the machines that he wants
to be part of the
infrastructure.
And then you have
all these machines connected, and each of these machines
can use idle CPU cycles from all of the other machines that are connected to the same platform.
It's very similar to a grid solution, a grid compute solution. And once he has that
environment set up, he can use the environment in order to accelerate any kind of multi-process execution that
it has, in order to execute, for example,
a make command.
The only thing that the developer
needs to do is to execute his
original make command,
and just write incredibuild
slash command before his original make
command, and that's it.
Incredibuild will kick in his make
command, and will distribute all the GCC tasks
to remote machines across the network.
The only thing that the user will need to do
with its make command
is to increase the minus j value
for his make command.
So the minus j flag for make
actually tells make how many CPUs or how many jobs to execute in parallel,
the maximum number of jobs, compilations or others, to execute in parallel.
And make goes through the dependencies that you have in your script.
And by the dependencies, it determines the amount of tasks that can be executed in parallel.
So when you are using only your local machine,
you will usually execute make minus j8
if you have eight cores.
With IncrediBuild, you will execute make minus j800,
actually telling make that you have 800 cores
on your machine.
Make will then try and execute as many as 800 jobs in parallel,
and IncrediBuild will take over this queue of tasks,
the tasks being executed by Make,
and will distribute them across machines in the network.
So the developer's experience is as if he has 800 cores on his laptop,
which is quite cool.
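As a rough local analogue of the make -j behavior described above (a sketch only, not IncrediBuild's distribution mechanism; the source file names and the fixed g++ command are invented, and a real build would also respect dependencies), the snippet below keeps up to N compiler processes in flight and starts a new one whenever a child finishes:

```cpp
#include <sys/wait.h>
#include <unistd.h>
#include <cstddef>
#include <string>
#include <vector>

int main() {
    // Hypothetical, independent compilation tasks (no dependencies between them).
    std::vector<std::string> sources = {"a.cpp", "b.cpp", "c.cpp", "d.cpp", "e.cpp"};
    const int max_jobs = 4;                       // the local equivalent of `make -j4`
    int running = 0;
    std::size_t next = 0;

    while (next < sources.size() || running > 0) {
        // Launch children until we hit the job limit or run out of tasks.
        while (running < max_jobs && next < sources.size()) {
            if (fork() == 0) {
                execlp("g++", "g++", "-c", sources[next].c_str(), (char*)nullptr);
                _exit(127);                       // exec failed
            }
            ++next;
            ++running;
        }
        wait(nullptr);                            // one child finished; free a slot
        --running;
    }
}
```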
Is the process virtualization a newer feature of IncrediBuild?
Because I actually did work with it a little bit at my previous job,
and I was under the impression that I was working with Visual Studio,
and I thought you had to have Visual Studio installed in a machine
that IncrediBuild used as an agent.
So you might not necessarily want to install a Visual Studio license
on someone's machine in the marketing department just to use it as an agent, because
that could get to be quite expensive.
Is that a newer feature?
No, it's not a new feature.
We actually had it from the beginning,
and it was a misconception of
a lot of people.
You're not the only one.
If you want to use
IncrediBuild to accelerate your Visual
Studio builds, you don't need to have anything installed on the remote machines.
You don't need to have Visual Studio installed there.
You don't need any DLL.
You don't need a compiler.
You don't need a source code.
You don't need anything. It can be a machine brand new from the store or a virtual machine,
a clean virtual machine.
The only thing you need to have on that machine is an IncrediBuild agent,
and that's it.
We'll take care of bringing everything that the process needs in order to properly execute on the remote machine
while the process is being executed.
That's quite nice technology, but it requires a different discussion
in order to understand the way that it works under the hood.
Sure.
So you've mentioned so far Linux and Windows.
What all platforms do you guys support?
These are the platforms.
In Windows, you can interoperate
between any Windows operating system.
And we have the same kind of technology on Linux.
So you can mix between an Ubuntu initiator
and a CentOS or Red Hat helper.
But these are the two platforms,
general platforms that IncrediBuild supports.
So no macOS support at the moment?
Not yet.
We have customers asking for that,
and it's currently
not yet supported, but
we are
speaking about it
with customers.
So I believe that we might see
that in the future.
So you've mentioned that it can work with
any multiprocess
thing.
Yep.
So that kind of implies, looking back to the earlier part of our discussion,
that it can't work with Maven.
Is that correct?
Yes, that's correct.
That's exactly one of the cons of multithreading. It does not allow third-party tools
to intervene with the way
that its tasks are being executed.
So IncrediBuild can accelerate
any kind of multi-process execution.
If we are only speaking about build tools,
for build tools, for example,
IncrediBuild supports today any build tool
that has some kind of multi-process execution, some kind of minus-j flag. For example, we have Visual Studio
and MSBuild and Jam and WAF and Ninja and Make and GMake and CMake, and any kind of build
system that supports multiprocessing. But, you know, our customers are using that not only for
build, as they have IncrediBuild set up, the infrastructure is set up in their environments,
and their build dropped from one hour to five minutes, for example, and that's a common scenario.
They start to see the other things they have as part of their development that now take more time.
Before it was one hour compilation and 20 minutes of testing, for example.
But now it's five minutes compilation and 20 minutes testing.
So they use the same infrastructure to accelerate any kind of multiprocess execution
they have as part of their build.
And if you are speaking about large customers,
you spoke about EA at the beginning.
So EA is a large customer of IncrediBuild, for example,
and they're using IncrediBuild to accelerate various...
We have a lot of game studios.
I think most of the major game studios are using IncrediBuild
because you can see in the game studios,
they have a lot of things, besides compilation,
that take a long time to compute.
In game studios, you have asset build.
You need to render images, you need to render shaders,
you need to calculate a lot of artificial intelligence scenarios,
you have texture compression,
you have encoding,
and you have all the other things
that regular applications have,
such as packaging and code analysis
and integration tests and unit tests
and things like that.
So IncrediBuild allows you to use the same infrastructure
to accelerate any kind of multi-process execution
that is part of your build and not only compilations.
A free version of IncrediBuild was recently announced
that comes with Visual Studio.
Could you tell us a little bit about that and what you get with that free version?
Yeah, so we've made this partnership with Microsoft, with the Visual Studio guys, and you can now download IncrediBuild and work with IncrediBuild from within Visual Studio; you get a free version of IncrediBuild.
This version allows you to get a few agents
to be used in a distributed manner for
a limited time, for 30 days.
But even after
this time ends,
you can still use IncrediBuild on your local
machine. And I think
that one of the
questions that comes after that is,
okay, but IncrediBuild is all about distribution. How can IncrediBuild help if I'll only use
IncrediBuild on my local machine? So the other thing that IncrediBuild has is the ability to
optimize your parallel execution. If you'll see a scenario in Visual Studio
in which you have Project A,
and Project A depends on Project...
Let's say Project B depends on Project A,
the scenario you'll see is that
Visual Studio will compile all the files in Project A
and then link Project A,
and then compile all the files in Project B and then link Project B. But usually there are no dependencies between
the files being compiled in project B and any of the files being compiled or linked in project A.
So IncrediBuild knows how to determine the actual parallelization that you can have.
And with IncrediBuild, even when you work on standalone,
and this standalone version is free for life for eight cores,
you will see IncrediBuild executing all the compilation tasks
for project A and project B in parallel.
So while you link project A,
you'll still be able to compile files from
project B, and then only link project B. And with this predictive
execution mechanism, as we call it,
we can see scenarios in which IncrediBuild, using only your local cores,
can accelerate your build performance
anywhere between 10% and 200%, and the average is usually
something around 25-30% faster builds using only your local resources. So it's quite meaningful.
Besides that, IncrediBuild also offers a bunch of other things
that accompany the main purpose of the product,
which is the performance.
We are generally aiming for productivity,
and one of the things that IncrediBuild has
is a graphical visualization tool
that allows you to see all the tasks
that you are executing,
either when you compile
or do any kind of other execution
within IncrediBuild.
This shows you,
and it really requires an image,
trying to explain a graphical representation
with audio is quite limited,
but it allows you to see every task
as a bar that goes over time,
and it can show you all the cores that are being used, and for each core you can see all
the tasks, as bars, that this core is executing. And if a specific compilation
task fails, you will see this bar as red, and if it succeeds, you will see it as green.
So now if you have a failing bar, a failing compilation task,
you simply need to double-click the red bar in the IncrediBuild graphical representation,
and it will jump straight to the line in the output that shows the failure,
and you can also double-click that and go straight away to your source code.
And you can have that for other build tools as well, or any other execution that you're running within IncrediBuild.
So instead of going through a very, very long output,
which is only textual,
you can use a graphical representation
to jump straight away to the error.
Another thing that you can have with this
build monitor is to see all the bottlenecks
and the bugs you have in your build.
So if I have a 16-core
machine, for example,
I want to see all these 16 cores
working, and if I can suddenly
see in the graphical representation
that only a single core is
working, I want to take a look at
that area and see whether there is a dependency, because it means that other tasks are dependent on this
task and they are waiting for it to finish in order to kick in. So I want to look at this part
and see whether I can further parallelize it and accelerate the performance of this area.
And that's something which is very, very difficult to do if you only have a textual output
to determine which kinds of tasks are the bottlenecks.
It's quite an impossible task to do.
And we have other features as well.
We allow you to stop on errors,
and we have batch builds that allow you to build multiple configurations in parallel,
and there are many other features that are part of the product that come for free,
even in the standalone version, that allow you to get better productivity.
So there is a very nice gain, even if you are not using the entire distribution mechanism of IncrediBuild for Visual Studio specifically.
So for the distributed versions, it seems like you're mostly aiming towards large organizations or large teams.
Do you have licensing options that make sense for, like me, I've got three idle computers sitting here in my house right now? Yeah, so we have a bundled solution for IncrediBuild,
which allows you to have a package of five agents, or eight,
that you can use.
So if you have three machines, that's something that you can use.
There are various options for students or academic institutions and things like that.
And actually, with IncrediBuild, one of the interesting things that recently occurred is all this hype on cloud computing. So even if you have only a single machine and you are an indie developer
and you build your code
and it takes a long time to do that
and you want to release your code
or you have a peak time that you want to do more,
build more, test more,
you can actually use the public cloud.
You can have IncrediBuild,
you can try and grab some kind of rental solution from us
for a specific duration of time, and you can use the public cloud as well. You don't need to
have all these resources at your site. And the public cloud performs
very well, especially if you have a good bandwidth, such as in the US or Europe or
Japan, and you can use these machines on demand
and you can scale
that's quite interesting
and we see more and more customers
even in large organizations
when they want to do more
before releases for example
they usually test more and run more builds
and do more stuff and run more
asset builds and they can
just grab more machines in the cloud
and scale to the cloud as well,
not only their local infrastructure.
So the answer is yes.
One of the things that I have recently seen,
which is more and more common,
is that even if you have a very small program,
a small application,
people are more and more frequently
using open source libraries.
And with these open source libraries,
even if they are making minor changes,
they need to recompile the entire library.
So things that used to take a very short time in the past
take a meaningful time in the present,
and in the future, I believe that it will only be worse.
And also, these open source libraries,
you know, they are getting bigger and bigger,
and they take more and more time to compile,
and this is a pain for a lot of people.
Okay.
There's been a lot of news lately
about dependency managers with C++.
We don't really have a great solution,
but various solutions are coming out
and trying to stake their claim.
There's biicode, and now there's Conan.io, I believe.
It might be getting a little bit popular.
Does Incredibuild work well with some of these dependency managers?
So one of the things that we did straight from the beginning
is that we didn't want to handle dependencies
due to the fact that we wanted to remain generic
and to be able to
support any kind of build system
or any kind of execution,
we are actually not handling
dependencies. We allow your build
tool to handle the dependencies
for us. So if
we are, for example, running make,
make will handle the dependencies.
You will simply pass a large
minus j value,
and make will make sure that it only executes in parallel
things that can execute in parallel.
And the only thing that IncrediBuild will do
is it will distribute these tasks to remote machines,
remote machines at work.
So even if you have a new build system
or a new dependency manager,
from IncrediBuild's perspective,
it doesn't matter.
Things will work out of the box for you.
Okay, that's good to know.
I think that's all I have. Jason, do you have any more questions?
That's all I have.
Okay, Dori, it's been great to have you
on the show. Where can people find you
online and go to find
more information about IncrediBuild?
So, you can go
to IncrediBuild's website,
and you can also find me on LinkedIn,
as Dori Exterman.
And if you have anything that you'd want to write to me specifically,
I'd be happy to hear from you.
Okay.
Thanks so much for your time today, Dori.
Thanks a lot.
It was a pleasure.
Thank you.
Have a good day.
You too.
Thanks so much for listening as we chat
about C++. I'd love to
hear what you think of the podcast. Please let me
know if we're discussing the stuff you're interested in
or if you have a suggestion for a topic. I'd love
to hear that also. You can email
all your thoughts to feedback at
cppcast.com. I'd also
appreciate if you can follow CppCast on
Twitter and like CppCast on
Facebook. And of course, you can find all that info and the show notes on the podcast website at cppcast.com.