Programming Throwdown - Parallel Computing with Incredibuild
Episode Date: December 19, 2017
How can you use all of the computers in your lab/office at the same time to speed up tasks? Today we talk with Dori Exterman, CTO of Incredibuild, about parallel computing and the awesome tool Incredibuild has created that can run any multi-process program on several machines. Show Notes: http://www.programmingthrowdown.com/2017/12/episode-73-parallel-computing-with.html ★ Support this podcast on Patreon ★
Transcript
Programming Throwdown, Episode 73: Parallel Computing with Incredibuild. Take it away, Jason.
Hey everyone! So, chances are you're listening to this after the giveaway. We're recording this just before the giveaway, but hopefully, if you're a Patreon subscriber, you got one of our really cool laser-cut acrylic stencils; that's a guarantee. And hopefully, if you're out there, you got something else cool in the raffle too. But today we have a really cool episode. We're going to talk to Dori Exterman, who's the CTO of Incredibuild, and we're going to talk all about parallel computing. At Incredibuild they've built this really cool system; I'll let him explain in more detail, but it combines parallel computing with the ability to use the machines you have sitting around your office right now in a distributed fashion.
So, Dori, why don't you tell us your background? What got you into parallel computing, and a bit of history?
So, my background: I have more than 25 years of experience in software development. I did a lot of development in information-related areas, and later on in low-level development as well. After that, for a couple of years, I consulted for many companies in different areas of advanced technology: client-server, multi-tenant, and so on. That led me quite naturally to the parallel computing, efficiency, and optimization areas. Seven years ago I joined IncrediBuild to lead the technical side of the company, and I've been here since then. It's one of the coolest companies I've ever worked in, and I think I'm going to be here for a while.
Cool. You've been there seven years, right?
Yeah, yeah, that's pretty amazing.
Yeah, that's awesome.
Did you start the company, or had it already been started?
The company is very mature; I think we've been in the market for more than 15 years now, since the era in which parallel computing wasn't that popular and you only had one core on each PC. And then, you know, things moved very slowly. So in that era, when you needed more horsepower, you didn't really have any way to get it. You couldn't purchase a machine with multiple cores unless you had a lot of money. And the idea behind the company actually came from SETI@home, a project that tried to find extraterrestrial life using distributed computing. You were able to install some software on your computer, and then they would distribute the audio files they had recorded, and your computer would analyze those audio files and look for signs of extraterrestrial life.
I remember that. Yeah, I remember having the SETI@home screensaver, so basically when your computer went idle, instead of a bouncing ball or something like that, you would just be scanning for signals. And it's still a live project, I think. They're still doing that.
And in that era, when you had very limited resources, they really needed your computers in order to scale this. And the founders of IncrediBuild thought, well, if they're doing this to find extraterrestrial life, why can't we do that for other stuff as well? And they started by accelerating Visual Studio builds. That was 15 years ago, when you only had one core. By the time I joined the company, we already had multiple cores, so we thought that once people had 16 cores, or even four cores, the problem would go away. But in fact, what we keep finding is that as the number of cores in your local machine rises, the problem grows at the same rate. People always need more resources, which keeps us relevant, even more relevant than we were in the past.
Cool, that makes sense.
Yeah, I mean, I think you touched on it, but it sounds like the big motivation behind parallel computing is that you have some very expensive process, but it can be broken up into pieces and solved in parallel. So what are most of your customers trying to solve? I mean, I'm sure there are confidential things, but in general, what are the sorts of areas where your company can help?
Essentially, we started with accelerating Visual Studio, and we're still focused there even today. Many of our customers, I wouldn't say all, but many of our customers are Visual Studio users, and they're accelerating either their Visual Studio compilations or any kind of computation they have as part of their continuous integration systems. Five or six years ago, we opened up the technology; we wrote it from the beginning to be generic, and we opened it up for users to be able to use it for any kind of compute-intensive execution. So today, users are using it for compilation, testing, packaging, artificial intelligence training, weather forecasting, financial derivatives. We are highly popular in the gaming industry; I think we're used by something like the 1,000 largest studios in the world, doing Xbox, Sony, Nvidia Shield, Nintendo, PC, Android, VR, any kind of game. And the reason it's so popular in the gaming industry, I believe, is that game development involves so many compute-intensive tasks. It's not only C++ compilation; usually it's also rendering, video rendering, image processing. You have a lot of shaders, and, for example, when you have a physics engine, you need to pre-calculate the shading of objects so that you don't have to do it while the game is running.
Yeah, this is something not a lot of people know, but basically game developers try to cheat as much as possible. If there's really nice soft lighting, chances are it's not being computed on the fly, because that's extremely, extremely expensive. There's this thing called Monte Carlo photon mapping, where literally each light emits a bunch of rays of energy, and every time one of those rays hits anything, it bounces and basically becomes a new light source. So it just blows up exponentially, right? That's how, even in real life, if you open a window just a tiny amount, you can light up a whole room; the photons are bouncing all over the place. And so, to get that same effect in a game, where it's not like everything is pitch black except where the sun hits, they need to run this really, really expensive process. That always happens before the game is even shipped; they just pre-compute it for everything that isn't moving. And as you said, it's extremely expensive, but it's completely parallelizable: you could do every room independently.
Yeah. And also, for example, if you take the FIFA games, you have different kinds of stadiums and different kinds of lighting, and that lighting affects the way lights and shadows interact. And also the players themselves; for example, Messi and Ibrahimovic have different sizes and interact with the light in different ways. So you can pre-compute all of that, and that's what they're doing. They have millions of pre-computations, they can prepare them in advance, and they're doing it with IncrediBuild.
Cool, that makes sense.
So what are the big differences between, for example, using IncrediBuild versus going out and getting some really powerhouse machine? Maybe it has two processors, maybe it has 36 cores or something like that. I don't know if that's even supported; let's say 36 cores exist.
No, you can say 80. We have customers with 80-core machines, yeah.
Oh, okay. So what's the difference between me spending maybe a lot of money up front and buying some 80-core machine, versus using something like IncrediBuild that's going to go back and forth over the network?
So I'll give you a live scenario. Let's assume you are a very large enterprise company, I don't want to name names, and you're working on a game or some other software project, and you have 100 engineers working on it. You would need to purchase 100 of those 80-core machines in order for each engineer to have 80 cores and really run faster. With IncrediBuild, each developer can have eight cores on his own machine, and each of them can seamlessly use the idle CPU cycles of all the other engineers working on his local network. So every engineer can essentially tap into 800 cores instead of just 80, and it will also cost you much less; an 80-core machine is very expensive, and you won't be able to purchase one for each of your developers. And also, 80 cores may sound huge, but when you have a computation that takes 24 hours, that takes two days, even 80 cores is not fast enough. Our users are using IncrediBuild to distribute to hundreds of cores, not only 80; 80 cores is not the high end for our users.
That makes sense. Also, if Bill goes on vacation to the Canary Islands or something, his machine is just sitting there wasted if you don't have some kind of distributed system.
With IncrediBuild you get other things as well. For example, we have companies with multiple geographical sites. When it's daylight working hours at one site, it's nighttime at the other site and their resources are idle, so they can use the resources from one site to accelerate the computation of the other sites, which is quite cool. Another thing, and this is something we see more and more, is that with IncrediBuild you can always scale to the public cloud, which is not something you can do when you purchase hardware. Essentially, all we need is a virtual machine. Especially at peak times, before a game release, before Christmas, when you need to do more testing and you have more compilation, before releases, that's usually when you need more resources and don't have them. With IncrediBuild, you can always scale. You can say, okay, I'll just add 100 cores in any available public cloud and connect them to my local network, and boom, you have more resources to use. And, you know, time to market is very essential in these sectors.
That makes sense.
I guess, what if you have a problem where there's a lot of data? Is that something IncrediBuild can handle, or how would you go about solving that problem? You brought up the example of the football arenas; maybe those files aren't that large and they can get passed around, and it's really the computation. But what if it's something like large meshes, gigs of data?
Yeah, gigs of data, that's right. That's a problem. That's usually in specific types of software.
For example, with genetic algorithms, when you're running those kinds of calculations, the data is usually very, very large. If you can break the data into multiple subsets, then it will work for you. It will be good, because you can say, okay, when I'm running this type of algorithm, I need only this subset of the data, and then you can farm it out, which is cool. But there are scenarios in which you have really, really huge data, and not only is the data huge, you need to load all of it into memory, or else the computation time will be very, very long. For those very specific types of calculation, usually you won't be able to get around it; nothing will help you, not IncrediBuild, not a cluster, and you need a supercomputer to compute it in an efficient manner. So there are specific problems for which you'll need an HPC machine, and that's why HPC machines exist. But if you can break the data into multiple smaller datasets, along with the computation that goes with them, then you can use IncrediBuild, or clusters, or any kind of distributed computing technology.
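To make the "break the data into subsets" idea concrete, here's a minimal sketch, entirely our illustration and not IncrediBuild code: split a large line-oriented dataset into per-process chunks so each worker only ever needs its own subset. The `worker` program is hypothetical.

```cpp
// Our illustration, not IncrediBuild code: split a big line-oriented
// dataset into N chunk files so that each worker process only needs its
// own chunk -- the shape of job that distributes well.
#include <cstdlib>
#include <fstream>
#include <string>

int main() {
    const int kChunks = 8;
    std::ifstream in("big_dataset.txt");   // hypothetical input, one record per line
    std::ofstream chunks[kChunks];
    for (int i = 0; i < kChunks; ++i)
        chunks[i].open("chunk_" + std::to_string(i) + ".txt");

    std::string line;
    long n = 0;
    while (std::getline(in, line))          // round-robin records across chunks
        chunks[n++ % kChunks] << line << '\n';

    for (int i = 0; i < kChunks; ++i) {
        chunks[i].close();
        // "worker" is a hypothetical program that processes one chunk.
        // Here the workers run sequentially; a multi-process runner (or a
        // distribution tool like IncrediBuild) would run them in parallel.
        std::system(("worker chunk_" + std::to_string(i) + ".txt").c_str());
    }
    return 0;
}
```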
So I guess, what's the difference then between IncrediBuild and, let's say, OpenMP or MPI or those kinds of things? Or maybe LAPACK, or frameworks with parallelism built in, where every machine is running some dedicated server that can handle chunks of data and processes. It sounds like IncrediBuild kind of fits in between. It's for people who don't want to have a huge farm of machines just sitting there to do computation and have to deal with MapReduce and all of that, but it's also not all the way at the other end, where you're having to write or modify a lot of C++ code. It's kind of right in the middle.
So I think the major difference... Hadoop is usually for data analysis, and it's less used for computation; the main purpose of Hadoop is big data analytics and things like that. And with both Hadoop and OpenMP, or other infrastructures for parallel computing, you need to write your software to accommodate these technologies, these infrastructures. You need to deeply integrate with them; you need to write your software in advance so that it will be able to work with these kinds of technologies. You need to consider a lot of things: how you handle fault tolerance, what your software does if a node just goes down, how you handle scheduling, how you handle data transfer and synchronization. With IncrediBuild, the idea behind the product was to give you a solution that simply works out of the box. So, for example, say you wrote something that can run hundreds of processes in parallel, but you only have eight cores. With IncrediBuild, the idea is that you just plug in IncrediBuild, install it on every agent in your local network, and IncrediBuild will seamlessly use the idle resources of the other machines as though they resided in your own laptop. You don't need to install anything on those remote machines besides IncrediBuild. You don't need to transfer files. That's the kind of magic we're doing, with a very unique technology that allows us to virtualize the process on the fly. But essentially, the idea was to give you a plug-and-play solution, so you don't need to write anything specific in your software to let IncrediBuild do its trick and allow you to scale. I think that's the major difference between these kinds of solutions.
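For contrast, here's what the "write your software against the framework" end of the spectrum looks like: a minimal OpenMP loop, our own example. The parallelism lives inside the source code itself, which is exactly the integration burden being described; IncrediBuild instead works at the process level, outside the code.

```cpp
// Minimal OpenMP example: the parallelism is expressed *inside* the code.
// Compile with: g++ -fopenmp sum.cpp -o sum
#include <cstdio>
#include <vector>

int main() {
    std::vector<double> data(10000000, 1.0);
    double sum = 0.0;

    // The pragma is the integration point: the source must be written
    // (and maintained) with the framework in mind.
    #pragma omp parallel for reduction(+ : sum)
    for (long i = 0; i < (long)data.size(); ++i)
        sum += data[i];

    std::printf("sum = %.1f\n", sum);
    return 0;
}
```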
Cool. So how does that actually work? In other words, let's say I have some program that plays a game between two AIs and outputs the result of that game to a file. And you want to do that, let's say, a hundred times or a thousand times. So what you want in the end is a thousand files on your own machine with all of the replays. If IncrediBuild is farming all of that out, how does it go and collect all the files, and how does it manage all of that?
So that's the very unique technology we developed at IncrediBuild: a process-level virtualization technology. It allows us to take essentially any kind of process, distribute it to any kind of remote machine, and emulate the environment the process requires in order to run successfully on that remote machine, as though the process were running on your local computer. And we do that for any kind of process; it's not only the output file. For example, let's assume, as you said, that you have this game that you want to play 100 or 1,000 times. The game is a process that gets some parameters. With IncrediBuild, you simply have your command line; let's assume you have a parameter saying how many of these processes to run in parallel. So you would run playgames.exe 100, which tells playgames.exe to execute game.exe 100 times. Your main process will run all these 100 processes. IncrediBuild will then take hold of this queue of 100 processes you are trying to execute, and it will interact with a coordinator component of IncrediBuild, telling it: listen, I need 100 cores, just give me whatever you have. Let's assume the coordinator looks around at all the agents installed in your network and is able to provide you 100 cores. Then the IncrediBuild agent on your local machine will tell a remote machine to run an instance of game.exe. Now, on that remote machine, you don't have anything besides IncrediBuild. You don't have the game.exe process. You don't have the DLLs, the libraries that this exe requires. You don't have input files. Nothing is there besides IncrediBuild.
The way we make this happen is that on the remote machine you have an IncrediBuild agent. The IncrediBuild agent will run this process on the remote machine and will essentially inject IncrediBuild code into the process. This injected IncrediBuild code acts as a middleman between the process running remotely and the remote operating system. So all the calls that interest us, the calls from your process that try to reach the operating system, are first intercepted by IncrediBuild. Once you try to open a file at a specific location, that file does not exist on the remote machine, because you don't have any DLL or any input file there. But IncrediBuild will intercept the call. So if you try to open a file, say a.txt in My Documents, IncrediBuild will intercept the call. It will see that you don't have the file in the IncrediBuild cache, which is a kind of sandbox we manage on the remote machine, and it will then go to the IncrediBuild agent on your local machine, ask for this file, and copy it on demand to a special location, the special cache we have on the remote machine. It will then redirect the OS API call: instead of opening the file in My Documents, it will say, open the file at C:\Program Files\IncrediBuild Cache\a.txt, and only then will the API call be passed to the operating system. The operating system will then go to the location we provided instead of the original location. The file exists there, because we copied it, so it will open the file and bring back a handle. IncrediBuild will then take this handle and forward it to your process running on the remote machine. So from the OS's perspective, there was a file and it opened it, and from the perspective of the process running remotely, it has a handle it can work with. This is what we're doing for any kind of file-system-related call, or any kind of thing we'd like to virtualize: opening and loading a DLL, writing a file, opening a file, creating a directory, accessing the registry. Everything is virtualized on the fly by IncrediBuild. So essentially, the remote process is provided, on the fly, with an emulation of everything it needs, as though it were being executed on the local machine. And once the process tries to write its output, we do the same thing: we intercept the create-file and write operations and redirect them to our own special location. And once the process finishes running, we simply synchronize the files created by this process back to the original places where the process tried to write them on your local machine. So from your perspective as a user, it's really as though you have 100 cores running for you on your local device. And it's not only cores: it's also memory, it's also network bandwidth. You can use more, because you're actually using more computers, more memory, more cores, more CPU processing power, and so on.
That makes sense.
So just to walk through the example here: let's say you had 100 machines. You call play game with 100, and it creates 100 processes, let's say one on each machine. Then when each one goes to play the game, the game might require all sorts of rules and other files, and on demand those get pulled from your machine and fanned out to these 100 machines. They're going to play 100 games in the time it would take you to play one game on one thread locally. And then when a game finishes, it saves the replay, and IncrediBuild will detect, oh, there's a new file here that was created by this remote process; I'd better send it back to the main computer so that it knows it's there.
Yeah, exactly. Simple, right?
That makes sense. Yeah, I think it's very clever.
But it requires a lot of deep understanding of the operating system: low-level work, intercepting all these calls and implementing this. It's highly complex. And you're talking about multi-tenant, asynchronous, highly parallel execution, which makes it very complex. Fault tolerance, too: if one machine goes down, we need to recover automatically. These are all things you would need to take care of if you tried to do something like this yourself; IncrediBuild takes care of them for you. So this is quite complex to develop.
Yeah, that makes sense.
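As a rough illustration of the interception idea, here's the decision logic a hooked "open file" call might follow. This is our sketch, not IncrediBuild's code: a real agent would inject this behind the OS file-open API with a hooking technique, and the fetch from the initiating machine is just stubbed out here.

```cpp
// Conceptual sketch of the "middleman" logic described above -- ours, not
// IncrediBuild's. In a real system this sits behind a hooked OS API call;
// here it's an ordinary function, and the network fetch is a stub.
#include <cstdio>
#include <filesystem>

namespace fs = std::filesystem;

const fs::path kCacheRoot = "ib_cache";  // hypothetical per-machine sandbox

// Stub: a real agent would request the file from the machine that started
// the job and copy it into the local cache on demand.
bool fetch_from_initiator(const fs::path& original, const fs::path& cached) {
    std::printf("fetching %s from initiator...\n", original.string().c_str());
    fs::create_directories(cached.parent_path());
    std::FILE* f = std::fopen(cached.string().c_str(), "wb");  // placeholder content
    if (!f) return false;
    std::fclose(f);
    return true;
}

// The "hooked" call: redirect the path into the sandbox, faulting the file
// in over the network only if it isn't cached yet.
std::FILE* intercepted_open(const fs::path& original, const char* mode) {
    fs::path cached = kCacheRoot / original.relative_path();
    if (!fs::exists(cached) && !fetch_from_initiator(original, cached))
        return nullptr;
    // Only now does the *real* OS call run -- against the redirected path.
    return std::fopen(cached.string().c_str(), mode);
}

int main() {
    if (std::FILE* f = intercepted_open("docs/a.txt", "rb")) {
        std::puts("process got a handle; it never knows it was redirected");
        std::fclose(f);
    }
    return 0;
}
```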
Go ahead, Patrick.
Yeah, so doing this process-level virtualization, does that imply that all of the machines are running a fairly similar version of the same operating system?
So, we support different flavors of the same operating system. If you're running Windows, you can work with any Windows version; you can distribute from Windows 7 to Windows Server 2016. But you cannot distribute Windows to Linux, because the way the operating systems work is different. Within Linux, though, you can work with Ubuntu, CentOS, Fedora, and so on in the same grid, so we can distribute Ubuntu processes to Fedora or CentOS machines and so on.
So then, for things like shared libraries and dynamically loaded things, those all still have to be passed over the network?
Yeah, but we only need to pass them once, and then we cache them on the remote machine. The next time you run any process that requires the same libraries, they're already there on your helper machine, so we don't need to transfer them again. You usually only pay this latency on the first process you ever execute on your infrastructure, because in a typical scenario your 100 developers will more or less use the same DLLs, the same infrastructure. So after a while your helpers' caches fill up with data, and then the network latency becomes very minimal; from our tests, it's just a warm-up call, a few milliseconds.
And then how do you understand, up front, the weight of a process? In Jason's example of playing 100 games, all 100 processes are doing roughly the same amount of work. But what happens if you're spinning up 100 and you have an exponential distribution, where a few processes are doing orders of magnitude more work, and in your networked set of computers some machines are very powerful and some are not?
Yeah, so we have a component which we call the coordinator.
Its role is to run the assignment algorithms. All the IncrediBuild agents report to this coordinator, and the coordinator's only job is to coordinate between those agents. The way we solve the question you're asking is by giving each agent a kind of grade. We grade each agent, each computer, and then we always try to use the strongest machines first rather than the weak machines, and we have some optimizations for that. Let me give another example. Let's assume that I'm using your machine as a helper. Your machine was idle, so I used it. But then you, as a developer, suddenly do something on your machine: you copy a large amount of data, or you just start running a game. We don't want to disturb you. Once you've started the game, the resources IncrediBuild was utilizing are not idle anymore, so one of our mechanisms detects this, stops all the IncrediBuild processes running on your machine as a helper so as not to disturb your own work, and re-executes them on a different machine. From my perspective as a user, I won't even know that my processes were terminated on your machine and rescheduled to run on a different machine. So there are a lot of things we need to take into consideration in order to streamline the experience.
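A toy version of that grade-and-reassign loop might look like the following. This is entirely our sketch; the real coordinator's assignment algorithm isn't public. Agents are graded, the strongest idle agent wins, and a preempted task simply goes back in the queue for reassignment.

```cpp
// Toy coordinator sketch -- our illustration only. Strongest idle agent is
// assigned first; if the machine's owner comes back, the task is requeued.
#include <cstdio>
#include <deque>
#include <string>
#include <vector>

struct Agent {
    std::string name;
    int grade;   // higher = stronger machine
    bool idle;
};

// Pick the strongest idle agent, or nullptr if none is free.
Agent* assign(std::vector<Agent>& agents) {
    Agent* best = nullptr;
    for (auto& a : agents)
        if (a.idle && (!best || a.grade > best->grade)) best = &a;
    if (best) best->idle = false;
    return best;
}

int main() {
    std::vector<Agent> agents = {{"dev-box", 40, true},
                                 {"build-server", 90, true},
                                 {"old-laptop", 10, true}};
    std::deque<int> tasks = {1, 2, 3};

    while (!tasks.empty()) {
        int t = tasks.front();
        Agent* a = assign(agents);
        if (!a) break;                       // nothing idle right now
        tasks.pop_front();
        std::printf("task %d -> %s\n", t, a->name.c_str());
        if (a->name == "dev-box") {          // simulate the owner returning:
            tasks.push_back(t);              // requeue the task elsewhere,
            std::printf("task %d preempted on %s, requeued\n",
                        t, a->name.c_str()); // the owner keeps the machine
        }
    }
    return 0;
}
```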
What about inter-process communication? How does that work?
That's a good question. With IncrediBuild, there's a set of limitations you need to meet in order to use our product, and inter-process communication is one of them, but it really depends on how you do inter-process communication. If you're using, for example, shared memory to communicate between processes, I won't be able to distribute a process to your machine if it tries to use shared memory to communicate with a process running on my machine. That's something we do not support. But if you're using TCP/IP, which you usually won't for inter-process communication, but if you do, that's supported. That's not a problem, because it works across the network. So it really depends on how you implement the inter-process communication. Another scenario we see a lot is where you execute multiple processes that communicate using shared memory, for example, but it's okay to run all of those processes on a single machine. In that case, you just need to tell us that these processes should run together, and we'll distribute all of them as a batch to one specific machine.
Yeah, that makes sense. So, I would imagine almost everyone listening has built code; almost everyone has opened up Visual Studio, or run GCC, or something like that, right? And when I was in college, most of my projects compiled pretty quickly. So, going back to the origin of the company: why do builds take a long time? Why do they need to be parallelized? Why don't builds just finish in three seconds?
Yeah, I wish they would. Well, actually, I'm not sure I should say that, because otherwise I wouldn't have a lot to do here. Although, as I said, I think the industry is moving toward testing, not only building. Just as a side note, if we're talking about the trend of continuous integration, continuous delivery, DevOps: you want to streamline the ability to take your product to deployment every time, with short iterations. I see a trend of companies investing a lot of effort in testing, even more than in other areas, because if you want to deploy automatically, you need very, very large coverage.
Yeah, that makes sense.
And nowadays with the Internet,
they're constantly pushing out new patches.
And so they want to make a new release every week or every month
or something like that.
And every single release needs to be good.
Yeah, and not only that; we see a trend, and it's very popular, and I think the industry is moving in this direction. It's the holy grail, but we already see companies achieving it: every commit a developer pushes can be deployed into production automatically. And that's amazing, because it really gives you the competitive edge. As a CTO, as someone who manages software delivery, I can tell you that a lot of times a developer says, yeah, we finished it a month ago. But unless it's running for users, you didn't do anything; if it's only in your environment, you actually didn't deliver anything. The trend is toward continuous deployment, which means that once I do something, once I fix a bug as a developer, I push my commits, and then a completely automatic flow takes my commit into production without me doing anything manual in the middle. And it can also roll back automatically. But to achieve that, you need a lot of testing, because you need to make sure things work correctly for it to be fully automatic. And that's where I see more and more users using IncrediBuild: to actually accelerate their testing.
So that was a side note. But for compilations, you're right: if you're building very small software, it usually takes a few seconds. It also depends on the language you're using. Languages that compile to machine code, such as C and C++, will take longer to compile than languages that compile to an intermediate language, such as C# and Java. But essentially, we work with users that have very, very large code bases. I don't want to mention names, but there's a very large piece of software running on your Windows OS; the largest that I'm familiar with working with IncrediBuild has 20 gigs of source code. And I'm talking about source code only.
Yeah, that's pretty amazing.
That's unbelievable.
Yeah, and it's a very well-known product, used by a lot of users worldwide. And that's 20 gigs of source code.
20 gigs? I mean, I'm trying to wrap my head around that. Just sources. Each line is 80 bytes, right? And so, yeah, that's a lot.
Yeah, that's a lot. And compiling this commercial software takes something like 20 hours, and you can reduce it to less than an hour.
Maybe I'm getting this wrong, and I'm going to get a ton of hate mail, but I just did 20 billion divided by 80. So assuming every line is completely full of code, which is a very conservative estimate, you're still looking at 250 million lines of code.
Yeah, that's a lot of software. But a lot of it could be auto-generated code.
Yeah, exactly. That's one way you reach this kind of code size: auto-generated code, using templates, working with third-party libraries where you don't know exactly what's going on inside. That really makes your code very large.
I saw a game recently, it was EVE Online, that actually has Chrome, the entire Chrome framework, in the game so that they can do browsers. So, for example, the help in the game is actually a web browser that renders inside the game. And to make that happen, they ship all of Chrome. So I'm pretty sure Chrome is a ton of code.
Actually, the Chrome project is part of our regression tests. We want to make sure that we ship the product correctly, and we have our own tests. The way we test our product and make sure we didn't introduce new bugs is to build a lot of these kinds of projects: we compile Chrome, we compile Qt, and we compile a lot of large open-source projects, physics engines, and so on, to verify that everything works well. And Chrome really is quite large. It takes a long time to build, but it's highly parallelizable, so if you have a lot of resources you can compile it quite fast, as opposed to having only 8 or 16 cores in your machine. So yes, that's how your source code grows.
And with today's trends, there's a lot of open-source software out there. So we see a lot of developers, especially if the software isn't well architected, who have a specific problem, find some open-source project on the net, and say, okay, I'll use that to solve my problem. And bam, you have, I don't know, 200,000 more lines of code you need to compile. And that grows and grows.
Yeah, think also about auto-generation. You probably have a lot of things that they want baked into the code. Even an icon image and things like that: they'll actually have some software that converts it into a C file so that it can't be tampered with. And yeah, that makes the code base huge.
Another thing I see a lot, when we look at code samples from users, is that people sometimes place huge amounts of stuff in their include files. They include more and more things in their headers, which forces a lot of recompilation to get things running. So there are good practices for writing code that won't overload your compilation time, and it's worth adhering to them and understanding how to develop this kind of efficient software.
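One classic instance of those practices, our example rather than one from the show: prefer forward declarations in headers and keep the heavy includes in the .cpp file, so that a change to a heavy header doesn't ripple into every file that merely passes the type around. The `Engine`/`engine.h` names here are hypothetical.

```cpp
// --- widget.h ---  (illustrative example of header hygiene)
// A forward declaration is enough, because this header only uses Engine
// by reference/pointer; consumers of widget.h are spared from parsing
// engine.h (and everything engine.h includes) on every build.
#pragma once
class Engine;                  // forward declaration instead of #include "engine.h"

class Widget {
public:
    explicit Widget(Engine& engine);
    void render();
private:
    Engine* engine_;           // pointers/references don't need the full type
};

// --- widget.cpp ---
// The heavy include is paid once, in a single translation unit.
#include "widget.h"
#include "engine.h"            // hypothetical heavy header

Widget::Widget(Engine& engine) : engine_(&engine) {}
void Widget::render() { /* full Engine interface available here */ }
```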
Can we rant against Boost now?
Against Boost?
We're talking about keeping build times low, so: I worked on this side project called Eternal Terminal, which is a replacement for SSH that we're using at the place I work. I ended up not being able to dedicate as much time to it as it really needs, so the company actually hired somebody who's now working on it full-time. And that person removed Boost. The only things I was using Boost for were the circular buffer and a couple of other minor things, and the compilation time sped up enormously. I guess it's because it's all header libraries that get analyzed. I mean, that's the joke about Boost, that it always makes build times bad. It's not an unfounded joke, but there are good reasons for it.
Yeah, that's a common problem we see with our users. A lot of our users are using Boost, and that's good for us, you know, because we help them compile faster. Another thing you need to take into consideration is that sometimes you take on another open-source library. For example, you want to do some math calculations, or you have a specific problem and you see a nice library, a nice open-source project that solves it. And in the background, that open-source project uses Boost, and you don't even know it.
Yeah.
That's something we see a lot. People who develop open source tend to rely on other open-source projects as well, in order to get to market faster and in a more stable manner. That's cool, but then you add one open-source library, and in the background it adds 10 more to your code without you knowing it. And that's how software gets bigger. That's one of the things we see more and more of, and I expect to see even more in the future, because open source is really where the market is going. You can see a lot of commercial companies opening up their code to be open source.
Yeah.
And yeah, that's where the industry is moving, so I think we'll see more of that.
Yeah, I think, from a game-theoretic standpoint, it's in some kind of well right now, where if a company isn't going to open source their technology, that actually creates a liability for the people who work at that company. Microsoft was like this for a long time: they were suing the Mono people, they made their closed-source version of Java, J++, and I think Sun sued them over that. They were trying to keep everything locked down. And what they found is that it was just very hard to get talent, and even to get people to use the software, because it creates this liability. And now you see Microsoft, really one of the last people to the party there, open sourcing a lot more of their technology, and now embracing Mono and things like that.
I can tell you, from recent surveys I've seen, that Microsoft is today one of the largest contributors to open source, which is... yeah. They open sourced .NET as well, I think, and they've open sourced many of their tools that were once very closed. And it's very good for them, it's good for the industry, and it's good for adoption; people are now able to contribute and work with these tools more efficiently. And that's where everything is going, so, as I said, I think we'll see more and more of that.
Yeah, I actually tried Visual Studio Code, which is a fairly new editor built on the same foundation as Atom, the editor created by GitHub. And Visual Studio Code is pretty amazing. It's totally open source, and I was really, really impressed.
It has tons of users. It's really good, it works really fast, and they're really investing in making it light and fast, and cross-platform, of course. And it supports, the last time I saw, something like more than 100 languages.
Yeah. It's really cool. It's a cool product.
Yeah, it's amazing.
And you're going to see IncrediBuild there as well, I believe, next year. So it's not only Visual Studio; we're in more and more commercial tools and IDEs, so you can work with IncrediBuild in Qt Creator, in Visual Studio, as I said, and you're going to see it soon in Eclipse and in CLion and others as well.
So, I guess, in the sense that you're mentioning, oh, you're going to see it here, you're going to see it there: there's still something the end user has to do. Walk us through what the end user has to do. Let's say I made some evolutionary computation system, and it's some binary I wrote. It creates a bunch of processes, each of them does some simulation, and then I collect the results and do some analysis, right? So I have this exe, it does all of this on my machine, and I want to use IncrediBuild. What do I have to do? In other words, what's involved in getting it to support a new application?
Yeah. So let's assume that your main process is main, your sub-processes are sub, and the main process executes 100 sub-processes. The way to integrate IncrediBuild into that is to open an XML file and write main.exe, space, allow intercept equals true. That tells IncrediBuild that this is the parent process that executes the sub-processes. Then you add a new line and write sub.exe, space, allow remote equals true, which tells IncrediBuild: every time this main process executes a sub-process, I want that sub-process to be executed remotely by IncrediBuild. And that's it; that's the only configuration file you need to edit. Then, let's assume your original command was main.exe 100, which tells main.exe to run 100 sub-processes. The only thing you need to do is run ibconsole, which is the IncrediBuild command-line interface, slash command, and pass your original command to IncrediBuild. That's it; that's all the integration you need to do. Your 100 sub-processes will be automatically distributed by IncrediBuild, and all your outputs will be automatically synced back to your local machine. And from your perspective, it's really as though you have 100 cores in your local laptop. That's it. It will take you two minutes.
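Written out, the setup he dictates looks roughly like this. Note that the file layout and attribute names below are our rendering of his spoken description, not a verified schema; check IncrediBuild's current documentation for the exact syntax.

```xml
<!-- profile.xml: a rough rendering of the configuration as dictated.
     Element and attribute names are assumptions, not verified schema. -->
<Profile>
  <Process Name="main.exe" AllowIntercept="true"/>  <!-- parent: spawns the subs -->
  <Process Name="sub.exe"  AllowRemote="true"/>     <!-- distribute these remotely -->
</Profile>
```

And then you'd run something like `ibconsole /command main.exe 100`, again with the flag spelling taken from his description rather than checked against the docs.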
So when you say IncrediBuild is coming to Visual Studio Code, for example, what that would be is, like, a module that comes with an XML file designed for Visual Studio Code, and, oh, it would also have to call the IncrediBuild command when launching the binary. So it would do those two things, but you would provide it as a package for Visual Studio Code?
Actually, users are already using IncrediBuild with Visual Studio Code, with Eclipse, with CLion, et cetera, because they all essentially have a command line behind them. You can execute any command line you have with IncrediBuild today. So if you're using CLion, for example, when you compile your code, CLion generates a CMake command line; CMake is a build system. You can simply take this CMake command line and run it with IncrediBuild, even today. Customers are doing that: they just run their CMake command line with IncrediBuild, and they accelerate their CLion executions. When I say we're integrating with these IDEs, it means we'll have a plugin, an extension that you see as part of the IDE itself. It doesn't mean people aren't working with it today; it's just a matter of building an IncrediBuild plugin that wraps everything up and interacts with the IDE itself.
We also have a very, very cool visualization, which is great because it takes all these hundreds of thousands of lines of textual output and draws them for you as a graphical representation. So you can see very easily what's running and where it's running. If something fails, you'll see a red bar; if it succeeds, you'll see a green bar. You can see how much computation power you're using, how much file activity your processes are doing, et cetera. And anything you execute with IncrediBuild, whether it's compilation, testing, or anything else, you get this very cool graphical representation of your executions out of the box. So with the free version of IncrediBuild that ships inside Visual Studio 2017, you can just run your compilations and see them in a graphical manner, and you'll be able to quickly spot gaps, overloaded areas, places where you're under-provisioning your cores, et cetera. When we do that for Visual Studio, we put all this graphical visualization inside Visual Studio; you see it as a window embedded in Visual Studio. And when we integrate with another IDE, we want to keep the same kind of experience: you have a plugin, an extension, so you can just build with IncrediBuild, and the IncrediBuild visualization opens not in a separate window but inside the IDE. So that's kind of how it works.
That makes sense.
So, you mentioned freemium. If I'm a college student living on ramen noodles and I just want to install this across my whole dorm so that all of us can build our code faster, what is free, and what features cost money and require the professional version?
Yeah, so the free version gives you the ability to use IncrediBuild in a distributed manner for only five agents, up to 16 cores each, but only for a month. Although we do have special discounts for students; I was a student in the past as well, so I know how it works. But the actual free-for-life part of the free version is the ability to run it on your local machine.
And then you might ask, well, IncrediBuild's great technology is the distribution technology; how would it help me if I can only run it on my local machine? So first, the visualization part I mentioned: you'll be able to use that for free, for life, in the freemium edition. It's really cool, it really helps to see what's going on, and it really helps you analyze your builds and see errors more clearly. This is a podcast, so I can't show anything, but it's really cool; you can go to our website and see some galleries. Another thing is that we have a very rich command-line interface. So, for example, you can say stop-on-first-error, instead of just letting your compilation continue, which is Visual Studio's default behavior. And another thing we did, which is also free as part of the freemium edition for Visual Studio, and this is something I didn't delve into because there are so many things we're doing that I can't go into all the details; we just spoke about the main concept of the technology. But for Visual Studio, for example, we have predictive execution, which allows us to utilize your own local cores much better than the default way they're usually used. We know what the real dependencies in your solution are, and we know how to run it better. Even if you define some dependencies that aren't actually needed, we'll be able to detect that.
For example, say you have project 2, and project 2 depends on project 1. Usually the way you'll see that run is: the compilations of project 1, then the link of project 1, then the compilations of project 2, and then the link of project 2. But most of the time, not always, but most of the time, only the link step of project 2 actually depends on the link step of project 1, not the compilations. With IncrediBuild, our predictive execution knows that, and it can run the compilations of project 1 and project 2 in parallel, and then the link of project 1 and the link of project 2. So we can actually improve your build-time performance even on your local infrastructure, not only when you're doing distributed builds.
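Schematically, the reordering he's describing looks like this (our sketch of the two schedules):

```text
Default schedule (serialized on the project dependency):
  compile P1 -> link P1 -> compile P2 -> link P2

Predictive execution (only the link steps are truly ordered):
  compile P1 --+
               +-> link P1 -> link P2
  compile P2 --+
```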
Very cool. So this is kind of a random question, but I seem to remember, on a GCC mailing list a few years back, someone talking about a multiprocess or distributed linker. Does that exist, or is that just kind of a fantasy? Because I remember that the linking part, for example, if all your libraries are, let's say, static libraries that you're building, then the linking is really what's going to kill you in terms of performance. Can you use IncrediBuild to speed up the link, or is the link in Visual Studio still only one process?
It's not only Visual Studio. I haven't heard of anything that can break your linking into multiple processes.
Yeah, I remember seeing someone just basically positing it. But yeah, I guess it never actually existed.
Yeah, I know that request keeps coming from the industry, because links are a bottleneck; they can only run sequentially, on a single core. With IncrediBuild, for example, while we link a specific project, we can compile other projects in parallel that will need to be compiled later on, so the link doesn't have to block everything. But still, the link is a major bottleneck everywhere. And the way that Microsoft, for example, with Visual Studio, and others are addressing it is to try to minimize the latency, maximize the performance, and optimize the link process itself. So, for example, Visual Studio recently introduced, by default, a flag called FastLink, which optimizes the link time. That's how they address this problem currently. I haven't heard of anything that can break the link into multiple processes, but we'd really like that to happen, because then IncrediBuild would be able to distribute those multiple linking processes to additional machines and improve the link time even more. So once that's available, I'll be the first to adopt it.
Actually, I looked it up, and it exists; it's called gold. It's only for Linux, maybe, or maybe not. But yeah, if you use the gold linker, then it runs on multiple cores. So potentially gold will work with IncrediBuild, although it might have so many dependencies that it might not be worth it; the file transfer might end up being the bottleneck.
File transfer can be a bottleneck, but not that much with today's networks. I know there are users working with gold together with the Linux edition of IncrediBuild, but I never considered that aspect; I never looked into it, so that's something I'll do. Thanks.
Sure, yeah, it's pretty cool. I've seen companies where they build everything as static libraries and then do this one enormous link, so yeah, it's probably some combination of that and some other technology.
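For anyone who wants to try what Jason is describing: gold ships with GNU binutils and exposes threading flags, though whether they do anything depends on how your gold binary was built (it has to be configured with thread support). A typical invocation looks something like this:

```sh
# Link with gold instead of the default ld, and ask it to use 4 threads.
# --threads is a no-op unless this gold build was configured with threading.
g++ -fuse-ld=gold -Wl,--threads -Wl,--thread-count=4 main.o util.o -o app
```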
So what about IncrediBuild as a company? You said the company has been around for more than a decade. Where is it located? Are you hiring? What kinds of positions are you hiring for? Do you do internships, for people listening who are college students? That sort of company-related stuff.
Okay. So, just one note before that, about gold and that whole area in general. One of the really cool things about IncrediBuild, because it's a generic infrastructure, is that a lot of the time we don't even know what our users are doing with our product.
That's true. Yeah, yeah.
And that's really cool, because I might be speaking with a bank, for example, that comes and says, listen, we are computing this huge amount of financial derivatives. Or a customer tells me, this was a few years ago, one of the largest game studios: listen, we are accelerating Maya, which is a commercial product, with IncrediBuild. And I said, wow, that's cool; I didn't know IncrediBuild was accelerating Maya computations. And that's related to the gold linker question as well: I'm sure we have users using every kind of compiler, every kind of tool, because we are agnostic to the processes that run with us. We don't need to do a specific integration with any particular process, and that's why we sometimes don't even know everything IncrediBuild is used for.
I'll give you another example. I visited another very large company, in Japan, and they told me they were doing stress tests with IncrediBuild. I said, well, how do you do that? A stress test is where you want to stress a server. In the past, in order to stress the server, they needed to provision a lot of virtual machines on the fly, copy the processes there, and make sure they all ran against the server at exactly the same time, to stress it with requests and processing and so on. But they said, with IncrediBuild, we now just have a script file, and any developer can run this script file; the processes are automatically distributed to a lot of machines, and they all connect to the server. So these are use cases I actually learned about from customers, not something that came from us, which is very nice and very cool.
Yeah, that's amazing. It's remarkable.
So, regarding your question: we are hiring everywhere. We are hiring software developers; if you're a C++ developer who's into operating system internals, Windows internals, we're hiring. We are looking for professional services people, QA, interns. It depends on what you know how to do. Usually when we work with interns, it's around deploying IncrediBuild and testing it with a variety of open-source tools, writing white papers and benchmarks, and trying to integrate IncrediBuild with open-source projects, because there are so many tools and software packages out there that do multi-process execution, and we'd like to tell the market: listen, we can do that as well. We can accelerate this kind of compression, this kind of encoding, this kind of obfuscation, et cetera. That's usually what interns who work with us do. We are located in Tel Aviv, but we work with people abroad as well; we have an office in Japan, and we work with the U.S. So if you're located elsewhere and you want to work with us, or you want to intern with us, just let us know.
Cool.
Cool.
Good to know.
So tell us the coolest thing about working at IncrediBuild: it could be the office, something really unique, it could be the location...
Yes, I think, actually, what I like most is our customers, because I'm working with the largest customers, who are building the coolest products in the world today. It can be the largest commercial software; it can be the most popular games. And I think one of the most interesting things is the ecosystem IncrediBuild works inside: gaming and continuous integration, public cloud and DevOps, Visual Studio and other IDEs, financial derivatives. There are so many things I need to learn all the time, because IncrediBuild is used for such a large variety of problems that I'll never be able to cover the entire ecosystem we work in. And this environment of continuous learning, sticking with the customers and users and always learning new stuff, physics engines, other infrastructures, new technologies, is something we always need to keep pace with, because we are used inside this enormous industry to do practically anything. So I think that's one of the coolest things: I never stop learning here, and that's something I really love to do.
Cool. Very cool.
Makes sense.
You should have a regression test that mines a Bitcoin.
You already have solutions for that.
You could finance the new office.
We should have started on that a few years ago, I think.
Yeah, right. Cool. Well, thanks so much for being on the show. This is super awesome. So there's a free version, and there's also a student discount, so if you're a student, definitely check it out.
I think the ease of use is by far the most compelling part. In your dorm or in your lab, you could install this on a bunch of machines and just see what happens. Any software you're developing right now, assuming it's multi-process, you could just run it in this environment, and it's kind of wild to see what would happen; maybe it would save you from having to either wait a lot or parallelize it all by hand.
Yeah, and if you have a project that you want to try to accelerate with IncrediBuild at your university or anywhere else, just drop us a note and we will help.
Oh yeah, actually, with that in mind, give us some ways to reach out. What's the website, incredibuild.com? What's your website?
Yeah, incredibuild.com. And if you have something specific you'd like to ask, or you want to contact us, I think the best way, if it's a technical question, is to just email support@incredibuild.com, and we'll get back to you.
Cool.
Very cool.
Are you on Twitter or Facebook or any of that?
Everywhere.
All right, we'll get the Twitter handle and Facebook and all of that, and we'll post it with the show notes.
Great. Okay. Cool.
Thank you so much, Dori, for your time. And yeah, if anyone has any questions, check out the show notes; we'll put a way for you to get in touch with Dori there, and you can shoot them an email at support@incredibuild.com.
Great. Thanks a lot, guys. It was great fun.
The intro music is Axo by Binärpilot.
Programming Throwdown is distributed under a Creative Commons Attribution-ShareAlike 2.0 license. You're free to share, copy, distribute, and transmit the work, and to remix and adapt the work, but you must provide attribution to Patrick and I, and share alike in kind.