Programming Throwdown - Parallel Computing with Incredibuild

Episode Date: December 19, 2017

How can you use all of the computers in your lab/office at the same time to speed up tasks? Today we talk with Dori Exterman, CTO of Incredibuild, about parallel computing and the awesome tool Incredibuild has created that can run any multi-process program on several machines. Show Notes: http://www.programmingthrowdown.com/2017/12/episode-73-parallel-computing-with.html ★ Support this podcast on Patreon ★

Transcript
Starting point is 00:00:00 Programming Throwdown, Episode 73: Parallel Computing with Incredibuild. Take it away, Jason. Hey everyone! So, chances are you're listening to this after the giveaway. We're recording this just before the giveaway, but hopefully, if you're a Patreon subscriber, you got one of our really cool laser-cut acrylic little stencils. That's a guarantee. And hopefully, if you're out there, you got something else cool too in the raffle. But today we have a really cool episode. We're going to talk to Dori Exterman, who's the CTO of Incredibuild, and we're going to talk all about parallel computing. At Incredibuild, they've built this really cool system for
Starting point is 00:01:03 kind of combining, and I'll let him explain in more detail, but: combining parallel computing and letting you use the machines you have sitting around your office right now in a distributed fashion. So, Dori, why don't you tell us your background? What got you into parallel computing, and a bit of history? So, my background: I think I have more than 25 years of experience in software development. I did a lot of development in information-related areas, and later on in low-level development as well. And after that, for a couple of years, I consulted for many companies in different areas of advanced
Starting point is 00:01:54 technologies: client-server, multi-tenant, etc., which led me quite naturally to the areas of parallel computing, efficiency, and optimization. And seven years ago I joined Incredibuild to lead the technical side of the company, and since then I'm here. It's one of the coolest companies I've ever worked in, and I think I'm going to be here for a while. So, yeah. Cool, you've been there seven years, right? Yeah, yeah. That's pretty amazing. Yeah, that's awesome. Did you start the company, or had it already been started?
Starting point is 00:02:37 The company is very mature; I think we are more than 15 years in the market already, from the era in which parallel computing wasn't that popular. You only had one core on each PC, and then, you know, things went very slowly. So in those eras, when you needed more horsepower, you didn't actually have any way to achieve it. You couldn't purchase a machine with multiple cores unless you had a lot of money. And the idea behind the company came actually from SETI@home, which is a NASA project from the beginning that tried to
Starting point is 00:03:18 find extraterrestrial life using distributed computing. So in the 80s you were able to install some software on your computer, and then NASA would distribute these audio files they recorded, and your computer would analyze these audio files and see if there are signs of extraterrestrial life. I remember that. Yeah, I remember having the SETI@home screensaver, and so basically when your computer went idle, instead of having a bouncing ball or something like that, you would just be scanning for anything.
Starting point is 00:03:53 And it's still a live project, I think. They're still doing that. And in this era, when you had just very limited resources, NASA really needed your own computers in order to scale this. And the founders of Incredibuild thought, well, if NASA is doing this for finding extraterrestrial life, why can't we do that for other stuff as well? And they started with accelerating Visual Studio builds.
Starting point is 00:04:21 So this was 15 years ago, when you only had one core. And when I joined the company, we already had multiple cores. So we thought that once people had 16 cores, or even four cores, the problem would go away. But in fact, what we keep finding is that at the same rate as the number of cores rises in your local machine, the problem grows bigger as well. So people always need more resources, which makes us still relevant, and even more relevant than we were in the past. Cool, that makes sense.
Starting point is 00:04:56 Yeah, I mean, I think you kind of touched on it, but yeah, it sounds like the big motivation behind parallel computing is that you have some very expensive process, but it can be broken up into pieces and then solved in parallel. What are most of your customers trying to solve? I mean, I'm sure there are confidential things and things like that, but in general, what are the sorts of areas where your company can help? Essentially, we started with accelerating Visual Studio, so we are focused there even today.
Starting point is 00:05:40 So many of our customers, I wouldn't say all, but many of our customers are Visual Studio users and are accelerating either their Visual Studio compilations or any kind of computation that they have as part of their continuous integration systems. Five or six years ago, we opened up the technology. We wrote it from the beginning to be generic, and we opened it up for users to be able to use it for any kind of compute-intensive execution. So today, users are using it for compilation, for testing, packaging, artificial intelligence training, weather forecasting, financial derivatives. We are highly popular in the gaming industry. In the gaming industry, I think we are used by something like the 1,000 largest studios in the world, doing Xbox, Sony, Nvidia Shield, Nintendo, PC, and Android, and VR, and any kind of game.
Starting point is 00:06:46 And the reason that it's highly popular in the gaming industry, I believe, is that they have so many things that they do as part of game development that require a lot of resources and are compute-intensive. So it's not only C++ compilation; usually it's rendering, video rendering, image processing. You have a lot of shaders, and, for example, when you have a physics engine, you need to pre-calculate the shadows of the objects in order for you not to do that while the game is running.
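As an aside on why this kind of baking distributes so well: each object's (or room's) lighting can be computed without looking at any other object's result. Here is a toy sketch in Python, purely illustrative; the stand-in formula and scene names are invented and bear no resemblance to a real engine's photon tracer:

```python
import multiprocessing

# Toy "bake" step: pretend each scene object needs an expensive,
# independent shadow computation (here just a stand-in formula).
def bake_shadow(obj):
    name, light_distance = obj
    # A real engine would trace photons here; we fake a result.
    intensity = round(1.0 / (1.0 + light_distance ** 2), 4)
    return (name, intensity)

scene = [("stadium_roof", 3.0), ("player_messi", 1.0), ("goal_post", 2.0)]

# Because no object depends on another, the bake is "embarrassingly
# parallel": a pool of workers (or, with a tool like Incredibuild,
# a pool of machines) can each take a slice of the scene.
with multiprocessing.Pool(processes=2) as pool:
    baked = dict(pool.map(bake_shadow, scene))

print(baked)  # each value was computed independently
```

Swap the pool of local worker processes for a pool of machines and you have the shape of what the studios described here are doing.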
Starting point is 00:07:20 Yeah, this is something not a lot of people know, but basically the game developers try to cheat as much as possible. Like, if there's really nice soft lighting, chances are that's not being computed on the fly, because that's extremely, extremely expensive. There's this thing called Monte Carlo photon mapping, where literally each light just emits a bunch of rays of energy. And every time one of those rays of energy hits anything, it bounces and it becomes basically a new light source. And so it just blows up exponentially. Right. And that's how, you know, you could imagine even in real life, if you open up a window just a tiny amount, you can light up a whole room. And that's because the photons are bouncing all over the place. Right.
Starting point is 00:08:10 And so to make that same effect in the game, where it's not like everything is pitch black except for where the sun hits, they need to do this really, really expensive process. And that always happens, you know, before the game is even shipped; they just pre-compute that for everything that isn't moving. And as you said, yeah, it's extremely expensive, but it's completely parallelizable: you could do every room independently. Yeah. And also, for example, if you take the FIFA game, you have different kinds of stadiums, different kinds of lighting. And this lighting affects the way that lights and shadows are going to interact. And also the players themselves.
Starting point is 00:08:53 For example, Messi and Ibrahimovic have different sizes and different ways of interacting with the light. So you can pre-compute that, and that's what they're doing. And they have millions of pre-computations, and they can prepare them in advance, and they're doing it with Incredibuild. Cool, that makes sense. So what are the big differences between, you know, for example, instead of Incredibuild, I could go and get some really powerhouse machine.
Starting point is 00:09:23 Maybe it has two processors. Maybe it has 36 cores or something like that. I don't know if that's even supported; let's say 36 cores exist. No, you can say 80. Oh, okay. We have customers with 80 cores, yeah. And so what's the difference between me doing that,
Starting point is 00:09:40 spending maybe a lot of money up front and buying some 80-core machine, versus using something like Incredibuild that's going to go back and forth over the network? So I'll give you a live scenario. For example, let's assume you are a very large company, I don't want to name names, an enterprise company, and you're working on a game or some other software project, and you have 100 engineers working on this software.
Starting point is 00:10:10 So you'd need to purchase 100 machines with 80 cores each in order for each engineer to have these 80 cores to really run faster. With Incredibuild, you can have each developer with eight cores on his machine, and each of them will be able to seamlessly use, with Incredibuild, all the idle CPU cycles of all the other engineers working on his local network. So every engineer can essentially tap, with Incredibuild, into 800 cores instead of just 80, which will also cost you much less. An 80-core machine is very expensive.
Starting point is 00:10:50 You won't be able to purchase one for each of your developers. And also, 80 cores, you think it's huge, but when you have a computation that takes 24 hours, that takes two days, even 80 cores is not fast enough. Our users are using Incredibuild to distribute to hundreds of cores, and not only 80. 80 is not
Starting point is 00:11:11 the high end of our users. That makes sense. Also, Bill goes on vacation to the Canary Islands or something, and his machine is just sitting there wasted if he didn't have some type of distributed setup. With Incredibuild you have other stuff as well. So, for example, we have
Starting point is 00:11:30 companies that have multiple sites, geographical sites. So when at one site it's daylight and working hours, at the other site it's nighttime and their resources are idle. So they can use the resources from one site to accelerate the computation of the other sites, which is quite cool. Another thing is that you can always,
Starting point is 00:11:56 with Incredibuild, and that's not something you can do when you purchase hardware, and this is something we see more and more, you can always scale to the public cloud. Essentially, all we need is a virtual machine. So especially in peak times, before a game release, before Christmas, when you need to do more testing, you have more compilation.
Starting point is 00:12:14 And before releases, that's usually when you need more resources and you don't have them. With Incredibuild, you can always scale. You can say, okay, I'll just add some 100 cores in any kind of public cloud available and connect them to my local network, and boom,
Starting point is 00:12:33 you have more resources to use. And, you know, time to market is very essential in these sectors. That makes sense. I guess, what if you have a problem where there's a lot of data? Is that something that Incredibuild can handle? Or how would you go about solving that problem? Like, for example, I mean,
Starting point is 00:13:01 you brought up the example of the arenas, the football arenas. I mean, maybe those files aren't that large and they can get passed around; it's really the computation. But what if it's something like... Yeah, gigs of data, that's right. That's a problem. So there are large meshes. That's usually in specific types of software.
Starting point is 00:13:28 For example, in genetic algorithms, when you're trying to calculate genetic algorithm stuff, usually the data is very, very large. If you can break the data into multiple subsets, then it will work for you. It will be good, because you can say, okay, when I'm running these types of algorithms, I need only this subset of the data,
Starting point is 00:13:54 and then you can pass it along, which is cool. But there are scenarios in which you have really, really huge data, and not only do you have huge data, you need to load all of this data into your memory, or else the computation time will be very, very long. In these kinds of very specific calculations, usually you won't be able to get around it; nothing will help you, not Incredibuild, not a cluster, and you need a supercomputer in order to compute that in an efficient manner.
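The "break the data into subsets" idea he describes can be sketched in a few lines. This is illustrative only; the shard size and the squared-sum workload are made up for the example:

```python
def make_shards(records, shard_size):
    """Split a large dataset into independent chunks, each small
    enough to fit in a single worker's memory."""
    return [records[i:i + shard_size]
            for i in range(0, len(records), shard_size)]

def process_shard(shard):
    # Stand-in for a compute-heavy step that only needs its own shard.
    return sum(x * x for x in shard)

records = list(range(10))          # pretend this is gigabytes of data
shards = make_shards(records, 4)   # each shard ships to one helper
partials = [process_shard(s) for s in shards]  # in practice, in parallel
total = sum(partials)              # cheap merge step at the end

print(total)
```

The caveat from the answer above still applies: if process_shard genuinely needed the entire dataset in memory at once, no such decomposition would exist, and neither Incredibuild nor a cluster would help.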
Starting point is 00:14:28 So there are specific problems for which you'll need an HPC machine, and that's why HPC machines exist. But if you can break the data into multiple smaller data sets, along with the computation that goes around them, then you can use Incredibuild, or you can use clusters, or any kind of distributed computing technology. So I guess, what's the difference then between, you know, I've seen there's, let's say, OpenMP or MPI or these kinds of things. Or I guess maybe OpenMP is a good example, or maybe LAPACK or something, where there's parallelism built into frameworks, where every machine is running some dedicated kind of server that can handle chunks of data and processes. It sounds like Incredibuild kind of fits in between. It's sort of for people who don't want
Starting point is 00:15:40 to have this huge farm of machines just sitting there to do computation and have to deal with MapReduce and all of that. But it's also not all the way on the other end, where you're having to modify a lot of C++ code. This is kind of right in the middle. So I think that the major difference... So Hadoop is usually for data analysis, and it's less used for computation; the main purpose of Hadoop is usually big data analytics and things like that. And with both Hadoop and OpenMP, or other infrastructures for parallel computing, you'll need to write your software in order to accommodate these technologies,
Starting point is 00:16:29 these infrastructures. You need to deeply integrate with these technologies, and you need to write your software in advance in order for it to be able to work with these kinds of technologies. You need to consider a lot of stuff: you need to consider how you handle fault tolerance, or if a node just goes down, what your software does about it; how you handle scheduling; and, in advance,
Starting point is 00:16:54 how you handle the data transfer and synchronization. With Incredibuild, the idea behind the product was to give you a solution that will simply work out of the box. So, for example, say you have something that you wrote that can run eight processes, or hundreds of processes, in parallel, but you only have eight cores. With Incredibuild, the idea was that you just plug it in: you install it on every agent you have in your local network, and Incredibuild will seamlessly use the idle resources of the other machines as though they reside in your own laptop. You don't need to install anything on this remote machine besides Incredibuild.
Starting point is 00:17:41 You don't need to transfer files. That's kind of magic. We're doing it with a very unique technology which allows us to virtualize the process on the fly. But essentially, the idea was to give you a plug-and-play solution, so you don't need to write anything specific in your software in order to allow Incredibuild to do its trick and allow you to scale. So I think that's the major difference between these kinds of solutions. Cool. So how does that actually work? So in other words, let's say I have some program that,
Starting point is 00:18:20 let's say it plays a game between two AIs and outputs the result of that game to a file. And you want to do that, let's say, a hundred times or a thousand times. So what you want in the end is to have, you know, a thousand files on your own machine with all of the replays.
Starting point is 00:18:44 And so if Incredibuild is farming all of that out, how does it go and collect all the files, and how does it manage all of that? So I think that's the very unique technology we developed in Incredibuild, which is a process-level virtualization technology. It allows us essentially to take any kind of process, distribute it to any kind of remote machine, and emulate the environment that the process requires in order to successfully run on that remote machine, as though the process is running
Starting point is 00:19:18 on your local computer. And the way we're doing that applies to any kind of process; it's not only the output file. For example, let's assume that, as you said, you have this game that you want to play 100 or 1,000 times. This game is a process that gets some parameters. With Incredibuild, you automatically run all these processes in parallel. You simply have your command line; let's assume that you have a parameter saying how many of these processes to run in parallel. So you will have playgames.exe 100, which tells playgames.exe to execute 100 instances of game.exe.
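playgames.exe and game.exe are just hypothetical names from this example, but the driver pattern itself is easy to sketch: launch the requested number of child processes, capped at however many cores are available (eight locally, or however many a coordinator grants). A minimal, assumption-laden version, using a trivial Python child process as the stand-in for game.exe:

```python
import subprocess
import sys

def run_games(total, max_parallel):
    """Launch `total` child processes, at most `max_parallel` at a time,
    and collect each one's output (a stand-in for a replay file)."""
    results = []
    pending = list(range(total))
    while pending:
        wave, pending = pending[:max_parallel], pending[max_parallel:]
        procs = [
            # Stand-in for game.exe: a child process that "plays" game i.
            subprocess.Popen(
                [sys.executable, "-c", f"print('replay-{i}')"],
                stdout=subprocess.PIPE, text=True)
            for i in wave
        ]
        for p in procs:
            out, _ = p.communicate()
            results.append(out.strip())
    return results

replays = run_games(total=10, max_parallel=4)
print(len(replays))  # 10 replays, produced in waves of at most 4
```

The point of the tool described here is that max_parallel no longer has to equal your local core count; the cap becomes whatever idle cores the network can offer.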
Starting point is 00:20:03 So your main process will run all these 100 processes. Incredibuild will then take hold of this queue of you trying to execute 100 processes, and it will interact with a coordinator component of Incredibuild, telling it: listen, I need 100 cores, just give me whatever you have. And let's assume that the coordinator looks around at all the agents that are installed in your network and is able to provide you 100 cores. And then the Incredibuild agent on your local machine will tell a remote machine to run an instance of game.exe. Now, on this remote machine,
Starting point is 00:20:46 you don't have anything besides Incredibuild. You don't have the game.exe process. You don't have the DLLs, the libraries that this exe requires. You don't have input files. Nothing is there besides Incredibuild. The way that we allow this to happen is that on the remote machine, you have an Incredibuild agent.
Starting point is 00:21:07 And the Incredibuild agent will run this process on the remote machine and will essentially inject Incredibuild code into the process. This code of Incredibuild, this is injection technology, will actually act as a middleman between the process running remotely and the remote operating system. So all the calls that interest us, which try to reach the operating system from your process, will first be intercepted by Incredibuild. So once you try to open a file in a specific location,
Starting point is 00:21:43 this file does not exist on the remote machine, because you don't have anything, any DLL, any input file, on the remote machine. But Incredibuild will intercept the call. So if you try to open a file in My Documents, a.txt for example, Incredibuild will intercept the call. It will see that you don't have the file in the Incredibuild cache, which is a kind of sandbox we manage on the remote machine, and it will then go to the Incredibuild agent on your local machine and ask for this file, and it will copy this file on demand to a special location, a special cache that we have on the remote machine.
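That intercept-and-fetch step can be imitated in miniature. This is only an analogy: the real product hooks file APIs inside the process at the OS level, while this toy just wraps Python's open(), and every name in it is invented:

```python
import os
import tempfile

class RemoteAgentSandbox:
    """Toy model of the helper-side cache: redirect every open() of an
    initiator-side path into a local sandbox, fetching on first miss."""

    def __init__(self, fetch_from_initiator):
        self.cache_dir = tempfile.mkdtemp(prefix="toy_cache_")
        self.fetch = fetch_from_initiator  # callback back to the initiator
        self.misses = 0

    def open(self, original_path, mode="r"):
        cached = os.path.join(self.cache_dir,
                              original_path.replace(os.sep, "_"))
        if not os.path.exists(cached):       # not in the sandbox yet
            self.misses += 1
            with open(cached, "w") as f:     # copy on demand
                f.write(self.fetch(original_path))
        return open(cached, mode)            # handle to the redirected path

# The "initiator machine" is simulated by a dict of file contents.
initiator_files = {"My Documents/a.txt": "hello from the initiator"}
agent = RemoteAgentSandbox(lambda path: initiator_files[path])

with agent.open("My Documents/a.txt") as f:
    first = f.read()
with agent.open("My Documents/a.txt") as f:   # second open: cache hit
    second = f.read()

print(agent.misses)  # only one transfer happened
```

The caller gets back an ordinary file handle, which mirrors the point made next in the interview: from the process's perspective, the file simply exists.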
Starting point is 00:22:21 And it will then redirect the API call: instead of opening the file in My Documents in Windows, it will say open the file in C:\Program Files\IncrediBuild\cache\a.txt, and only then will the API call be passed to the operating system. The operating system will then go to the location we provided instead of the original location. The file exists there because we copied it, and it will open the file and bring back a handle. Incredibuild will then take this handle and forward it to your process running on the remote machine. So from the OS perspective, there was a file, and it opened it. And from the perspective of the process
Starting point is 00:23:01 running remotely, it has a handle it can work with. So this is what we're doing for any kind of file-system-related call, or any kind of thing that we'd like to virtualize. So it can be opening and loading a DLL, writing a file, opening a file, creating a directory, accessing the registry. Every kind of thing is virtualized on the fly by Incredibuild. So essentially the remote process is, on the fly, provided with an emulation of everything that it needs, as though it's being
Starting point is 00:23:34 executed on the local machine. So once this process tries to write output, we do the same thing: we intercept the create-file and write operations, and we redirect them to our own special location. And once the process finishes running, we simply synchronize back the files that were created by this process
Starting point is 00:23:56 to the original place where the process tried to write them on your local machine. So from your perspective as a user, it's really as though you have 100 cores running for you on your local device. And it's not only cores; it's also memory, it's also any network bandwidth you can use, because you're actually using more computers, more memory, more cores, more CPU processing power, etc. That makes sense. So just to walk through the example here,
Starting point is 00:24:24 so let's say you had 100 machines. You called playgames with 100; it would create 100 processes, let's say one on each machine. And then when it went to play the game, that game might require all sorts of rules
Starting point is 00:24:39 and other files, and on demand those get pulled from your machine and then distributed out, fanned out to these 100 machines. They're going to go and play 100 games in the time it would take you to play one game on one thread locally. And then when the game finishes, they'll save the replay, and Incredibuild will detect: oh, you know, there's a new file here that was created by this remote process.
Starting point is 00:25:07 I'd better send it back to the main computer so that it knows it's there. Yeah, exactly. Simple, right? That makes sense. Yeah, I think it's clever. Yeah, very clever. But it requires a lot of deep understanding of the operating system, at a low level, intercepting all these calls and implementing this. It's highly complex. And you're speaking about multi-tenant, asynchronous, highly parallel execution, which makes it very complex. And fault tolerance: if one machine goes down, we need to recover automatically.
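The fault-tolerance point can be sketched as a small scheduling loop. All names here are hypothetical, and this is nothing like the real implementation; it only shows the idea that a task whose helper dies goes back on the queue for another helper, invisibly to the user:

```python
from collections import deque

def run_with_retries(tasks, helpers, fails_on_first_try):
    """Assign tasks round-robin to helpers; if a helper 'goes down'
    mid-task, put the task back on the queue for another attempt."""
    queue = deque(tasks)
    done = {}
    attempts = 0
    while queue:
        task = queue.popleft()
        helper = helpers[attempts % len(helpers)]
        attempts += 1
        if fails_on_first_try.get(task, 0) > 0:
            fails_on_first_try[task] -= 1   # simulate a crashed helper
            queue.append(task)              # reschedule transparently
            continue
        done[task] = helper
    return done, attempts

done, attempts = run_with_retries(
    tasks=["t1", "t2", "t3"],
    helpers=["agent-a", "agent-b"],
    fails_on_first_try={"t2": 1},           # t2's first helper dies once
)
print(sorted(done))  # every task still completed
```

From the initiator's point of view, the only visible effect of the failure is that one task took a little longer.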
Starting point is 00:25:50 These are things that, when you're trying to do something like this yourself, you need to take care of; with Incredibuild, it's taken care of for you. So this is quite complex to develop. Yeah, that makes sense. Go ahead, Patrick. Yeah, so doing this process-level virtualization, does that imply that all of the machines are running a fairly similar version of the same operating system? So we support different flavors of the same OS. So if you are running Windows, you can work with any kind of Windows version.
Starting point is 00:26:18 You can distribute from Windows 7 to Windows Server 2016. But you cannot distribute from Windows to Linux, because the way the operating systems work is different. But in Linux, for example, you can have Ubuntu, CentOS, Fedora, etc. in the same grid, though we can't distribute Ubuntu
Starting point is 00:26:37 processes to Fedora or to CentOS, etc. So then, for things like shared libraries and dynamically loaded things, those all still have to be passed over the network? Yeah, but we only need to pass them once, and then we cache them on the remote machine. So the next time
Starting point is 00:26:54 you run any process that requires the same libraries, they're already there on your helper machine, so we don't need to transfer them again. So you only get this latency, usually, on the first process you ever execute on your infrastructure, because in a regular scenario your 100 developers will more or less use the same kind of DLLs, the same infrastructure.
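That transfer-once behavior looks a lot like a content-addressed cache, which can be modeled in a few lines. This is an assumption-laden toy, not the actual wire protocol:

```python
import hashlib

class HelperLibraryCache:
    """Toy helper-side cache keyed by content hash: a DLL (or any file)
    is transferred only the first time its exact bytes are needed."""

    def __init__(self):
        self.store = {}        # content hash -> bytes
        self.transfers = 0

    def ensure(self, library_bytes):
        key = hashlib.sha256(library_bytes).hexdigest()
        if key not in self.store:
            self.transfers += 1          # simulate a network copy
            self.store[key] = library_bytes
        return key                       # helper can now load it locally

cache = HelperLibraryCache()
dll = b"pretend these are the bytes of engine.dll"

# 100 builds by 100 developers all need the same library ...
for _ in range(100):
    cache.ensure(dll)

print(cache.transfers)  # ... but it crossed the network only once
```

Keying on content rather than on file name also means that when the library changes, the new version is fetched automatically, while the old one stays valid for anyone still building against it.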
Starting point is 00:27:18 So your helpers, after a while, their cache will be filled with data, and then the network latency will be very minimal. From our tests, it's just a warm-up cost, a few milliseconds. And then how do you understand up front the weight of a process? So in Jason's example of playing 100 games, all 100 processes are doing roughly the same amount of work. But what happens if you're spinning up 100 and you have an exponential distribution, where a few processes are doing orders of magnitude more work, and yet in your networked set of computers some are very powerful and some are
Starting point is 00:28:01 not? Yeah, so we have a component which we call a coordinator. Its role is to run the assignment algorithms. So all the Incredibuild agents report to this coordinator, and the only job the coordinator needs to do is to coordinate between these agents. The way we solve the question you're asking is by giving each agent some kind of grade. We grade each agent, each computer, and then we will always try to use the strongest machines first, not the weak machines. And we have some optimizations for
Starting point is 00:28:44 that. And, for example, let's give another example: let's assume that I'm using your machine as a helper. Your machine was idle, so I used it. But then you, as a developer, suddenly do something on your machine: you copy a large amount of data, or you just start running a game. We don't want to disturb you. So once you've started the game and you want to play it, and Incredibuild is utilizing your idle resources, these resources are not idle anymore. So one of the mechanisms is to be able to detect this and to stop all the
Starting point is 00:29:19 Incredibuild processes running on your machine as a helper, in order not to disturb you doing your own work, and to re-execute them on a different machine. So from my perspective as a user, I won't even know that my processes were terminated on your machine and rescheduled to run on a different machine. So there are a lot of things we need to take into consideration in order to streamline the experience. What about inter-process communication? How does that work? So that's a good question. With Incredibuild,
Starting point is 00:29:52 there is a set of limitations that you need to meet in order to be able to use our product. So inter-process communication is one of them, but it really depends on the way you do inter-process communication. If you're using, for example, shared memory to communicate between processes, I won't be able to distribute a process to your machine if it tries to use that shared memory to communicate with a process running on my machine.
Starting point is 00:30:21 So this is something we do not support. But if you're using TCP/IP, which usually you won't use for inter-process communication, but if you do, that's supported. That's not a problem, because it can work across the network. So it really depends on how you implement inter-process communication. Another thing that we see a lot is a scenario in which you execute multiple processes
Starting point is 00:30:49 which do inter-process communication using shared memory, for example, but it's okay to run all of these processes on a single machine. So you just need to tell us that these processes should run together on a single machine, and we'll distribute all of them as a batch to a specific machine. Yeah, that makes sense. So, you know, I guess a lot of people, I would imagine almost everyone listening, has built code. Almost everyone listening has opened up Visual Studio or run GCC or something like that, right? And for most, you know, when I was in college,
Starting point is 00:31:22 most of my projects compiled pretty quickly. So going back to the origin of the company: why do builds take a long time? Why do they need to be parallelized? Why don't builds just finish in three seconds? Yeah, I wish they would. Actually, I'm not sure I would say that, because otherwise I wouldn't have a lot of things to do here. Although, as I said, I think the industry is going towards testing and not only building.
Starting point is 00:32:00 Just as a side note, if we're speaking about the trend of continuous integration, continuous delivery, DevOps: you want to streamline the ability to take your product to deployment every time and have short iterations. I see a trend of companies investing a lot of effort in testing, even more than in other areas, because if you want to deploy automatically, you need to have very, very large coverage. Yeah, that makes sense. And nowadays with the Internet, they're constantly pushing out new patches. And so they want to make a new release every week or every month
Starting point is 00:32:40 or something like that. And every single release needs to be good. Yeah, not only that. We see a trend, and it's very popular, and I think the industry is moving in this direction; it's the holy grail, but we already see companies doing it: every commit that a developer pushes can be automatically deployed into production. And that's amazing
Starting point is 00:33:05 because that really gives you the competitive edge. As a CTO, as someone who manages software delivery, I can tell you that a lot of times the developer says, yeah, we finished it one month ago. But unless it's
Starting point is 00:33:21 running for the users, you didn't do anything. If it's only in your environment, you actually didn't deploy anything. The trend is towards continuous deployment, which means that once I do something, once I fix a bug as a developer,
Starting point is 00:33:38 I'll push commits, and then I'll have a completely automatic flow that takes my commit into production without me doing any manual step in the middle. And it will be able to automatically roll back as well. But in order to achieve that, you need to have a lot of testing,
Starting point is 00:33:57 because you need to make sure things work correctly in order to be fully automatic. And that's where I see more and more users using Incredibuild to actually accelerate their testing. So that was a side note. But for compilation, you're right: if you are building a very small piece of software, it usually takes a few seconds. It also depends on the language you're using. So languages that compile to machine code, such as C and C++,
Starting point is 00:34:31 will take longer to compile than languages which compile to intermediate language such as C Sharp and Java. So essentially you have, we are working with users that have very, very large code base. I don't want to mention names, but very large software running on your Windows OS,
Starting point is 00:34:56 they can have the largest that I'm familiar with working with IncrediBuild has 20 gigs of source code. And I'm speaking only source code. Yeah, that's pretty amazing. That's unbelievable. Yeah, that's one of the, it's a very known product. It's used by a lot of users worldwide. And that's 20 gigs of source code. 20 gigs?
Starting point is 00:35:20 I mean, I'm trying to wrap my head around that. Just sources. Each line is 80 bytes, right? And so, yeah, that's a lot. Yeah, that's a lot. And it takes something like,
Starting point is 00:35:33 to compile this software, a commercial software, it takes 20 hours to compile it. And you can reduce it to less than an hour. Maybe I'm getting this wrong, and I'm going to get a ton of hate mail, but I just did 20 billion divided by 80. So assuming every line is completely full of code,
Starting point is 00:35:56 which is a very conservative estimate, you're still looking at 250 million lines of code. Yeah, that's a lot of software. Yeah, but a lot of it could be auto-generated code. Yeah, that's true. Yeah, exactly. So that's one of the...
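For anyone following along, the back-of-envelope arithmetic checks out. Here is the calculation, where the 80-bytes-per-line figure is just the assumption used in the conversation:

```python
# Back-of-envelope from the discussion: 20 GB of source code at an
# assumed 80 bytes per line (the figure used in the conversation).
source_bytes = 20 * 10**9   # "20 gigs of source code"
bytes_per_line = 80         # assumption: every line completely full
lines = source_bytes // bytes_per_line
print(f"roughly {lines:,} lines of code")  # roughly 250,000,000 lines of code
```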
Starting point is 00:36:11 You're asking how do we reach this kind of large code. So auto-generated code, using templates, working with third-party libraries that you don't know exactly what's going on there. So this really makes your code very large. I saw a game recently where they actually had, it was EVE Online, actually has Chrome, the entire Chrome framework in the game so that they can do browsers. So, for example, the help in the game is actually a web browser
Starting point is 00:36:48 that renders in the game. And to make that happen, they have all of Chrome. So I'm pretty sure Chrome is a ton of code. Actually, it's part of our regression. The Chrome project is part of our regression tests. So we want to make sure that we ship products correctly and we have our own tests. The way that we test our product and see that we didn't introduce new bugs into it is to run a lot of these kinds of products. We compile Chrome, we compile Qt,
Starting point is 00:37:27 and we compile a lot of large open sources and physics engines, et cetera, in order to see that everything works well. And Chrome really is quite large. It takes a long time to do that, but it's highly powerful. So if you have a lot of resources, you can compile it quite fast, as opposed to if you only have 8 or 16 cores in your machine.
Starting point is 00:37:52 So yes, that's how your source code grows. And in today's trend, you have a lot of open sources, you have a lot of open software, etc. So we see a lot of developers, if it's not well-architected, we see a lot of software having a specific problem, and they just find some kind of open source on the net and say, okay, I'll use that to solve my problem. And bam, you have, I don't know, 200,000 lines of code you need to compile. And that grows and grows. Yeah, I think also with the
Starting point is 00:38:26 auto generation, you probably have a lot of things that they want baked into the code.
Starting point is 00:38:33 Even like an icon image and things like that, they'll actually have some
Starting point is 00:38:38 software that converts that into a C file so that it can't be tampered with, and things
Starting point is 00:38:43 like that. And yeah, that makes the code base huge. Another thing that I see a lot, and that's something when we see code samples from users: sometimes we see that people are placing huge amounts of stuff in their include files. So they include more and more stuff as part of their headers, which requires you to do a lot of compilation in order to get it running. So there are good practices for how to write code that will actually not overload your compilation time, and it's worth understanding and adhering to them to develop this kind of efficient software.
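As a quick illustration of why header bloat hurts, here is a small sketch that counts how many local headers a file pulls in transitively; the header names are invented for the demo:

```python
import pathlib
import re
import tempfile

INCLUDE_RE = re.compile(r'#include\s+"([^"]+)"')

def transitive_includes(path, root, seen=None):
    """Collect the local headers a file pulls in, directly or indirectly."""
    seen = set() if seen is None else seen
    for name in INCLUDE_RE.findall(path.read_text()):
        child = root / name
        if child.exists() and name not in seen:
            seen.add(name)
            transitive_includes(child, root, seen)
    return seen

# Throwaway header chain for the demo: a.h includes b.h, which includes c.h.
root = pathlib.Path(tempfile.mkdtemp())
(root / "c.h").write_text("int three();\n")
(root / "b.h").write_text('#include "c.h"\nint two();\n')
(root / "a.h").write_text('#include "b.h"\nint one();\n')
print(sorted(transitive_includes(root / "a.h", root)))  # ['b.h', 'c.h']
```

Every translation unit that includes a.h pays for everything a.h drags in, which is why trimming headers (or forward-declaring instead of including) shrinks compile times.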
Starting point is 00:39:31 Can we rant against Boost now? Against Boost? We're talking about keeping build times low? So I worked on this project, kind of a side project, called Eternal Terminal, which is just kind of a replacement for SSH that we're using at the place I work. And one of the things, so I ended up kind of not being able to dedicate as much time to it as it really needs. So the company actually hired somebody who's full-time working on it now. And that person removed Boost. The only thing I was using Boost for
Starting point is 00:40:11 was the circular buffer and a couple of other kind of minor things. And yeah, the compilation time sped up enormously. It's just because it's all, I guess, header-only libraries that are getting analyzed. So I mean, that's the joke about Boost, is that it always makes build times bad. And it's not an unfounded joke,
Starting point is 00:40:32 but there are good reasons for it. Yeah, that's a common problem that we see with users, with our users. A lot of our users are using Boost, and that's good for us, you know, because we help them compile faster. Another thing you need to take into consideration is the fact that sometimes you take another open source. For example, you just want to solve something.
Starting point is 00:40:56 I don't know why. You want to have some math calculations or a specific problem that you see a nice library, a nice open source that solves that. And in the background, this open source uses Boost, and you don't even know that. Yeah. So that's something we see a lot. If you are using, people that develop open source tend to rely on other open sources as well
Starting point is 00:41:17 in order to get it into the market faster and in a more stable manner. That's cool. But then you just add one open source and in the background, it adds 10 more open sources to your code without you knowing that. And that's how software gets bigger. And that's one of the things that we see more and more. And I expect to see even more in the future
Starting point is 00:41:38 because open source is really where the market is going. You can see a lot of companies, commercial companies, opening up their code to be open source. Yeah. And yeah, that's where the industry is moving. So I think that we'll see more of that. Yeah, I think it's in some sort of like game, from a game theoretic standpoint,
Starting point is 00:42:00 it's in some kind of well right now, where if a company isn't going to open source their technology, that actually creates a liability for the people who are working at that company, right? Like, Microsoft was like this for a long time. They were suing the Mono people. They made their closed-source version of Java, the J++, and I think Oracle sued, I don't know, but they were trying to keep everything locked down. And what they found is that it was just very hard to get talent and even to get people to use the software because it creates this liability.
Starting point is 00:42:36 And now you really see Microsoft kind of being one of the last people to the party there, but open sourcing a lot more of their technology and now embracing Mono and things like that. So I can tell you that, from recent surveys that I saw, etc., Microsoft is today one of the largest contributors to open source, which is... yeah. And they open sourced, I think, also .NET, and they open sourced many of their tools that once were very closed. And it's very good for them. It's doing very good for the industry, for them as well, and for the adoption.
Starting point is 00:43:15 And people now are able to add and work with this more efficiently. And that's where everything is going. So I think that we'll see, as I said, more and more of that. Yeah, I actually tried Visual Studio Code, which is a brand new editor. They forked Atom, which is an editor created by the GitHub company. And Visual Studio Code is pretty amazing. It's totally open source, and I was really, really impressed. It has tons of users. It's really good. It works really fast.
Starting point is 00:43:49 They're really investing in making it light and fast and cross-platform, of course. And it supports, the last time I saw, something like more than 100 languages. Yeah. It's really cool. It's a cool product. Yeah, yeah. It's amazing.
Starting point is 00:44:05 And you're going to see IncrediBuild there as well, I believe, next year. So it's not only Visual Studio. We have more and more commercial tools, also IDEs, so you can work with it in Qt Creator, in Visual Studio, as I said, and
Starting point is 00:44:21 you're going to see it soon in Eclipse and in CLion and others as well. So when you try to, I guess, in the sense that you're mentioning, oh, you're going to see it here, you're going to see it there. So there's still something that the end user has to do, so kind of walk us through what the end user has to do. Let's say I made some evolutionary computation system, and it's some binary I wrote. It creates a bunch of processes, and each of them does some simulation, and then I collect the results and do some analysis, right?
Starting point is 00:44:58 So I have this EXE. It does all of this on my machine, and I want to use IncrediBuild. What do I have to do? In other words, what is involved in getting it to support a new application? Yeah, so let's assume that your main process is main
Starting point is 00:45:16 and your sub-processes are sub. And main process executes 100 sub-processes. So the way to integrate IncrediBuild into that is to open an XML file and to say main.exe space allow intercept equals true. That tells IncrediBuild that this is the parent process that executes sub-processes. And then you have a new line and you say sub.exe space allow remote equals true, which tells IncrediBuild every time that this main process will execute a sub-process,
Starting point is 00:45:53 I want this sub-process to be executed remotely by IncrediBuild. And that's it. That's the only configuration file you need to edit. And then let's assume that your main command was main.exe 100, which tells your main.exe to run 100 sub-processes. The only thing you need to do is say, ibconsole, which is the IncrediBuild command line interface, slash command, and pass your original command to IncrediBuild. And that's it. That's all the integration you needed to do. And your 100 sub-processes will be automatically distributed by IncrediBuild,
Starting point is 00:46:30 and all your outputs will be automatically synced back to your local machine. And from your perspective, it's really as though you have 100 cores on your local laptop. That's it. It will take you two minutes. So when you say IncrediBuild is coming to Visual Studio Code, for example, what that would be is like a module that would come with an XML file that's designed for Visual Studio Code. And, yeah, I guess just that XML. Oh, and also it would have to call the IncrediBuild function when it's launching the binary.
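The profile file described above might look roughly like this. This is a hedged sketch, not copied from IncrediBuild's documentation: the element and attribute names are one rendering of what was said on air, so check the real schema before relying on it:

```xml
<!-- Hypothetical rendering of the profile described above: main.exe is the
     parent process to intercept, and its sub.exe children may run remotely. -->
<Profile>
  <Process Name="main.exe" AllowIntercept="true"/>
  <Process Name="sub.exe" AllowRemote="true"/>
</Profile>
```

You would then wrap the original invocation with something like `ibconsole /command main.exe 100`, and the hundred sub-processes get distributed across the machines in the pool.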
Starting point is 00:47:06 So it would do those two things, but you would provide that as like a package for Visual Studio Code. Actually, users are already using IncrediBuild, both in Visual Studio Code, with Eclipse, with CLion, et cetera, because they all have essentially a command line behind them. So today you can actually execute any command line that you have with IncrediBuild. So if you're using CLion, for example, once you compile your code with CLion, it generates a CMake command line, CMake being a build system.
Starting point is 00:47:40 You can simply take this CMake command line and run it, also today. Customers are doing that. They're just running their CMake command line with IncrediBuild and they accelerate their CLion executions. When I'm saying that we integrate into that, it means that we need to have a plugin and extensions that you see as part of the IDE itself. It doesn't mean that people are not working with it today; it's just a matter of doing, usually, a plugin of IncrediBuild that will wrap everything up and will interact with the IDE itself. We also have a very, very cool visualization, which is great because it allows you to have all these
Starting point is 00:48:27 hundreds of thousands of lines of textual output drawn for you by IncrediBuild as a graphical representation. So you can see very easily what's running, where it's running. If it fails, you'll see a red bar. If it succeeds, you'll see a green bar. You can see how much computation power we're using, how many file operations your processes are doing, etc.
Starting point is 00:48:54 And once you execute anything with IncrediBuild, you'll have that out of the box. So whether it's compilation or testing or anything else you do with IncrediBuild, you'll get this very cool graphical representation of the executions that you're making. So in the free version that IncrediBuild has,
Starting point is 00:49:11 the one that ships with Visual Studio 2017, you can just run your compilations and you'll see your compilation in a graphical manner. And you'll be able to quickly analyze gaps and overloaded areas and where you are under-provisioning your cores, et cetera. So when we do that, usually we want to put all this graphic visualization in Visual Studio. As part of Visual Studio, you'll see it as a window embedded inside Visual Studio.
Starting point is 00:49:40 And when we integrate with another IDE, we want to keep the same kind of experience. We want you to have a plugin, an extension, that you can just build with IncrediBuild, and then the IncrediBuild visualization won't open in a separate window but inside the IDE. So this is kind of how it makes sense.
Starting point is 00:49:56 So you mentioned freemium. So if I'm a college student living on ramen noodles and I just want to install this on, you know, my whole dorm so that all of us can build our code faster, what is free, and then what features, you know, cost, require the professional build? Yeah, so the free version gives you the ability to use IncrediBuild in a distributed manner only for five agents, up to 16 cores each, but only
Starting point is 00:50:27 for a month. Although we have special discounts for students, and I was a student in the past as well, so I know how it works. But the actual free-for-life part of the free version is the ability for you to run it on your local machine.
Starting point is 00:50:44 And then you can ask, well, IncrediBuild's great technology is the distribution technology, so how would it help me if I can only run it on my local machine? So first, the visualization part that I mentioned, you'll be able to use it for free for life in the freemium edition. And it's really cool, and it really helps to see what's going on and to analyze your builds and see errors more clearly.
Starting point is 00:51:11 And it's a podcast, so I can't show anything, but it's really cool. You can go to our website and see some galleries. And another thing that we have specifically is a very rich command line interface. So, for example, you can say stop on first error instead of just letting your compilation continue, which is the default way Visual Studio runs. And another thing we did in IncrediBuild, and it's free as part of the freemium edition
Starting point is 00:51:41 for Visual Studio, for example, is something I didn't delve into because there are so many things that we are doing, I can't go into all the details. We just spoke about the main concept of the technology. But for Visual Studio, for example, we have predictive execution. So it allows us to actually utilize your own local cores much better than the default way it's being used usually.
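To make the dependency-aware scheduling idea concrete, here is a toy sketch, with invented step names and nothing IncrediBuild-specific: two projects' compiles are allowed to overlap, and only the steps with true dependencies are serialized:

```python
from concurrent.futures import ThreadPoolExecutor

order = []  # records completion order of the build steps

def step(name):
    """Stand-in for a compile or link step."""
    order.append(name)
    return name

with ThreadPoolExecutor(max_workers=4) as pool:
    # Compile both projects in parallel instead of serially per project.
    c1 = pool.submit(step, "compile project 1")
    c2 = pool.submit(step, "compile project 2")
    c1.result()                    # link 1 needs only project 1's objects
    l1 = pool.submit(step, "link project 1")
    c2.result(); l1.result()       # link 2 needs its own objects AND link 1
    l2 = pool.submit(step, "link project 2")
    l2.result()

print(order[-1])  # link project 2 always finishes last
```

The naive schedule (compile 1, link 1, compile 2, link 2) wastes idle cores during each serial phase; respecting only the real edges in the dependency graph is what buys the speedup.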
Starting point is 00:52:08 So, yeah, we know what the real dependencies in your solution are, and we know how to actually run it better. Even if, for example, you define some dependencies that are not needed, we'll be able to detect that. For example, if you have project two, and project two depends on project one, then usually the way that you'll see that
Starting point is 00:52:38 is you see the compilations of project 1 running, and then the link of project 1, and then the compilations of project 2, and then the link of project 2. But essentially, most of the time, not always, but most of the time, only the link part of project 2 depends on the link part of project 1,
Starting point is 00:52:56 and not the compilations. So with IncrediBuild, our predictive execution will know that, and will be able to run in parallel the compilations of project 1 and project 2, and then the link of project 1 and the link of project 2. So we can actually increase your build time performance even on your local infrastructure, not only when you're doing distributed builds. Very cool. So this is kind of a random question, but I seem to remember on a GCC mailing list a few years back,
Starting point is 00:53:27 someone talking about like a multiprocess linker or a distributed linker. Does that exist, or is that just kind of a fantasy? Because I remember the linking part. For example, if all your libraries are, let's say, static libraries that you're building, then the linking part is really what's going to kill you in terms of performance. Can you use IncrediBuild to speed up the link, or
Starting point is 00:53:51 is the link in Visual Studio still only on one process? It's not only Visual Studio. I didn't hear something that can break your linking into multiple processes. Yeah, I remember seeing someone just basically positing it. But yeah, I guess it never actually existed.
Starting point is 00:54:13 Yeah, I know that that's something always coming from the industry, because links are a bottleneck; they can only run sequentially on a single core. But if you have multiple links, with IncrediBuild, for example, once we link a specific project, we can in parallel compile other projects that will need to be compiled later on. So this is something that allows us to make sure the link will not be a bottleneck.
Starting point is 00:54:38 But still, the link is a major bottleneck everywhere. And the way that Microsoft, for example, Visual Studio and others are addressing it is to try and minimize the latency, to maximize the performance and optimize the main process itself. So, for example, in Visual Studio recently, Microsoft introduced by default a flag
Starting point is 00:55:06 which is called FastLink, which optimizes the link time. But that's the way they address this problem currently. I didn't hear of anything that can break the link into multiple processes, but we really want it to happen, because then IncrediBuild will
Starting point is 00:55:21 be able to distribute these multi-linking processes onto additional machines and reduce the link time even more. So once this is available, I will be the first to adopt it. So yeah, actually, I looked it up, and it exists. It's called gold. It's only for Linux, maybe, or maybe not. But yeah, if you use the gold linker, then it runs on multiple cores. So, yeah, potentially maybe this gold will work with IncrediBuild, although it might have so many dependencies that
Starting point is 00:56:02 it might not be worth it, in terms of, you know, the file transfer might end up being the bottleneck. I heard that the file transfer can be a bottleneck, but not that much with today's networks. I know that there are users working with gold with IncrediBuild in the Linux edition of IncrediBuild, but I never considered that. I never looked into that, so that's something I would do.
Starting point is 00:56:27 Thanks. Sure, yeah, it's pretty cool. Yeah, I know there's, I've seen companies where they build everything as a static library, and then they do this enormous link, and so, yeah, it must be, it's probably some combination of that and some other technology. So what about the Incredibuild as sort of a company? So you said that the company's been around for more than a decade. So kind of where is it located?
Starting point is 00:56:57 Are you hiring? What kind of positions are you hiring for? Do you do internships for people listening who are college students? That sort of stuff, like company-related stuff. Okay, so just one note before that, about gold and linking in general.
Starting point is 00:57:15 One of the things that's really cool in IncrediBuild, because it's a generic infrastructure, is that we actually don't even know a lot of times what our users are doing with our product. So I can just... That's true.
Starting point is 00:57:27 Yeah, yeah. And that's really cool because I'm speaking with the bank, for example, that can come and say, listen, we are doing this huge amount of financial derivatives. Or I can have a customer telling me that he actually... It was a few years ago, for example, I had a very, very, very large, one of the largest game studios telling me, listen, we are accelerating Maya, which is a commercial product with IncrediBuild.
Starting point is 00:57:52 And I said, wow, that's cool. I didn't know that IncrediBuild is accelerating Maya computations. And that's something related also to the gold linker. I'm sure that we have users that are using any kind of compiler, any kind of tool, because we are agnostic to the processes that you're running with us. So we don't need to actually do a specific integration with any kind of process.
Starting point is 00:58:16 And that's why we sometimes don't even know all the uses that IncrediBuild is put to. I'll just give another example. I visited another very large company in Japan. They told me that they are doing a stress test with IncrediBuild. And I said, well, how do you do that? So a stress test is where you want to stress the server. So in order to stress the server in the past, they needed to provision a lot of virtual machines on the fly,
Starting point is 00:58:46 copy the processes there, and make sure that they are running against the server exactly at the same time in order to stress the server with requests and processing things, etc. But they said, with IncrediBuild, we now have only a script file, and then every developer can run this script file, these
Starting point is 00:59:01 processes will be automatically distributed to a lot of machines, and they will connect to the server. So these are use cases that I actually learned from customers, not coming from us, which is very nice and very cool. Yeah, that's amazing. It's remarkable. So regarding your question, we are hiring everywhere. We are hiring software developers, if you're a C++ developer who is into internals, operating system internals, Windows. We are looking for professional services guys, QA, interns.
Starting point is 00:59:49 It depends on what it is that you know how to do. Usually when we are working with interns, it's around deploying IncrediBuild and testing IncrediBuild with a variety of open source tools and doing some white papers and benchmarks and trying to integrate IncrediBuild with open sources because there are so many tools and software out there
Starting point is 01:00:14 that are doing multi-process execution. And we'd like to notify the market, listen, we can do that as well. We can do this kind of compression, this kind of encoding, and this kind of obfuscation, et cetera. And that's usually the things that we're doing with interns that are working with us. We are located in Tel Aviv, but we are working with people abroad as well.
Starting point is 01:00:38 We have an office in Japan. We are working with the U.S. So if you're located elsewhere and you want to work with us and you want to intern with us, just let us know. Cool. Cool. Good to know. So what is – tell us the coolest thing about working at IncrediBuild,
Starting point is 01:00:57 like either the office, something kind of really unique, like it could be the location, it could be... Yes, I think... actually, maybe I would say I really like our customers, because I'm working with the largest customers doing the coolest products in the world today. It can be the largest commercial software, it can be the most popular games. And I think that one of the most interesting things is the ecosystem IncrediBuild is working
Starting point is 01:01:31 inside. So it's gaming and continuous integration, public cloud and DevOps and Visual Studio and other IDEs and financial derivatives. And there are so many things I need to learn all the time, because IncrediBuild is used in such a large variety of problems that I will
Starting point is 01:01:52 never be able to cover the entire ecosystem we are working in. And this environment of continuous learning and, you know, sticking with the customers and users and always learning new stuff, physics engines, you know, other infrastructure, new technologies, this is something we always need to keep up the pace with, because we are used inside this enormous industry to do practically anything. So that's, I think, one of the coolest things. I never stop learning here, and this is something I really love to do. Cool. Very cool.
Starting point is 01:02:25 Makes sense. You should have a regression test that mines a Bitcoin. You already have solutions for that. You could finance the new office. We needed to start with that, I think, a few years ago. Yeah, yeah, right, cool. Well, thanks so much for being on the show. This is super awesome, and so there's a free version, and there's also a student discount, so if you're a student, definitely check it out. I think, you know, the ease of use is by far the most compelling part. I mean, you could, you know, in your dorm or in your lab,
Starting point is 01:03:08 you could install this on a bunch of machines and just kind of see what happens. Like any software you're developing right now, assuming it's multi-processed, you could just run it in this environment, and, you know, it's kind of wild just to see what would happen. Maybe it would save you from having to either wait a lot or parallelize it all by hand. So yeah.
Starting point is 01:03:33 If you have a project that you want to try and accelerate with IncrediBuild in your university, etc., just drop us a note and we will help. Oh yeah, actually, with that in mind, give us some ways to reach out. So what's the website, incredibuild.com, or what's your website?
Starting point is 01:03:57 Yeah, incredibuild.com. And if you have something specific you'd like to ask or contact us about, I think the best way, if it's a technical question, is you can just email support@incredibuild.com and we'll get back to you. Cool. Very cool. Are you on Twitter or Facebook or any of that? Everywhere.
Starting point is 01:04:12 All right. Everywhere. Okay. Okay. We'll get the Twitter handle and Facebook and all of that and we'll post it with the show notes. Great. Okay.
Starting point is 01:04:23 Cool. Thank you so much, Dori, for your time. And yeah, if anyone has any questions, check out the show notes. We'll put a way for you to get in touch with Dori, and you can shoot them an email at support@incredibuild.com. Great, thanks a lot, guys. It was great fun. The intro music is Axo by Binärpilot. Programming Throwdown is distributed under a Creative Commons Attribution-ShareAlike 2.0 license. You're free to share, copy, distribute, transmit the work, to remix, adapt the work,
Starting point is 01:05:00 but you must provide attribution to Patrick and I and sharealike in kind.
