CppCast - Semantic Merge

Starting point is 00:00:00 Episode 180 of CppCast with guest Pablo Santos recorded December 18th, 2018. Today's sponsor of CppCast is PVS Studio. PVS Studio is a tool for bug detection in the source code of programs written in C, C++, and C Sharp. PVS Studio team will also release a version that supports analysis of programs written in Java. And by JetBrains, maker of intelligent development tools to

Starting point is 00:00:25 simplify your challenging tasks and automate the routine ones. JetBrains is offering a 25% discount for an individual license on the C++ tool of your choice: CLion, ReSharper C++, or AppCode. Use the coupon code JetBrains for CppCast during checkout

Starting point is 00:00:41 at JetBrains.com. In this episode, we discuss cross-platform mobile frameworks and SG20. Then we talk to Pablo Santos from Codice Software. Pablo talks to us about Semantic Merge, Plastic SCM, and more. Welcome to episode 180 of CppCast, the first podcast for C++ developers by C++ developers. I'm your host, Rob Irving, joined by my co-host, Jason Turner. Jason, how are you doing today? Good. How are you doing, Rob? Doing pretty good. 180. We're starting to get awful close to 200. We'll hit that in like two or three months into the new year, right?

Starting point is 00:01:56 Yeah, 20 weeks, something like that. Four or five months. Five months, yeah. But it won't take too long. Right. Yeah, pretty crazy. All ready for the holiday season? I am, yes. I am ready to not be traveling for a time. I just got back from Hamburg, and I will not be traveling again until Folkestone, until C++ On Sea. By the way, if I might say, you're running out of time to buy tickets to my class there. So if you were thinking about it, go ahead and do it now. That's right. C++ On Sea, which is in February, right? It's in February, first week of February in Folkestone, England, and I'm giving a class

Starting point is 00:02:38 on constexpr. And it will apply to anyone who's using C++11 or forward. C++11 constexpr is a little bit harder, but we'll talk about what the limitations are. Sounds great. At the top of every episode, I'd like to read a piece of feedback. Two weeks ago, Jens from Meeting C++ put out this survey asking at what speed you listen to CppCast. And in the survey, he put out the options: 0.5, which I thought was ludicrous, and I kind of forgot

Starting point is 00:03:14 that that was even an option that we have on the website, to listen to the podcast at 0.5 speed, 1x, 1.25x, 1.5x, or 2x. And the vast majority of the poll answers said 1x speed, 62%. But a couple people did put down the 0.5, which really fascinates me. I would never imagine listening to a podcast at half speed. Well, it's kind of funny, because I don't know if

Starting point is 00:03:39 our listeners have noticed, but it has happened a couple of times that your recording has gotten out of sync a little bit and you start to sound kind of like you're speaking at half speed. I think I've always done a pretty good job of editing that. So I doubt they've ever heard that. Maybe one episode came out. I don't remember though. Next time it happens, I won't tell you so that it doesn't. But I've had plenty of people actually tell me, like students and whatever, like, you sound different in person. Oh, right, you're not talking as fast as I expect you to.

Starting point is 00:04:11 Because they watch all of the talks. Less than that 1.5. Yeah. CppCast and my YouTube channel. Okay. Well, we'd love to hear your thoughts about the show as well. You can always reach out to us on Facebook, Twitter, or email us at feedback@cppcast.com. And don't forget to leave us a review on iTunes. Joining us today is Pablo Santos.

Starting point is 00:04:30 Prior to entering startup mode to launch Plastic SCM back in 2005, Pablo worked as an R&D engineer in fleet control software development and later on a digital television software stack. Then he moved to a project management position leading the evolution of an ERP software package for industrial companies. During these years, he became an expert in version control and software configuration management, working as a consultant and participating in several events as a speaker. Pablo founded Codice Software in 2005, and since then has focused on his role as chief engineer, designing and developing Plastic SCM and Semantic Merge, among other SCM products. Pablo, welcome to the show.

Starting point is 00:05:07 Hello, thank you for hosting me. I'm curious about this fleet management that you were working on. Can you tell us something about that project? Well, it was actually a C++ project, and it was my first job once I got out of the university. And it was, you know, a piece of software to actually show information about the buses going through the city. So in every bus stop, you have like, you know, information about the next one and so on. And then specifically the part I was working on was the control center for that, like the entire software that the guys had in the control center to actually

Starting point is 00:05:53 monitor if there were delays or something was wrong or something like that. The GPS devices at the time, it was around 2001, were not that precise, and probably they were not as available as they are today. So we had to do a lot of corrections in things like, you know, interpolating the position, stuff like that, because you couldn't trust the device all the time. So it had a mix of things, from, you know, socket-level sending of messages, to actually low-level stuff,

Starting point is 00:06:21 to DirectX rendering things and graphics, which was my favorite part, by the way. And so, yeah, a lot of things. And my first experience with version control, too. That sounds pretty awesome, actually. I don't know if I've ever been to a city in America that had bus stops that well-organized that said what the next bus was coming

Starting point is 00:06:44 and how far away it was. Have you ever seen that, Rob? I've seen it in Germany. Yeah, that's pretty neat. I never really thought about what's behind that to actually tell you the real truth of what's happening. Yeah, it was something, you know, today here in Europe, I will say it's sort of everywhere. At that time when I joined was new, sort of new. There were just a few cities adopting that. But yeah, now it's something like you sort of expect, right? You get to a bus stop and then you get this digital information. Years ago when I was a kid, it was just something printed down there.

Starting point is 00:07:20 So you have to make your calculation when the next one is going to come or something. But yeah, that's good. And I think right now they even have an app. I don't use it that often anymore, but I think they even have an app. And the nice thing is they can even reuse the whole thing for other uses, like police and stuff like that, where the cars are and things like that, all for control centers and so on. So yeah, that was pretty exciting.

Starting point is 00:07:44 Wow. Yeah. We're going to talk more about source control in a minute, but can you give us a preview and tell us maybe what source control you used then? Of course. My first version control then was SourceSafe, nothing super fancy. Oh, yes.

Starting point is 00:08:01 That was probably the motivation why I started thinking I have to create something better. To be fair, yeah. I mean, SourceSafe was kind of terrible. I don't think it's maintained by Microsoft anymore. But if you had a team of like two people, it was a cheap and easy way to set things up. But yeah, I mean, you know, it's better than nothing. By that time, I mean, right now, every new student,

Starting point is 00:08:28 they learn Git or, well, not SourceSafe anymore, probably Git at the university or college and so on. So they get this understanding. But 20 years ago, you didn't have that information when you were at the university. So you saw SourceSafe was kind of cool. And, well, I started to hate it when I started using it, but now in my memories it's kind of cool because, you know, as bad as it was, it created a lot of conversation after that. So it's really, I mean, you

Starting point is 00:09:00 you meet any developer around the world and you can spend quite a good time talking about how bad it was. So it's kind of social, right? Yeah, and as you said, it's better than nothing, at least to give you an introduction to not copy-paste file management. Absolutely, absolutely. This nightmare of this is the good one, dot zip, good one, really good, dot zip. This is the one, the final, you know, yeah, yeah, yeah.

Starting point is 00:09:30 Oh, yeah, yeah, yeah. That's terrible. Okay, so Pablo, we've got a couple news articles to discuss. Feel free to comment on any of these, and then we'll start talking more about SCM and merge tools and the other stuff that you work on. Okay, perfect. Okay, so this first one is the Boden cross-platform framework, and there's a preview release that they have up on GitHub, and it looks pretty cool. It's made

Starting point is 00:09:59 for Android and iOS applications. If you want to make an Android or iOS app and do so in completely, you know, native C++ without having to do any Objective-C or Java, then this is a good solution for you. I think what makes it different from Qt controls is that where Qt, it's like you're rendering everything in Qt's own widgets. This is kind of creating a wrapper, but then doing the actual rendering of a button with the actual iOS button control. Yeah. Android button control. It's very much in that vein, like wxWidgets, which I don't know if we've ever discussed on this program. Have we?

Starting point is 00:10:44 Definitely not in any depth. Okay. But yeah, wx is the same idea. It's always native controls, which has its limitations because you're limited to what is the union of all the things that are possible with native controls on those platforms. Is wxWidgets, you know, does that do mobile or is it just like a desktop framework?

Starting point is 00:11:06 I'm going to have to look to see if it was ever updated to add mobile, because mobile wasn't a thing when I last used wxWidgets. So it might be kind of like a wxWidgets for mobile. Yeah, it might be. Okay, well, just a small comment. I'm more on C# these days than C++, but I find this news really exciting because this is one of the things we really miss

Starting point is 00:11:33 on the C# landscape. There are many options, but not really a single one for all platforms, so like a Qt for everything, right? And if this one can make the cut, probably we can have nice wrappers too. So kind of excited about it. Well, and since you're talking about C#, I mean, this is also kind of similar

Starting point is 00:11:53 to what Xamarin.Forms does, I think, right? Yeah, exactly. But the thing is that they are bigger on the mobile side of things, not that much on the desktop side of things. I'm personally more interested on desktop because of the kind of tools we develop. Right.

Starting point is 00:12:11 But they are really focused on the... I mean, they can also work on desktop, but it's not the thing they do, right? It's like a secondary thing. Right. Jason, did you find out whether wxWidgets has mobile support or not? I see no mobile operating systems listed. Yeah, okay. So the next article we have is from Chris Di Bella, and it's SG20 education and recommended videos for teaching C++. And I

Starting point is 00:12:40 think we probably talked a little bit about this new study group when we were talking to Ashley and JF about the San Diego ISO C++ meetings. And this new group is just going to be focusing on education. So Chris Di Bella and JC van Winkel are the ones who are heading the group. And I thought this was a really interesting article talking about kind of how this group got started. And he also has a list of several recorded talks that I thought was a really nice curated list of good

Starting point is 00:13:15 CppCon and, you know, Meeting C++, et cetera, talks that are all worth watching. Yeah.

Starting point is 00:13:22 There's definitely a lot in there. Yeah. And several of them I have not watched and should probably go back and watch at some point. Yeah, there's always plenty of talks I need to catch up on. I'll have to look through these. Do you ever have any interaction with the ISO C++ committee, Pablo?

Starting point is 00:13:42 Not really. I try to follow where they are going and so on and I really miss this part of having a roadmap of what they are going to do and so on. I think that's

Starting point is 00:13:57 something we missed for a long time, right? Yeah, there's certainly a lot more forthcoming with the direction of the committee and everything these days, it seems. Okay, and then the last thing we have to discuss is C++Now's call for submissions, which is now live. The conference this year, or next year, is going to be May 5th to May 10th. And what is the deadline to get your submission in? It's relatively soon, I think. Yes, January 23rd.

Starting point is 00:14:27 January 23rd, and then proposal decisions will be sent out February 25th. And the program goes online in March. Are you working on your talk submission yet, Jason? Nope. You planning on one, though? Let's say probably. Okay. And I say probably because there's a lot of conferences this year

Starting point is 00:14:48 Right. Yeah, and I am hoping to go to Core C++, which is in Tel Aviv the week after this. Right, right. So that complicates my schedule, and so no firm decisions on my part have been made. I'll probably submit something and see what happens, what gets accepted, whatever. And C++Now is always easy for me to get to because I just have to drive across the Rocky Mountains. Much easier than it would have been 150 years ago, anyhow. Okay, so Pablo, to start off, why don't you tell us a little bit about Semantic Merge? Okay. So, well, as you said at the beginning, we started developing our own version control system. Like, you know, we wanted to come up with a product.

Starting point is 00:15:41 And soon after, we had to develop one of the pieces that you need to provide in a version control, which is a merge tool, right? You get two branches, you want to merge them, you go to the file level, and then you have to get, you know, the two contributors and a common ancestor and merge them together and all that, right? And then we started thinking, you know, is there a better way to do this? We developed the entire algorithm for the text-based diff and merge. So that's something we did, and we continue developing and improving over the years. But then we said, okay, what every single programmer you talk to has in mind is something like,

Starting point is 00:16:15 okay, why doesn't your diff and merge tool understand the code? Why does it work on a text basis instead of on a method or function basis? And that's exactly what I wanted to do right at the very beginning. And I remember drawing the first, you know, blueprints, or the first ideas, I will say, really soon after just founding the company and starting up. But a lot of time had to pass until we really had the basics and enough technology and enough of the different things you need for the core to actually start thinking on it.

Starting point is 00:16:51 And then it was around 2012 or something like that when we really started working on it. And basically what we did is, okay, let's parse the code and then let's calculate the diff and the merge based on the code structure, not the actual positions of lines and text blocks and so on, right? And that's

Starting point is 00:17:11 basically it, right? I mean, once you describe that to a developer, he immediately imagines what the thing is about, right? If you go and move a method to a different location, well, a function to a different location and then someone a function to a different location, and then someone else modifies it on the original location,

Starting point is 00:17:30 and then you merge it, you get the move and the change is put into the right location, right? And that's the magic of the semantic thing, right? That's basically one of the things. At the end of the day, we rerun a lot of merges. We have something called a replay, and we replay public repos on GitHub and try to repeat all the merges to figure out how we can do better and so on.

Starting point is 00:17:51 And at the end of the day, even if some teams try to avoid refactoring, because for them it's not a good idea to actually be refactoring code while it's live, evolving, let's say, right? While it's still evolving. They prefer to do like a freeze or something like that. But they try to avoid it.

Starting point is 00:18:06 But even so, it happens, right? And there's a good percentage of conflicts that we consider semantic because they involve moving pieces of code. And that's a little bit what Semantic is about, right? So what languages does Semantic Merge work with? Because I'm thinking about the fact that you have to parse these. Absolutely. That's the downside of this approach. I mean, we support both, right? In fact, Semantic provides a regular text-based diff tool and merge tool, right? Like KDiff3 or any other. So it does that.

Starting point is 00:18:47 I mean, it does this basic. But then if it has support for the language, it does more because then it understands structure and so on. We can even plug external parsers and all that. But to your question, well, surprisingly, we support C++. And we also support C. We support Java, C Sharp.

Starting point is 00:19:09 And we also have some external parsers. I mean, not developed by us, but actually by the community, for Delphi, the Object Pascal from Borland. Well, yeah. And also we have something for Python, and we have an experimental one for JavaScript and things like that, right? I mean, it's not difficult to add more languages.

Starting point is 00:19:36 In fact, we are working on Swift at this point. So it's not difficult to add more languages. And most of the time we're based on a standard parser. I mean, for instance, I can tell you a little bit of the story of... I mean, for C#, it's very easy because we have this Roslyn project by Microsoft where they provide the entire compiler infrastructure and parsing and everything.

Starting point is 00:19:56 So it's very simple to get the AST. So that's sort of simple. For C, we use libclang. I don't know if my pronunciation is correct for that, Clang or... Everyone says it differently. Yeah. Okay, okay. So that's fine. And that's the strategy we tried for C++ at the beginning. But, well, you know, C++ is by far the most complex of them all. So it was kind of, should we go that direction? It's going to be crazy. And it was a little bit crazy

Starting point is 00:20:28 with libclang because, at least the way we understood it, you need to give it all the, well, not dependencies, all the includes. So it made it like a little bit of a nightmare, because you're just merging a file, so you get the other two

Starting point is 00:20:44 files, and you don't want to configure your merge tool to know all your include paths and stuff like that. But at the end of the day, we got a very nice parser from the folks at the Eclipse CDT project, right? They have a library to actually parse C++, which is what they use for their IDE in the Eclipse project, and we use that. So actually we parse C++ using Java, which is probably like a little bit of a sin, but it works pretty well because it's sort of very independent, right?

Starting point is 00:21:18 It's not like you have to feed it. If you need to ask the user to actually configure his merge tool almost like a makefile, he's not going to feed it. If you need to ask the user to actually configure his merge tool almost like a make file, he's not going to use it. He's going to be like, okay, you're crazy. So now you need zero configurations. Like, okay, it's there. You just use it and it goes and it's fine. At the end of the day, we don't need super strict parsing either.

Starting point is 00:21:42 It's more like, okay, where are the functions? Where does it start? Where does it start? Where does this end? If this, you know, it's all what we need. We could even probably have gone the, you know, pairing brackets and stuff like that. But, you know, it's never that simple with C++. So I'm just trying to imagine how all this comes together.

Starting point is 00:22:03 You've got a parser that builds some sort of AST or semantic information about the code. And then do you actually diff then the ASTs effectively? Exactly. That's the trick, right? Since we develop our own version control system, Plastic SCM actually, which is like an alternative to Git. So since we develop all that and all the algorithms inside, we actually reuse part of the code we have to actually merge directories

Starting point is 00:22:32 and the elements of directories, which are, you know, you have trees there. So we sort of reuse, actually not the same code, but the same know-how and some of the tests too, to be honest. We reuse that to actually put it into the file. So it was like, okay, you are merging at the directory level, and then you're zooming into the file level, but you split it into components. So you have a namespace, and then inside the namespace you have a class, and then you have functions. And, well, depending on the language, there's a different construction. If you are doing Java, it's a little bit different than C# or C++, but at the end of the day, it's more or less the same, right?
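To make the idea of "zooming into the file" as a tree of declarations concrete, here is a minimal sketch. It is not how Semantic Merge is implemented; it uses Python and its standard `ast` module as a stand-in for the language parsers mentioned in the conversation, and the `outline` function and the sample source are hypothetical names chosen for illustration.

```python
import ast

# Hypothetical sample file used only for this illustration.
SOURCE = '''
class Stop:
    def next_bus(self):
        return "5 min"

    def delay(self):
        return 0

def main():
    print(Stop().next_bus())
'''

def outline(source):
    """Map each class/function's qualified name to its source text.

    The walk stops at the declaration level: it descends into classes
    (containers), but a function's body is kept as plain text.
    """
    tree = ast.parse(source)
    result = {}

    def visit(node, prefix):
        for child in ast.iter_child_nodes(node):
            if isinstance(child, (ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)):
                name = prefix + child.name
                result[name] = ast.get_source_segment(source, child)
                if isinstance(child, ast.ClassDef):
                    visit(child, name + ".")

    visit(tree, "")
    return result

for name in outline(SOURCE):
    print(name)
```

The resulting name-to-text map plays the role a directory tree plays in a repository merge: containers are nodes, and function bodies are the "files" at the leaves.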

Starting point is 00:23:10 What we don't do, I mean, we stop at the method level. So if you move a method to a different location, we need to identify whether it's the same method. So basically, we parse the ASTs, we diff the ASTs, and then we need to figure out if something you move and rename is still the same thing. And there we use some similarity algorithms to actually find if the text blocks inside are similar enough. Something that is worth saying is that in order to cut complexity, because the problem can get super complex, right?

Starting point is 00:23:53 What we did was, okay, we parse just up to the declaration level. So inside the function itself, we don't parse. I mean, the parsers actually do their job, but we don't use that, right? It's like, I know there is a function and then you have 50 lines. Okay, these 50 lines are text for me. I know it was moved to a different location. I know it was renamed. But then I just, I mean, we don't merge at the level of ifs and elses and stuff like that, right? That's what I mean.
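The "similar enough" check for a moved and renamed function can be sketched as follows. This is only an illustration under stated assumptions: it uses `difflib.SequenceMatcher` from the Python standard library rather than whatever similarity algorithm Semantic Merge actually uses, and `match_renamed`, the 0.6 threshold, and the sample bodies are all hypothetical.

```python
from difflib import SequenceMatcher

def match_renamed(removed, added, threshold=0.6):
    """Pair each removed declaration with the most similar added one.

    `removed` and `added` map declaration names to body text. A pair is
    reported only when the text similarity reaches `threshold`
    (an arbitrary value picked for this sketch).
    """
    matches = {}
    available = dict(added)
    for old_name, old_body in removed.items():
        best_name, best_ratio = None, threshold
        for new_name, new_body in available.items():
            ratio = SequenceMatcher(None, old_body, new_body).ratio()
            if ratio >= best_ratio:
                best_name, best_ratio = new_name, ratio
        if best_name is not None:
            matches[old_name] = best_name
            del available[best_name]  # each added declaration matches at most once
    return matches

# One function disappeared under its old name; two new names appeared.
removed = {"calc_delay": "delta = actual - planned\nreturn max(delta, 0)\n"}
added = {
    "compute_delay": "delta = actual - planned\nlog(delta)\nreturn max(delta, 0)\n",
    "render_map": "for stop in stops:\n    draw(stop)\n",
}
print(match_renamed(removed, added))
```

Because the bodies are compared as text, a rename plus a small edit still matches, while an unrelated new function does not.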

Starting point is 00:24:13 It's more at the method level. I mean, at the end of the day, you get it merged, of course, but we use regular text for that because otherwise the complexity explodes, you know? Right. It's just an unsolvable thing, right? So, yes, we create the ASTs, we take the ASTs, we diff them, we get two pairs of differences, basically how it was versus your version,

Starting point is 00:24:35 and how it was versus their version, and then that's what we use to calculate the actual merge. And the super nice thing is that it's amazing the number of manual conflicts that are a nightmare to solve that become even automatic, right? Like you move a method to a different location, then you change it, you try to merge that with a regular merge tool

Starting point is 00:24:58 and you get the old method, which is now at the top of the file or something, trying to be matched with another method that has nothing to do with it, and you go crazy. Then as soon as you go in semantic mode, okay, it knows it's the same method. It doesn't matter where it is, right? It just knows how to merge it.
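The move-plus-change case described above can be sketched as a three-way merge over declarations instead of lines. This is a simplified illustration, not the product's algorithm: `merge_declarations` and the sample data are hypothetical, each version is modeled as an ordered name-to-body map, and bodies are compared as whole strings where a real tool would run a text merge.

```python
def merge_declarations(base, mine, theirs):
    """Three-way merge of ordered {name: body} maps.

    Returns (merged, conflicts). Simplification for this sketch: if both
    sides reordered declarations differently, "mine" wins the ordering.
    """
    base_order, mine_order, theirs_order = list(base), list(mine), list(theirs)
    # Take the order from the side that moved things; otherwise from theirs.
    order = mine_order if mine_order != base_order else theirs_order
    # Append declarations that exist only on the other side.
    names = order + [n for n in mine_order + theirs_order if n not in order]

    merged, conflicts = {}, []
    for name in names:
        b, m, t = base.get(name), mine.get(name), theirs.get(name)
        if m == t:          # both sides agree
            result = m
        elif m == b:        # only theirs changed (or deleted) it
            result = t
        elif t == b:        # only mine changed (or deleted) it
            result = m
        else:               # both changed it differently
            conflicts.append(name)
            result = m
        if result is not None:
            merged[name] = result
    return merged, conflicts

base = {"draw": "render()", "update": "tick()", "load": "read(path)"}
mine = {"load": "read(path)", "draw": "render()", "update": "tick()"}          # moved load to the top
theirs = {"draw": "render()", "update": "tick()", "load": "read(path, cache)"}  # edited load in place

merged, conflicts = merge_declarations(base, mine, theirs)
print(list(merged), merged["load"], conflicts)
```

Since declarations are matched by identity rather than by position, the edited body follows the function to its new location and no conflict is raised.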

Starting point is 00:25:17 And the super nice thing, well, I get super excited about it. So I can talk for hours, but don't worry, I'll stop. So the super nice thing about this is that we can even do it across files, which is not, of course, something you do on a daily basis, but when you do it, it's really good. Like, you move a function to a different file

Starting point is 00:25:35 because you're cleaning up code and you're refactoring stuff and you just do that. While someone else was modifying the function on the original location, we can merge that and diff it too, right? We can diff and merge that thing. And that's pretty amazing. I mean, I have this vision. My view is that sooner or later,

Starting point is 00:25:54 all diffs and merge tools are going to do this. I don't know if it's going to be with our technology. I hope it is. But I think it's going to happen because there's no good reason not to do it this way. I mean, it's like, it's simply, it's better than regular diffs on a daily basis, right? Of course, you can tell me, okay, you don't do that all day. I mean, you don't do that every single day. Of course, you don't.

Starting point is 00:26:17 But when you do it, and it's more often than you think, it's super useful. Wow. Sounds pretty cool. Yeah, it does. One question I had was since you're doing all this AST parsing and everything, is there any performance concern if you have like a multi-million line code base or is it

Starting point is 00:26:36 not doing enough parsing that it doesn't become much of a concern? Okay. Suppose you plug Semantic into your Git, right? Which is something you can do. You can just say, okay, Git, use this tool as my merge tool, that's all.

Starting point is 00:26:52 So you run git mergetool and it runs Semantic instead of, I don't know, KDiff3 or whatever. Okay, when that happens, it will only parse the files in conflict. So if some files are not in conflict, then they won't be parsed at all because it's Git driving. I mean, if Git already knows how to solve it, then there's no worries in there.

Starting point is 00:27:11 The second thing is about the multi-file. When the multi-file is in place, and that's something, well, we reused some of the technology we have in Plastic SCM, which is our version control, plus Semantic, to create a free, well, it's free, it's still free. We don't know if it is going to be commercial anytime soon.

Starting point is 00:27:32 A free Git client, which is called Gmaster. You can go to gmaster.io and download it. And it puts together parts of the two products. And this is the one doing the multi-file semantic merge. And when that happens, we have to interfere a little bit with Git in the sense that it's not the one actually driving the merge process, but we are the ones doing that because we can find conflicts in parts where Git doesn't.

Starting point is 00:28:01 For instance, the typical thing of moved code is you add a file and then you have some code moved from the original file into this new one. When you merge that, Git is not going to find any conflict in the added file because it's new, right? There can't be a conflict in there. But we can find that there was code moved from the original file into the second one. And then we expand the scope of the conflict, let's say. So Git

Starting point is 00:28:25 will say, okay, foo.c has a conflict, but we know that foo.c is involved in a refactor group with bar.c because of that. I have some, well, it's slightly more difficult to explain just talking,

Starting point is 00:28:42 but I have a few graphics that we can share later on, and a blog post where they can find a very neat explanation in just a few paragraphs, right? Of how it works. So that's a little bit about it, right? It's all about

Starting point is 00:28:56 the main motivation is, okay, let's make it simpler to actually find and solve conflicts. I mean, merges have been really feared by developers for, I don't know, I would say generations, but maybe it's a little bit too much. But that's, yeah, you know, many people say, okay, I get a merge conflict.

Starting point is 00:29:21 It's like, oh my God. And we just try to make it something much simpler and much more powerful. That's what it is. So you already mentioned that it works with Git. Does it work with any source control system? Yeah, you can plug semantic into... It's just a standard diff and merge tool

Starting point is 00:29:43 because you can also use it to diff, right? It's not only merging because, in fact, you are going to use the diff part. We call it semantic diff, not very original. But you're going to use it much more often than you use actually the merge because probably you merge, I don't know, a few times a week. Okay, it depends on your role. Maybe some people say, okay, I do that on a daily basis. But anyway, you diff very often

Starting point is 00:30:08 because you want to see your own changes, how you modify that. Okay, I'm on a good track. How I compare this to what other colleague did or something like that. And then you can also use the semantic diff in there, right? So yeah, it's something you use on a daily basis. And answering your question, yes, you can plug it to any version control system.

Starting point is 00:30:27 In fact, you can plug it to our own Plastic SCM. Of course, you can do it with Git. You can do it with Perforce. No reason not to do it with Subversion. So, yeah, I mean, the same way you invoke KDiff3 or P4Merge or Araxis or Beyond Compare, you can just plug Semantic in there. What operating systems do you support? Well, actually, we're supporting Windows. We have version one on Windows, Mac, and Linux, but we have less demand, less interest, I will say, with Mac and Linux. So we delayed version two.

Starting point is 00:31:06 So we are more on the Windows side of things. Although, as I speak, I can tell you that we are going to be sharing some screenshots this week about the upcoming versions for Mac and Linux. The thing is that the first version, version one of Semantic, was sort of alien in terms of UI, right? So it was able to actually solve all the semantic conflicts and so on, but it didn't look like any other three-way merge tool. So in version two, we said, okay, let's simply add the semantic

Starting point is 00:31:39 power to a more natural tool where you see the three panels with the three contributors and so on. That's it, right? It's mostly like a regular tool, just decorated with semantic information. And that's what version two is, but it's available on Windows only at this point. Okay. I wanted to interrupt the discussion for just a moment

Starting point is 00:32:01 to bring you a word from our sponsors. PVS Studio Analyzer detects a wide range of bugs. This is possible thanks to the combination of various techniques, such as data flow analysis, symbolic execution, method annotations, and pattern-based matching analysis. PVS Studio team invites listeners to get acquainted with the article Technologies Used in the PVS Studio Code Analyzer for Finding Bugs and Potential Vulnerabilities,

Starting point is 00:32:24 a link to which will be given in the podcast description. The article describes the analyzer's internal design principles and reveals the magic that allows detecting some types of bugs. So you've mentioned it a couple times. Do you want to tell us a little bit more about Plastic SCM, this other source control that your company makes? Okay. In fact, Plastic was the main motivation to start the company back in 2005.

Starting point is 00:32:48 This is the main product we developed, or main source of revenue, I will say. Okay, we started it with the intention to have something better than the available

Starting point is 00:33:03 Subversions of the world, and CVS and SourceSafe, as we said. And it was around the time when Git also started. So we started in 2005, and a few months after really starting the company, Git was out in its infancy. And, well, we continued evolving through that, right? We have a very wide range of users, from indie teams doing games with just two, three people to some corporations with 3,000 developers using Plastic on a few projects.

Starting point is 00:33:38 And we even have some super huge project with 1,000 developers in the same code base, which is kind of crazy in terms of merging and branching and all that. And we built it around a few key concepts, right? It's basically super strong with branching. It's very, very simple to do branching, super fast in Plastic. It's very good with merging. I mean, Git is super good with merging

Starting point is 00:34:06 but we can do even a few more cases, and even make it a little bit easier than what they do, which is not easy, because Git is good, but we try to be even better in that sense. So that's a real challenge. And then it's very good with big files and super

Starting point is 00:34:22 big repos. Like, we have customers with repositories in the four-terabyte, five-terabyte range, especially in video games. That's one of the things we had to, I mean, we wanted to do by design from day one, and that's one of the things that separates us from other systems, Git in particular. Then we can also do something quite interesting, which is Plastic is fully distributed, but it can also be fully centralized

Starting point is 00:34:50 and every combination in between. What I mean by that: you can work with local repos like you would do in Git, but you can create a new workspace, a working copy, and work directly with a server like you would do in Subversion. What we mean by this is that in a lot of teams,

Starting point is 00:35:05 you have really different profiles of people. So some of them are very comfortable with the distributed way of working, but some of them really prefer to go into the check-in or commit mode directly to the central servers, and that's it. Especially, you know, in games, we are very big in games, and lots of artists,

Starting point is 00:35:26 not super techie developers. I mean, people that are not normally full-time developers, let's say, or just artists and stuff like that, they really prefer to just check in and done. And we also implement locking. It's optional. I mean, we prefer merging.

Starting point is 00:35:40 I mean, merging is our life, as I said. So locking is kind of a second thing to do. But there are certain assets, certain files that you cannot merge, and then locking is good with that. So these are sort of the key things. Of course, there's something I always forget to say, because mostly I take it for granted, but we develop the entire stack. I mean, it's not only the core, the server, the command line. We develop all the merge tools, as I said at the beginning, plus diff tools, plus GUIs on Mac, Linux, and Windows. You know, it's sort of a complete package. You download it and you have all the pieces. It's not like, okay, I get

Starting point is 00:36:17 the command line here, but then I need a GUI from some other provider, and then I need, like, a cloud service from someone else. We just provide the whole thing, which is a lot of work, but it's fun, right? You mentioned that you have very good support for large binary assets, and then I started wondering, and I don't believe you addressed this: is it possible for your semantic diffing, semantic merge tool to work with binary files, or do you have plans to say, like, well, these two PNGs were both changed by these two different people. Let's look at the actual images here.

Starting point is 00:36:51 Okay, one thing is Plastic SCM, the version control, and it can handle really huge files. I mean, release after release, and we do like a version, three versions every week. Not that we deploy all of them to everyone, but it's like, okay, they are available to download. Each of them goes through a testing phase where we always check in a one gigabyte file, right?

Starting point is 00:37:13 So size limit is not an issue. But that's one thing, actually, what the core can handle. And then you get the diff and merge. Can you merge a PNG file? Not with the standard diff and merge tools. You will need to have, like, a special diff or merge. We have a special diff for images. So you put two images together

Starting point is 00:37:34 and then it can show you side by side. It can actually calculate the difference, which sometimes is very nice, but sometimes you don't understand anything, because it's kind of a diff in pixels, so it's like crazy. But we can also do, like, a sliding image: you put one over the other, and then you have a slider, and you move it, and you reveal one side or the other. So if the images are completely different, then you don't really realize what's going on, but if it's the same, I mean, the same concept or the same image

Starting point is 00:38:06 with a small modification, you can do it. So for diff, what we have is, okay, we have diff for image, and then we have a way to plug custom diffs. For instance, like if... And again, it's not that we only work for games. Of course, we work for a lot of industries too, but in games, they normally have custom packages where they put textures and 3D models together, packaged somehow.
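
The two image views described above, the per-pixel difference and the slider that reveals one image over the other, can be sketched in a few lines. This is only an illustration under stated assumptions, not Plastic SCM's implementation: images are modeled as small grayscale grids, and the names `pixel_difference` and `slider_view` are hypothetical.

```python
from typing import List

# Assumption: a grayscale image is a list of rows of 0-255 integers.
Image = List[List[int]]


def pixel_difference(a: Image, b: Image) -> Image:
    """Absolute per-pixel difference; 0 means the pixel is unchanged."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("images must have the same dimensions")
    return [[abs(pa - pb) for pa, pb in zip(ra, rb)] for ra, rb in zip(a, b)]


def slider_view(a: Image, b: Image, split: int) -> Image:
    """Columns left of `split` come from `a`, the rest from `b`."""
    return [ra[:split] + rb[split:] for ra, rb in zip(a, b)]


old = [[10, 10, 10],
       [10, 10, 10]]
new = [[10, 10, 10],
       [10, 200, 10]]

diff = pixel_difference(old, new)   # only the edited pixel is non-zero
swiped = slider_view(old, new, 1)   # slider after the first column
```

A raw difference like `diff` is easy to read when one small region changed and hard to read when the images are unrelated, which matches the point made in the conversation.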

Starting point is 00:38:31 So many studios develop their own diff thing, which takes one of these files, unpacks it, and shows you what was different. But to answer your original question, no, we don't have a merge tool for PNGs or something like that. We've done something like that for Unity projects, for instance. We have our own merge tool based on the same core,

Starting point is 00:38:55 but of course handling the actual content in a different way, for what they call scene files, which are some text-based representation of stuff, but not for... I mean, we cannot do magic and merge and x files or something like that, right? No, that's not yet there. Okay. I had another question, just thinking about all your tools here, and your nice diffing and your source control system. And I was recently involved in a project where I was the person tasked with applying clang-format

Starting point is 00:39:30 across the entire code base. And part of this resulted in a lot of whitespace diffs because we went from tabs to spaces. Wow. So basically every single line in the entire repo changed. Now, I was able to resolve a lot of potential conflicts by putting Git in all kinds of ignore whitespace change mode. And I got something usable

Starting point is 00:39:55 and then was able to pass off instructions to people who still had to do their own merging. And just out of curiosity, is this something that your tools would handle differently, or would we be in the same kind of... That's one of the things where Semantic Merge will help, right? When you are actually changing just the indentation, or tabs to spaces, stuff like that, it is not a semantic merge at all, because the structure of the code doesn't really change. Okay. But it will help, because it basically knows how to reduce that. I mean, it understands that the only thing that

Starting point is 00:40:33 changed was a space or a tab or something like that, and it will react accordingly. In some languages we even have the option, I mean, we have it for C, I have to check if we have it for C++ at this point, but we have a way that when it's going to merge a method, we have the option to reformat it before merging, which means... Oh, okay. I mean, if you are... I mean, the thing is that

Starting point is 00:41:01 suppose you simply split a line in three lines, like you have a call to a function and you split it in three lines to make the line not that wide, right? To make it thinner or something like that. And then someone else added something in the middle or something like that. So you can have a really, well, a conflict in there. But if you reformat the code to a given standard or something, you can get rid of some of these reformatting things, and then you merge it and reformat again. That's something we provide

Starting point is 00:41:33 for some of the languages. So sometimes you can solve some conflicts that simply could not be resolved automatically, right? So that's one of the things we do. And for instance, some of the things, not necessarily your case, but suppose a typical case where you add an include at the beginning of the file, and then five lines later, someone else has added the same include

Starting point is 00:41:58 on a different branch. Semantic is smart enough. I mean, it's not AI or anything like that, right? But it's smart enough to actually know that the same include was added twice, so it will only keep one. Automatically, it won't even ask you. Otherwise, you will get the two.

Starting point is 00:42:14 Like, okay, you add, I mean, stupid thing, but you add, I don't know, whatever. The same way in first line and then in line 10 of the file because it's like 10 lines of includes, right? Well, you just solve it automatically. And that's interesting, right? That's one of the things it does.
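
The duplicate-include case described above can be sketched as a tiny three-way merge over include lines. This is a minimal illustration, not Semantic Merge's actual algorithm: the function name `merge_includes` is hypothetical, and it assumes includes can be compared as whitespace-normalized text.

```python
from typing import List


def normalize(line: str) -> str:
    """Collapse runs of whitespace so formatting differences are ignored."""
    return " ".join(line.split())


def merge_includes(base: List[str], ours: List[str], theirs: List[str]) -> List[str]:
    """Merge two edited include lists against their common ancestor.

    An include added on both branches is kept once, and an include
    deleted on either branch stays deleted.
    """
    base_keys = {normalize(line) for line in base}
    ours_keys = {normalize(line) for line in ours}
    theirs_keys = {normalize(line) for line in theirs}
    removed = {k for k in base_keys if k not in ours_keys or k not in theirs_keys}

    merged: List[str] = []
    seen = set()
    for line in ours + theirs:
        key = normalize(line)
        if key in removed or key in seen:
            continue
        seen.add(key)
        merged.append(key)
    return merged


base = ["#include <string>"]
ours = ["#include <vector>", "#include <string>"]
theirs = ["#include <string>", "#include <map>", "#include  <vector>"]

merged = merge_includes(base, ours, theirs)
```

A plain text merge would see `<vector>` added at two different positions and keep both lines; treating the includes as a set of declarations is what lets the duplicate collapse without asking the user.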

Starting point is 00:42:34 There are a few others like, for instance, it can warn you if there are changes in the same function even if they don't collide. So you have an option to say, okay, warn me about that. Suppose you have a function with 10 lines. You modify something in line 1, I modify something in line 10. Any merge tool is going to merge that automatically because there's no conflict, it's not the same line. We have an option to say, okay, even if that is not in the same line,

Starting point is 00:42:57 since it's in the same method or the same function, let me know because maybe I want to take a look, because maybe I'm changing the logic in some way that I like to review manually or something. Okay, we can do this kind of stuff because you understand it's in the same context, in the same, you know, it's in the same function. If you are not parsing functions, you don't know,

Starting point is 00:43:18 you only know, I mean, when you work in text mode, it's the same for you to merge a love letter as a C++ file, because you don't know what's in there, right? So, basically. Right. Okay, now I'm totally going to be going out on a limb here... But I'm thinking about all the capabilities you've

Starting point is 00:43:35 mentioned, and I would like a tool that if I go to make, you know, there's a conflict, like, or, you know, go to make a merge, you said, if two people edited the same function, you could warn on that. Is it possible with your semantic abilities to say, well, programmer A actually did a copy and paste of this code

Starting point is 00:43:55 into another chunk where really like that's like some sort of like smart copy paste detection on merge. That would be awesome. We don't do it right now, but it's not the first time I am asked for this. Okay, cool. And it wouldn't be that difficult because, in fact, we're doing that

Starting point is 00:44:17 when we diff the code. We are already matching similar text blocks, right? We already do that. In fact, before launching Semantic, we had something that we still have, called Xdiff and Xmerge, and the X stands for cross or move, right? So we have move detection for text blocks even if we don't parse the language. So Semantic can do that. Suppose you launch Semantic with a file we don't parse. I mean, I don't know. Rust, okay?

Starting point is 00:44:51 We don't have a parser for Rust. So you start it with Rust. You move a block of code. Our system is able to figure out that this text block was moved because it knows it has a similar... Okay, if what you move is exactly the same, you just move to a different location, it's able to diff it and merge it correctly and apply

Starting point is 00:45:09 exactly the same thing as I said with functions, but to text blocks. If you make small modifications after you move the block, it's still able to find it's the same up to a certain similarity.

Starting point is 00:45:25 Okay, this percentage, right? So it's able to do that. So there's nothing preventing us from actually saying, okay, it's the same thing up to some percentage or something, because it's something we do, right? I mean, it's already doing that. The diff is already using that for the calculation. The thing is that we are not providing the info per se,

Starting point is 00:45:47 like, okay, saying, hey, you just copied the file or you just copied the method from this other place and you shouldn't be able to do that or anything. Okay. Okay. Okay. That sounds really powerful. Yeah.
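
The idea of finding a moved text block "up to some percentage", without parsing the language, can be sketched with Python's standard `difflib`. This is an assumption-laden illustration, not how Xdiff or Xmerge work internally: the helper name `find_moved_block` and the 0.8 threshold are made up for the example.

```python
from difflib import SequenceMatcher
from typing import List, Optional, Tuple


def find_moved_block(block: List[str], new_lines: List[str],
                     threshold: float = 0.8) -> Optional[Tuple[int, float]]:
    """Slide a window the size of `block` over `new_lines`.

    Returns (start_index, similarity) for the most similar window, or
    None when nothing reaches `threshold`. Text is treated as opaque
    lines, so no parser for the language is needed.
    """
    if not block:
        return None
    size = len(block)
    target = "\n".join(block)
    best: Optional[Tuple[int, float]] = None
    for start in range(len(new_lines) - size + 1):
        window = "\n".join(new_lines[start:start + size])
        score = SequenceMatcher(None, target, window).ratio()
        if best is None or score > best[1]:
            best = (start, score)
    if best is not None and best[1] >= threshold:
        return best
    return None


# A Rust function that was moved below main() and lightly edited.
old_block = [
    "fn area(w: u32, h: u32) -> u32 {",
    "    w * h",
    "}",
]
new_file = [
    "fn main() {",
    '    println!("{}", area(2, 3));',
    "}",
    "",
    "fn area(w: u32, h: u32) -> u32 {",
    "    w * h  // area",
    "}",
]

match = find_moved_block(old_block, new_file)
```

The same similarity score could also be surfaced to the user as a "this looks copied from over there" hint, which is the feature asked about in the conversation; the guest says the tool computes the match but does not report it that way today.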

Starting point is 00:46:03 Is there anything else you wanted to go over that we haven't brought up yet with either Plastic SCM or Semantic Merge? Well, I think basically we covered all the essential concepts in here. The only thing I will say is that, well, if anyone wants to give it a try, they can go to semanticmerge.com and they have a free trial for 30 days. So they can just download it and use it and see if it's really as good as I said and check it for themselves. And if they want to take a glance at,

Starting point is 00:46:40 get a better understanding of the entire technology we developed, including this multi-file semantic thing and so on, they can go to gmaster.io, and then they have a free Git client which comes with all the merge technology and so on. And of course, if they want to switch to a different version control, we have Plastic SCM, which is our main product and the thing that makes our hearts beat every day.

Starting point is 00:47:06 So that's the big thing for us. These are all, well, the Git merge tool, sorry, Gmaster. Gmaster, you said, is free. The other ones are commercial. Do you have options for open source or students or that kind of thing? Absolutely, and thank you for saying that because I always forget it. Yeah, it's free for open source or students or that kind of thing? Absolutely. And thank you for saying that because I always forget it. Yeah, it's free for open source. All of them are free for open source.

Starting point is 00:47:31 And we have discounts for institutions. And we have free versions for individuals. So any individual can use it for free. No issues in there. Just have to, you know, in some of the sites, they can directly register and request the license and so on. Otherwise they can always contact us.

Starting point is 00:47:47 So for individuals, hobbies and stuff like that, open source projects, it's completely free. And then we have discounts for educational purposes and all that. So, yeah, we are pretty flexible in there,

Starting point is 00:47:58 in there, in that, right. I mean, we are a very small team, very small company, and we are pretty friendly with all the licensing stuff. Awesome. Well, it's been great having you on the show today, Pablo.

Starting point is 00:48:10 Thank you very much. Thank you very much for inviting me. It was a pleasure. And Rob, if you may, before we leave, there is a news item we forgot to go over. Oh, what's that? The C++ on Sea call for volunteers and student submissions is currently up, and those are closing on January 1st. So we'll get this news out just barely in time. Okay, well, I'll put that link in the show notes as well. Okay, thanks again, Pablo. Thanks. Thank you.

Starting point is 00:48:43 Thanks so much for listening in as we chat about C++. We'd love to hear what you think of the podcast. Please let us know if we're discussing the stuff you're interested in, or if you have a suggestion for a topic, we'd love to hear about that too. You can email all your thoughts to feedback at cppcast.com. We'd also appreciate if you can like CppCast on Facebook and follow CppCast on Twitter. You can also follow me at robwirving and Jason at lefticus on Twitter.

Starting point is 00:49:08 We'd also like to thank all our patrons who helped support the show through Patreon. If you'd like to support us on Patreon, you can do so at patreon.com slash cppcast. And of course you can find all that info and the show notes on the podcast website at cppcast.com. Theme music for this episode was provided by podcastthemes.com.
