CppCast - Semantic Merge
Episode Date: December 21, 2018

Rob and Jason are joined by Pablo Santos from Codice Software to discuss Semantic Merge, Plastic SCM and more.

Prior to entering start-up mode to launch Plastic SCM back in 2005, Pablo worked as an R&D engineer in fleet control software development (GMV, Spain) and later on a digital television software stack (Sony, Belgium). Then he moved to a project management position (GCC, Spain) leading the evolution of an ERP software package for industrial companies. During these years he became an expert in version control and software configuration management, working as a consultant and participating in several events as a speaker. Pablo founded Codice Software in 2005 and since then has focused on his role as chief engineer, designing and developing Plastic SCM and SemanticMerge among other SCM products.

News
Boden Cross-platform Framework
SG20 Education and Recommended Videos for Teaching C++
C++Now Call for Submissions
C++ on Sea Volunteer and Student Programmes

Pablo Santos
Pablo Santos

Links
Semantic Merge
Plastic SCM
gmaster
Plastic SCM Blog

Sponsors
Download PVS-Studio
Technologies used in the PVS-Studio code analyzer for finding bugs and potential vulnerabilities
JetBrains

Hosts
@robwirving
@lefticus
Transcript
Episode 180 of CppCast with guest Pablo Santos
recorded December 18th, 2018.
Today's sponsor of CppCast is PVS-Studio.
PVS-Studio is a tool for bug detection
in the source code of programs written in C, C++, and C#.
The PVS-Studio team will also release a version
that supports analysis of programs written in Java.
And by JetBrains, maker of intelligent development tools to
simplify your challenging tasks and
automate the routine ones. JetBrains
is offering a 25% discount for
an individual license on the C++ tool
of your choice. CLion,
ReSharper C++, or AppCode.
Use the coupon code
JetBrainsForCppCast during checkout
at JetBrains.com.
In this episode, we discuss cross-platform mobile frameworks and SG20.
Then we talk to Pablo Santos from Codice Software.
Pablo talks to us about SemanticMerge, Plastic SCM, and more.
Welcome to episode 180 of CppCast, the first podcast for C++ developers by C++ developers.
I'm your host, Rob Irving, joined by my co-host, Jason Turner.
Jason, how are you doing today? Good. How are you doing, Rob?
Doing pretty good. 180. We're starting to get awful close to
200. We'll hit that in like two or three months into the new year, right?
Yeah, 20 weeks, something like that. Four or five months.
Five months, yeah. But it won't take too long.
Right. Yeah, pretty crazy. All ready for the holiday
season? I am, yes. I am ready to not be traveling for a while. I just got back from Hamburg,
and I will not be traveling again until Folkestone, until C++ on Sea. By the way, if I might say, you're running out of
time to buy tickets to my class there. So if you were thinking about it, go ahead and do it now.
That's right. C++ on Sea, which is in February, right?
It's in February, the first week of February in Folkestone, England, and I'm giving a class
on constexpr. And it will apply to anyone who's using C++11 or later. C++11 constexpr is a little bit harder,
but we'll talk about what the limitations are.
Sounds great. At the top of every episode, I'd like
to read a piece of feedback. Two weeks ago, Jens from Meeting C++ put out this survey
asking at what speed you listen to CppCast.
He put out the options
0.5x, which I thought was ludicrous
(I kind of forgot
that it was even an option we have on the website,
to listen to the podcast at half speed),
1x,
1.25x,
1.5x, or 2x.
And the vast majority of the poll answers said 1x speed, 62%.
But a couple people did put down the 0.5, which really fascinates me. I would never imagine
listening to a podcast at half speed. Well, it's kind of funny, because I don't know if
our listeners have noticed, but it has happened a couple of times that your recording has gotten
out of sync a little bit and you start to sound kind of like you're speaking at half speed.
I think I've always done a pretty good job of editing that. So I doubt they've ever heard that.
Maybe one episode came out. I don't remember though.
Next time it happens, I won't tell you so that it doesn't.
But I've had plenty of people actually tell me, like students and whatever,
like, you sound different in person.
Oh, right, you're not talking as fast as I expect you to.
Because they watch all of the talks.
At 1.5.
Yeah.
CppCast and my YouTube channel.
Okay.
Well, we'd love to hear your thoughts about the show as well.
You can always reach out to us on Facebook, Twitter,
or email us at feedback@cppcast.com. And don't forget to leave us a review on iTunes.
Joining us today is Pablo Santos.
Prior to entering startup mode to launch Plastic SCM back in 2005, Pablo worked as an R&D engineer
in fleet control software development and later digital television software stack. Then he moved
to a project management position leading the evolution of an ERP software package for industrial companies.
During these years, he became an expert in version control and software configuration management,
working as a consultant and participating in several events as a speaker.
Pablo founded Codice Software in 2005, and since then has focused on his role as chief engineer,
designing and developing Plastic SCM and SemanticMerge, among other SCM products.
Pablo, welcome to the show.
Hello, thank you for hosting me.
I'm curious about this fleet management that you were working on.
Can you tell us something about that project?
Well, it was actually a C++ project, and it was my first job once I got out of the university.
And it was, you know, a piece of software to actually show information about the buses going through the city.
So in every bus stop, you have like, you know, information about the next one and so on.
And then specifically the part I was working on was the control center:
the entire software that the guys in the control center used to
monitor if there were delays or something was wrong, or something like that. The GPS at the time,
it was around 2001, was not that precise, or probably it was not
as available as it is today.
So we had to do a lot of corrections in things like, you know,
interpolating the position, stuff like that,
because you couldn't trust the device all the time.
So it had a mix of things like from, you know,
from socket level sending messages to actually low level stuff
to DirectX rendering things and graphics,
which was my favorite part, by the way.
And so, yeah, a lot of things.
And my first experience with version control, too.
That sounds pretty awesome, actually.
I don't know if I've ever been to a city in America
that had bus stops that well-organized
that said what the next bus was coming
and how far away it was. Have you ever seen that, Rob? I've seen it in Germany. Yeah, that's pretty neat.
I never really thought about what's behind that to actually tell you the real truth of what's
happening.
Yeah, today here in Europe, I would say it's sort of
everywhere. At the time I joined, it was new, sort of new.
There were just a few cities adopting that.
But yeah, now it's something like you sort of expect, right?
You get to a bus stop and then you get this digital information.
Years ago when I was a kid, it was just something printed down there.
So you have to make your calculation when the next one is going to come or something.
But yeah, that's good.
And I think right now they even have an app.
I don't use it that often anymore, but I think they even have an app.
And the nice thing is they can even reuse the whole thing for other uses,
like police and stuff like that, where the cars are and things like that,
all for control centers and so on.
So yeah, that was pretty exciting.
Wow. Yeah.
We're going to talk more about source control in a minute,
but can you give us a preview
and tell us maybe what source control you used then?
Of course.
My first version control then was SourceSafe,
nothing super fancy.
Oh, yes.
That was probably the motivation for why
I started thinking I had to create something better.
To be fair, yeah.
I mean, SourceSafe was kind of terrible.
I don't think it's maintained by Microsoft anymore.
But if you had a team of like two people, it was a cheap and easy way to set things up.
But yeah, I mean, you know, it's better than nothing.
By that time... I mean, right now, every new student,
they learn Git, or, well, not SourceSafe anymore,
probably Git at the university or college and so on.
So they get this understanding.
But 20 years ago, you didn't have that information
when you were at the university.
So you saw SourceSafe as kind of cool, and, well,
I started to hate it when I started using it, but now in my memories it's kind of cool because,
you know, as bad as it was, it created a lot of conversation after that. I mean, you
can meet any developer around the world and spend quite a good time talking about how bad it was.
So it's kind of social, right?
Yeah, and as you said, it's better than nothing,
at least it gives you an introduction to something other than copy-paste file management.
Absolutely, absolutely.
This nightmare of "this-is-the-good-one.zip",
"good-one-really-good.zip",
"this-is-the-one-the-final.zip", you know, yeah, yeah, yeah.
Oh, yeah, yeah, yeah.
That's terrible.
Okay, so Pablo, we've got a couple news articles to discuss.
Feel free to comment on any of these,
and then we'll start talking more about SCM
and merge tools and the other
stuff that you work on.
Okay, perfect.
Okay, so this first one is the Boden cross-platform framework,
and there's a preview release that they have up on GitHub, and it looks pretty cool. It's made
for Android and iOS applications. If you want to make an Android or iOS app and do so in completely
native C++, without having to do any Objective-C or Java, then this is a good solution
for you. I think what makes it different from Qt controls is that where Qt, it's like you're
rendering everything in Qt's own widgets. This is kind of creating a wrapper, but then doing the actual rendering of a button with the actual iOS button control.
Yeah.
Android button control.
It's very much in that vein, like wxWidgets, which I don't know if we've ever discussed on this program.
Have we?
Definitely not in any depth.
Okay.
But yeah, wxWidgets is the same idea.
It's always native controls, which has its limitations
because you're limited to the intersection of the things
that are possible with native controls on all of those platforms.
Is wxWidgets, you know, does that do mobile,
or is it just a desktop framework?
I'm going to have to look to see if it was ever updated to add mobile,
because mobile wasn't a thing when I last used wxWidgets.
So it might be kind of like a wxWidgets for mobile.
Yeah, it might be.
Okay, well, just a small comment.
I'm more on C# these days than C++,
but I find this news really exciting
because this is one of the things we really miss
in the C# landscape.
There are many options, but not really a single one
for all platforms, like a Qt for everything, right?
And if this one can make the cut,
probably we can have nice wrappers too.
So kind of excited about it.
Well, and since you're talking about C#,
I mean, this is also kind of similar
to what Xamarin.Forms does, I think, right?
Yeah, exactly.
But the thing is that they are bigger
on the mobile side of things,
not that much on the desktop side of things.
I'm personally more interested in desktop
because of the kind of tools we develop.
Right.
But they are really focused on the...
I mean, they can also work on desktop,
but it's not the thing they do, right?
It's like a secondary thing.
Right.
Jason, did you find out whether wxWidgets has mobile support or not?
I see no mobile operating systems listed.
Yeah. Okay, so the next article we have is from
Christopher Di Bella, and it's SG20 Education and Recommended Videos for Teaching C++. And I
think we probably talked a little bit about this new study group when we were talking to Ashley and JF about the San Diego ISO C++ meeting.
And this new group is just going to be focusing on education.
So Christopher Di Bella and JC van Winkel are the ones who are heading the group.
And I thought this was a really interesting article talking about
how this group got started.
And he also has a list of
several recorded talks, a really nice curated list of good
CppCon, Meeting C++, et cetera,
talks that are all worth watching.
Yeah.
There's definitely a lot in there.
Yeah.
And several of them I have not watched
and should probably go back and watch at some point.
Yeah, there's always plenty of talks I need to catch up on.
I'll have to look through these.
Do you ever have any interaction
with the ISO C++ committee, Pablo?
Not really.
I try to follow where they are
going and so on,
and I really appreciate this part of
having a roadmap of what they are going to do.
I think that's
something we missed for a long
time, right?
Yeah, there's certainly a lot more
forthcoming about the direction of the
committee and everything these days, it seems.
Okay, and then the last thing we have to discuss is that the
C++Now call for submissions is now live. The conference next year is going to
be May 5th to May 10th. And what is the deadline to get your submission in? It's relatively soon, I think.
Yes, January 23rd.
January 23rd, and then proposal decisions will be sent out February 25th.
And the program goes online in March.
Are you working on your talk submission yet, Jason?
Nope.
You planning on one, though?
Let's say probably.
Okay.
And I say probably because there's a lot of conferences this year.
Right.
Yeah, and I am hoping to go to Core C++, which is in Tel Aviv the week after this.
Right, right.
So that complicates my schedule, and no firm decisions on my part have been made.
I'll probably submit something
and see what happens, what gets accepted, whatever. And C++Now is always easy for me to get to
because I just have to drive across the Rocky Mountains. Much easier than it would have been
150 years ago, anyhow.
Okay, so Pablo, to start off, why don't you tell us a little bit about SemanticMerge?
Okay. So, well, as you said at the beginning, we started developing our own version control system.
Like, you know, we wanted to come up with a product.
And soon after, we had to develop one of the pieces that you need to provide in a version control, which is a merge tool, right? You get two branches, you want to merge them, you
go to the file level, and then you have to get the two contributors and a common
ancestor and merge them together. And then we started thinking, is
there a better way to do this? We developed the entire algorithm for the text-based diff and merge,
so that's something we did, and we continued developing and improving it over the years.
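As a point of reference for the text-based process just described, here is a minimal sketch of the per-region rule a three-way merge applies once the base (the common ancestor) and the two contributors have been aligned into matching regions. The alignment itself, which is the hard part of a real diff algorithm, is assumed to have already happened, and `mergeRegion` is a hypothetical name, not code from SemanticMerge or Plastic SCM.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical helper: resolve one aligned region of a three-way merge.
// Returns the merged text, or std::nullopt when both contributors changed
// the region in different ways (a conflict a human has to resolve).
std::optional<std::string> mergeRegion(const std::string& base,
                                       const std::string& ours,
                                       const std::string& theirs) {
    if (ours == theirs) return ours;  // both agree (or neither changed it)
    if (ours == base) return theirs;  // only "theirs" changed the region
    if (theirs == base) return ours;  // only "ours" changed the region
    return std::nullopt;              // both changed it differently: conflict
}
```

For example, `mergeRegion("int x = 1;", "int x = 1;", "int x = 2;")` yields `"int x = 2;"` because only one side touched the region, while `mergeRegion("int x = 1;", "int x = 3;", "int x = 2;")` yields `std::nullopt`.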
But then we said,
okay, what every single programmer you talk to has in mind
is something like,
okay, why doesn't your diff and merge tool understand the code?
Why does it work text-based
instead of method-based or function-based? And that's
exactly what I wanted to do right from the very beginning. And I remember drawing the first
blueprints, the first ideas, I would say, really soon after
founding the company and starting up. But a lot of time had to pass before we really had the basics and enough technology
and enough of the different things you need for the core
to actually start thinking about it.
And then it was around 2012 or something like that
when we really started working on it.
And basically what we did is, okay, let's parse the code
and then let's calculate the diff and the merge
based on the code structure,
not the actual positions of lines and
text blocks and so
on, right? And that's
basically it, right? I mean, once
you describe that to a developer,
he immediately
imagines what the thing is about, right?
If you go and move a method,
well, a function, to a different location,
and then someone else modifies it in the original location,
and then you merge it, you get the move and the change put into the right location, right?
And that's the magic of the semantic thing, right?
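The "move plus edit" case can be illustrated with a toy model in which a file is just an ordered list of top-level declarations. This sketch assumes declarations are matched by name only (no renames), that all three versions contain the same names, and that bodies are opaque text; `Decl`, `File`, and `mergeByStructure` are hypothetical names for illustration. Because the merge is keyed by declaration rather than by line position, an edit made at the old location lands in the moved function.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Toy model: a file is its top-level declarations in source order.
struct Decl {
    std::string name;  // identity: assumed stable (no renames in this sketch)
    std::string body;  // opaque text
};
using File = std::vector<Decl>;

// Merge keyed by declaration instead of by line position.
// Order comes from whichever side reordered relative to base ("ours" wins
// if both did); each body comes from whichever side changed it. A real
// tool would report a conflict when both sides reorder or both edit the
// same body differently; this sketch silently prefers "ours".
File mergeByStructure(const File& base, const File& ours, const File& theirs) {
    auto toMap = [](const File& f) {
        std::unordered_map<std::string, std::string> m;
        for (const auto& d : f) m[d.name] = d.body;
        return m;
    };
    auto order = [](const File& f) {
        std::vector<std::string> v;
        for (const auto& d : f) v.push_back(d.name);
        return v;
    };
    const auto b = toMap(base);
    const auto o = toMap(ours);
    const auto t = toMap(theirs);
    const File& orderSource = (order(ours) != order(base)) ? ours : theirs;

    File result;
    for (const auto& d : orderSource) {
        const std::string& bb = b.at(d.name);
        const std::string& ob = o.at(d.name);
        const std::string& tb = t.at(d.name);
        result.push_back({d.name, ob == bb ? tb : ob});
    }
    return result;
}
```

If "ours" moves `f` below `g` and "theirs" edits `f` in place, the result has `g` first and the edited `f` second. A purely line-based merge of the same two changes typically ends up as a conflict, because one side deleted the lines the other side edited.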
That's basically one of the things.
At the end of the day, we rerun a lot of merges.
We have something called a replay,
and we replay public repos on GitHub
and try to repeat all the merges
to figure out how we can do better and so on.
And at the end of the day,
some teams try to avoid refactoring,
because for them it's not a good idea
to be refactoring code
while it's live, while it's still evolving.
They prefer to do a freeze or something like that.
But even though they try to avoid it, it happens, right?
And there's a good percentage of conflicts that we consider semantic because
they involve moving pieces of code.
And that's a little bit what semantic is about, right?
So what languages does SemanticMerge work with? Because I'm thinking
about the fact that you have to parse these.
Absolutely, that's the downside of this approach.
And we support both, right? In fact, SemanticMerge provides a regular
text-based diff tool and merge tool, like KDiff3 or any other.
So it does that.
I mean, it does this basic.
But then if it has support for the language,
it does more because then it understands structure and so on.
We can even plug external parsers and all that.
But to your question,
well, surprisingly, we support C++.
And we also support C.
We support Java, C#.
And we also have some external parsers.
I mean, not developed by us, but by the community, for Delphi,
the Object Pascal from Borland.
Well, yeah. And also we have something for Python
and a little bit of,
we have an experimental one for JavaScript
and things like that, right?
I mean, it's not difficult to add more languages.
In fact, we are working on Swift at this point.
So it's not difficult to add more languages.
And most of the time we're based on a standard parser.
I mean, for instance, I can tell you a little bit of the story of...
I mean, for C#, it's very easy
because we have this Roslyn project by Microsoft
where they provide the entire compiler infrastructure
and parsing and everything.
So it's very simple to get the AST.
So that's sort of simple.
For C, we use libclang.
I don't know if my pronunciation is correct
for that, libclang.
Everyone says it differently.
Yeah, okay, okay, so that's fine. And that's the
strategy we tried for C++ at the beginning, but, well, you know, C++ is by far the most complex of
all of them, so it was kind of, should we go that direction? It's going to be crazy.
And it was a little bit crazy
with libclang because you need to,
at least as I understand it,
give it all the dependencies,
well, not dependencies, all the includes.
So it made it
a little bit of a nightmare, because you're just merging
a file, so you get the other two
files, and you don't want to configure your merge tool to know your include paths and stuff like that.
But at the end of the day, we got a very nice parser from the folks at the Eclipse CDT
project. They have a library to parse C++, which is what they use for their IDE in the Eclipse project,
and we use that.
So actually we parse C++ using Java,
which is probably like a little bit of a sin,
but it works pretty well
because it's sort of very independent, right?
It's not like you have to feed it.
If you need to ask the user to configure his merge tool
almost like a makefile, he's not going to use it.
He's going to be like, okay, you're crazy.
So now you need zero configuration.
Like, okay, it's there.
You just use it and it goes and it's fine.
At the end of the day, we don't need super strict parsing either.
It's more like, okay, where are the functions?
Where does it start?
Where does this end?
That's all we need.
We could even probably have gone the, you know,
pairing brackets and stuff like that.
But, you know, it's never that simple with C++.
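To see why simply pairing brackets is "never that simple with C++", here is a deliberately naive scanner that reports the outermost `{...}` blocks in a source string. `topLevelBlocks` is a hypothetical name and nothing SemanticMerge ships; it ignores string and character literals, comments, and the preprocessor, which is exactly where it breaks.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Naive "where does each top-level block start and end?" scanner.
// Returns [open, close] character offsets for each outermost {...} pair.
// Deliberately simplistic: a '{' or '}' inside a string literal, a comment,
// or a macro expansion is counted like any other brace.
std::vector<std::pair<std::size_t, std::size_t>> topLevelBlocks(const std::string& src) {
    std::vector<std::pair<std::size_t, std::size_t>> blocks;
    int depth = 0;
    std::size_t start = 0;
    for (std::size_t i = 0; i < src.size(); ++i) {
        if (src[i] == '{') {
            if (depth == 0) start = i;  // opening an outermost block
            ++depth;
        } else if (src[i] == '}' && depth > 0) {
            --depth;
            if (depth == 0) blocks.emplace_back(start, i);  // block closed
        }
    }
    return blocks;
}
```

On `void f() { if (x) { y(); } } void g() {}` it finds both function bodies correctly. On `void f() { puts("}"); } void g() {}` it still finds two blocks, but it closes `f` at the brace inside the string literal (offset 17) instead of the real closing brace (offset 22), which is why a real parser is worth its cost.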
So I'm just trying to imagine how all this comes together.
You've got a parser that builds some sort of AST
or semantic information about the code.
And then do you actually diff then the ASTs effectively?
Exactly. That's the trick, right?
Since we develop our own version control system,
Plastic SCM actually, which is like an alternative to Git.
So since we develop all that and all the algorithms inside,
we actually reuse part of the code we have to actually merge directories
and the elements of directories, which are the, you know, you have trees there.
So we sort of reuse the same, actually not the same code,
but the same know-how, and, to be honest, some of the tests too. We
reuse that and apply it inside the file. So it was like, okay, you are merging at the directory
level, and then you're zooming into the file level, but you split it into components: you have a
namespace, and then inside the namespace you have a class, and then you have functions. And, well,
depending on the language, there's a different construction. If you are doing Java, it's a little bit different than C# or C++,
but at the end of the day, it's more or less the same, right?
What we don't do, I mean, we stop at the method level.
So if you move a method to a different location,
we need to identify whether it's the same method.
So basically, we parse the code into ASTs, we diff the ASTs,
and then we need to figure out if something you move and rename is
still the same thing. And there we use some similarity algorithms to actually find if the
text blocks inside are similar enough. Something that is worth saying is that in order to cut
complexity, because the problem can get super complex, right?
What we did was, okay, we parse just up to the declaration level. So inside the function itself,
we don't parse. I mean, the parsers actually do their job, but we don't use that, right? It's like, I know there is a function and then you have 50 lines. Okay, these 50 lines are text for me.
I know it was moved to a different location.
I know it was renamed.
But then I just, I mean,
we don't merge at the level of ifs and elses
and stuff like that, right?
That's what I mean.
It's more at the method level.
I mean, at the end of the day, you get it merged, of course,
but we use regular text for that
because otherwise the complexity explodes, you know?
Right.
It's just unsolvable thing, right?
So, yes, we create the ASTs, we take the ASTs, we diff them,
we get two sets of differences, basically the base against your version
and the base against the other contributor's version,
and then that's what we use to calculate the actual merge.
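The declaration-level diff described above can be sketched as follows. One side is compared against the common ancestor; a declaration that disappears under one name and reappears under another with a similar body is reported as a rename. The episode only says "some similarity algorithms" are used, so the line-set Jaccard similarity and the 0.6 threshold here are assumptions, and `Decl`, `Change`, `similarity`, and `diffDecls` are hypothetical names. Bodies are treated as opaque text, matching the choice to stop at the declaration level.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Toy model: a top-level declaration with an opaque text body.
struct Decl {
    std::string name;
    std::string body;
};

enum class Kind { Added, Deleted, Modified, Renamed };

struct Change {
    Kind kind;
    std::string from;  // name in the base version ("" for Added)
    std::string to;    // name in the new version ("" for Deleted)
};

// Jaccard similarity of the sets of lines in two bodies, in [0, 1].
// Assumed metric for illustration only.
double similarity(const std::string& a, const std::string& b) {
    auto lines = [](const std::string& s) {
        std::set<std::string> out;
        std::istringstream in(s);
        for (std::string line; std::getline(in, line);) out.insert(line);
        return out;
    };
    const auto la = lines(a);
    const auto lb = lines(b);
    if (la.empty() && lb.empty()) return 1.0;
    std::size_t common = 0;
    for (const auto& l : la) common += lb.count(l);
    return static_cast<double>(common) / static_cast<double>(la.size() + lb.size() - common);
}

// Diff one side against the common ancestor at the declaration level.
std::vector<Change> diffDecls(const std::vector<Decl>& base,
                              const std::vector<Decl>& side,
                              double renameThreshold = 0.6) {
    auto lookup = [](const std::vector<Decl>& v, const std::string& name) -> const Decl* {
        for (const auto& d : v)
            if (d.name == name) return &d;
        return nullptr;
    };

    std::vector<Change> changes;
    std::vector<const Decl*> gone;   // in base, missing by name in side
    std::vector<const Decl*> fresh;  // in side, missing by name in base
    for (const auto& d : base) {
        const Decl* s = lookup(side, d.name);
        if (!s) gone.push_back(&d);
        else if (s->body != d.body) changes.push_back({Kind::Modified, d.name, d.name});
    }
    for (const auto& d : side)
        if (!lookup(base, d.name)) fresh.push_back(&d);

    // Pair each vanished declaration with the most similar unpaired new one;
    // at or above the threshold it is treated as the same function, renamed.
    for (const Decl* g : gone) {
        std::size_t best = fresh.size();
        double bestScore = renameThreshold;
        for (std::size_t i = 0; i < fresh.size(); ++i) {
            if (!fresh[i]) continue;  // already paired
            const double score = similarity(g->body, fresh[i]->body);
            if (score >= bestScore) { best = i; bestScore = score; }
        }
        if (best < fresh.size()) {
            changes.push_back({Kind::Renamed, g->name, fresh[best]->name});
            fresh[best] = nullptr;
        } else {
            changes.push_back({Kind::Deleted, g->name, ""});
        }
    }
    for (const Decl* f : fresh)
        if (f) changes.push_back({Kind::Added, "", f->name});
    return changes;
}
```

Calling `diffDecls` once with your version and once with the other contributor's version gives the two sets of differences that a merge step would then combine.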
And the super nice thing is that it's amazing the number of manual conflicts
that are a nightmare to solve
that become even automatic, right?
Like you move a method to a different location,
then you change it,
you try to merge that with a regular merge tool,
and the old method, which is now at the top of the file or something,
gets matched with another method
that has nothing to do with it, and you go crazy.
Then as soon as you go in semantic mode,
okay, it knows it's the same method.
It doesn't matter where it is, right?
It just knows how to merge it.
And the super nice thing,
well, I get super excited about it.
So I can talk for hours, but don't worry, I'll stop.
So the super nice thing about this
is that we can even do it across files,
which is not, of course, something you do on a daily basis,
but when you do it, it's really good.
Like, you move a function to a different file
because you're cleaning up code and you're refactoring stuff
and you just do that.
While someone else was modifying the function
on the original location, we can merge that and diff it too, right?
We can diff and merge that thing.
And that's pretty amazing.
I mean, I have this vision.
My view is that sooner or later,
all diffs and merge tools are going to do this.
I don't know if it's going to be with our technology.
I hope it is.
But I think it's going to happen
because there's no good reason not to do it this way. I mean, it's like, it's simply, it's better than regular diffs on a daily basis, right?
Of course, you can tell me, okay, you don't do that all day.
I mean, you don't do that every single day.
Of course, you don't.
But when you do it, and it's more often than you think, it's super useful.
Wow.
Sounds pretty cool.
Yeah, it does.
One question I had was since you're doing all this AST parsing
and everything, is there any
performance concern if you have like a multi-million
line code base or is it
not doing enough parsing
that it doesn't become much of a concern?
Okay. Suppose you plug SemanticMerge into your Git, right?
Which is something you can do.
You can just say, okay, Git,
use this tool as my merge tool, that's all.
So you run git mergetool and it runs SemanticMerge
instead of, I don't know, KDiff3 or whatever.
Okay, when that happens,
it will only parse the files in conflict.
So if some files are not in conflict,
then it won't be parsed at all, because Git is driving.
I mean, if Git already knows how to solve it,
then there's no worries in there.
The second thing is about the multi-file.
When the multi-file is in place,
and that's something, well, we developed:
we reused some of the technology we have in Plastic SCM,
which is our version control,
and SemanticMerge to create a free,
well, it's still free,
we don't know whether it's going to be commercial anytime soon,
a free Git client, which is called gmaster.
You can go to gmaster.io and download it.
And it puts together parts of the two products.
And this is the one doing the multi-file semantic merge.
And when that happens, we have to interfere a little bit with Git
in the sense that it's not the one actually driving the merge process,
but we are the ones doing that
because we can find conflicts in parts where Git doesn't.
For instance, the typical move-code case is: you add a file and then you have
some code moved from the original file
into this new one. When you merge that, Git is not going
to find any conflict in the added file because it's new,
right? There can't be a conflict in there. But we
can find that there was code moved from the original file into
the second one. And then we expand the scope of the conflict,
let's say. So Git
will say, okay, foo.c
has a conflict, but we know that foo.c
is involved in a refactor group
with bar.c because
of that. Well, it's slightly
more difficult to explain
just talking,
but I have a few
graphics that we can share
later on in a blog post,
where they can find a very
neat explanation in just a few
paragraphs of how it works.
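The "refactor group" idea can be illustrated with a union-find structure: files are linked whenever code is detected moving between them, and a conflict reported on one file is widened to every file in its group. This is an illustrative assumption about how such grouping could work, not a description of gmaster's internals; `RefactorGroups`, `link`, `find`, and `expand` are hypothetical names, and detecting the moves themselves (for example with a similarity check between declarations) is assumed to happen elsewhere.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical sketch: group files connected by moved code, then widen a
// conflict on one file to its whole group.
class RefactorGroups {
public:
    // Record that code moved between files a and b (direction doesn't matter).
    void link(const std::string& a, const std::string& b) {
        const std::string ra = find(a);
        const std::string rb = find(b);
        if (ra != rb) parent_[ra] = rb;
    }

    // Union-find "find" with path compression; unknown files become
    // their own singleton group.
    std::string find(const std::string& file) {
        auto it = parent_.find(file);
        if (it == parent_.end()) {
            parent_[file] = file;
            return file;
        }
        if (it->second == file) return file;
        const std::string next = it->second;
        const std::string root = find(next);
        parent_[file] = root;  // path compression
        return root;
    }

    // All known files that share a group with at least one conflicted file,
    // in sorted order.
    std::vector<std::string> expand(const std::vector<std::string>& conflicted) {
        std::set<std::string> roots;
        for (const auto& f : conflicted) roots.insert(find(f));
        std::vector<std::string> files;
        for (const auto& entry : parent_) files.push_back(entry.first);
        std::vector<std::string> result;
        for (const auto& f : files)
            if (roots.count(find(f))) result.push_back(f);
        return result;
    }

private:
    std::map<std::string, std::string> parent_;  // file -> parent file
};
```

With `link("foo.c", "bar.c")` recorded for the moved code, `expand({"foo.c"})` returns `bar.c` and `foo.c`, so the newly added file is reviewed together with the one Git flagged, while an unrelated group such as `util.c`/`util.h` stays out of it.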
So that's a little bit about it, right?
It's all about
the main motivation is
okay, let's make it
simpler to actually find
and solve conflicts. I mean, merges have been really feared by developers for,
I don't know, I would say generations,
but maybe it's a little bit too much.
But that's, yeah, but that's, you know,
many people say, okay, I get a merge conflict.
It's like, oh my God.
And we just try to make it something much simpler
and much more powerful.
That's what it is.
So you already mentioned that it works with Git.
Does it work with any source control system?
Yeah, you can plug semantic into...
It's just a standard diff and merge tool
because you can also use it to diff, right?
It's not only merging because, in fact, you are going to use the diff part.
We call it semantic diff, not very original.
But you're going to use it much more often than you use actually the merge
because probably you merge, I don't know, a few times a week.
Okay, it depends on your role.
Maybe some people say, okay, I do that on a daily basis.
But anyway, you diff very often
because you want to see your own changes,
how you modify that.
Okay, I'm on a good track.
How does this compare to what another colleague did,
or something like that.
And then you can also use the semantic diff in there, right?
So yeah, it's something you use on a daily basis.
And answering your question, yes, you can plug it to any version control system.
In fact, you can plug it to our own Plastic SCM.
Of course, you can do it with Git.
You can do it with Perforce.
No reason not to do it with Subversion.
So, yeah, I mean, it's just... the same way you invoke KDiff3 or P4Merge or Araxis or Beyond Compare, you can just plug SemanticMerge in there.
What operating systems do you support?
Well, we're actually supporting Windows. We
have version one on Windows, Mac, and Linux, but we have less demand, less interest, I would say,
with Mac and Linux. So we delayed version two for them.
So we are more on the Windows side of things.
Although as I speak, I can tell you
because we are going to be sharing some screenshots
this week about the upcoming versions for Mac and Linux.
The thing is that the first version, version one of semantic
was sort of alien in terms of UI,
right? So it was able to actually solve all the semantic conflicts and so on, but it didn't look
like any other three-way merge tool. So in version two, we said, okay, let's simply add the semantic
power to a more natural tool where you see the three panels with the three contributors and so on.
That's it, right?
It's mostly like a regular tool,
just decorated with semantic information.
And that's what version two is,
but it's available on Windows only at this point.
Okay.
I wanted to interrupt the discussion for just a moment
to bring you a word from our sponsors.
PVS-Studio analyzer detects a wide range of bugs.
This is possible thanks to the combination of various techniques,
such as data flow analysis, symbolic execution,
method annotations, and pattern-based matching analysis.
The PVS-Studio team invites listeners to get acquainted with the article
Technologies Used in the PVS-Studio Code Analyzer
for Finding Bugs and Potential Vulnerabilities,
a link to which will be given in the podcast description.
The article describes the analyzer's internal design principles
and reveals the magic that allows detecting some types of bugs.
So you've mentioned it a couple times.
Do you want to tell us a little bit more about Plastic SCM,
this other source control that your company makes?
Okay. In fact, Plastic was the main motivation
to start the company back in 2005.
This is the
main product we developed, our main
source of revenue, I would
say.
Okay, we started it with
the intention to have
something better than the
available
Subversions of the world, and CVSes and SourceSafes, as we said.
And it was by the time when Git also started.
So we started in 2005 and a few months after really starting the company, Git was out in
its infancy.
And, well, we kept evolving from there, right? We have a very wide range of users,
from indie teams doing games with just two or three people
to some corporations with 3,000 developers
using Plastic on a few projects.
And we even have some super huge projects
with 1,000 developers in the same code base,
which is kind of crazy in terms of merging and branching and all that.
And we built it around a few key concepts, right?
It's basically super strong with branching.
It's very, very simple to do branching, and super fast, in Plastic.
It's very good with merging.
I mean, Git is super good with merging,
but we can do even a few more cases,
and even make it a little bit easier
than what they do,
which is not easy, because Git is good,
but we try to be even better in that sense,
so that's a real challenge.
And then it's very good
with big files and super
big repos.
Like, we have customers with repositories in the
four-terabyte, five-terabyte range, especially in video games. That's one of the things we
wanted to do by design from day one, and
that's one of the things that separates us from other systems, Git in particular. Then we can also
do something quite interesting, which is that
Plastic is fully distributed,
but it can also be fully centralized,
and every combination in between.
What I mean by that: you can work with local repos,
like you would with Git, but
you can also create a new workspace, a working copy,
and work directly
with a server, like you would with Subversion.
What we mean by this is that
in a lot of teams,
you have really different profiles of people.
So some of them are very comfortable
with the distributed way of working,
but some of them really prefer to go into the checkin
or commit mode directly to the central servers,
and that's it.
Especially, you know, in games, we are very big in games,
and lots of artists,
not super techie developers.
I mean, people that are not normally
full-time developers, let's say,
or just artists and stuff like that,
they really prefer to just check in and done.
And we also implement locking.
It's optional.
I mean, we prefer merging.
I mean, merging is our life, as I said.
So locking is kind of a second thing to do. But
there are certain assets, certain files that you cannot merge, and then locking is good with that.
So these are sort of the key things. Of course, there's something I always forget to say,
because mostly I take it for granted, but we develop the entire stack. I mean, it's not only
the core, the server, the command line; we develop all the merge tools,
as I said at the beginning, plus diff tools, plus GUIs on Mac, Linux, and Windows. You know, it's
sort of a complete package: you download it and you have all the pieces. It's not like, okay, I get
the command line here, but then I need a GUI from some other provider, and then I need a cloud
service from someone else. We just provide the whole thing, which is a lot of work, but it's fun, right?
You mentioned that you have very good support
for large binary assets, and then I started wondering, and I don't believe you addressed this:
is it possible for your semantic diffing, semantic merge tool to work with binary files?
Or do you have plans to say, like,
well, these two PNGs were both changed
by these two different people,
let's look at the actual images here?
Okay, one thing is Plastic SCM, the version control,
and it can handle really huge files.
I mean, release after release,
and we do, like, three versions every week.
Not that we deploy all of them to everyone,
but it's like, okay, they are available to download.
Each of them goes through a testing phase
where we always check in a one gigabyte file, right?
So size limit is not an issue.
But that's one thing, actually, what the core can handle.
And then you get the diff and merge.
Can you merge a PNG file?
No, not with the standard diff and merge tools.
You would need to have, like, a special diff or merge tool.
We have a special diff for images.
So you put two images together,
and then it can show them to you side by side.
It can actually calculate the difference,
which sometimes is very nice,
but sometimes you don't understand anything,
because it's kind of a diff in pixels, so it's like crazy. But we can also do, like, a sliding image: you put one over the other,
and then you have a slider, and you move it, and you reveal one side or the other. So if
the images are completely different, then you don't really realize what's going on, but if it's
the same, I mean, the same concept or the same image
with a small modification, you can do it.
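The two image views Pablo describes, a per-pixel difference and a slider that reveals one image or the other, can be sketched in a few lines of C++. This is a minimal illustration, assuming a hypothetical `Image` type holding 8-bit grayscale pixels in row-major order; it is not how Plastic SCM's image diff is implemented.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <stdexcept>
#include <vector>

// Hypothetical minimal image type: 8-bit grayscale pixels stored row by row.
struct Image {
    int width = 0;
    int height = 0;
    std::vector<std::uint8_t> pixels;  // expected size: width * height
};

static void requireSameSize(const Image& a, const Image& b) {
    if (a.width != b.width || a.height != b.height ||
        a.pixels.size() != b.pixels.size())
        throw std::invalid_argument("images must have the same dimensions");
}

// Pixel-by-pixel absolute difference: 0 wherever the two images agree.
Image pixelDiff(const Image& a, const Image& b) {
    requireSameSize(a, b);
    Image out;
    out.width = a.width;
    out.height = a.height;
    out.pixels.resize(a.pixels.size());
    for (std::size_t i = 0; i < a.pixels.size(); ++i)
        out.pixels[i] = static_cast<std::uint8_t>(
            std::abs(static_cast<int>(a.pixels[i]) - static_cast<int>(b.pixels[i])));
    return out;
}

// "Slider" view: columns left of splitX come from `before`, the rest from `after`.
Image sliderView(const Image& before, const Image& after, int splitX) {
    requireSameSize(before, after);
    Image out = before;
    for (int y = 0; y < out.height; ++y)
        for (int x = splitX < 0 ? 0 : splitX; x < out.width; ++x) {
            const std::size_t i = static_cast<std::size_t>(y) * out.width + x;
            out.pixels[i] = after.pixels[i];
        }
    return out;
}
```

Moving `splitX` from 0 to `width` sweeps the view from the new image to the old one, which is why the slider is most useful when the two images share the same composition.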
So for diff, what we have is, okay, we have diff for image,
and then we have a way to plug custom diffs.
For instance, like if...
And again, it's not that we only work for games.
Of course, we work for a lot of industries too,
but in games, they normally have custom packages where they put textures and 3D models together,
packaged somehow.
So many studios develop their own diff thing,
which takes one of these files, unpacks it,
and shows you what was different.
But to answer your original question,
no, we don't have a merge tool for PNGs or something like that.
We've done something like that
for Unity projects, for instance.
We have our own merge tool based on the same core,
but of course handling the actual content in a different way,
for what they call scene files,
which are a text-based representation of stuff.
But, I mean, we
cannot do magic and merge just any file or something like that, right? No, that's not there yet.
Okay, okay. I had another question, just thinking about all your tools here and your nice
diffing and your source control system. And I was recently involved in a project
where I was the person tasked with applying Clang format
across the entire code base.
And part of this resulted in a lot of whitespace diffs
because we went from tabs to spaces.
Wow.
So basically every single line in the entire repo changed.
Now, I was able to resolve a lot of potential conflicts
by putting Git in all kinds of ignore whitespace change mode.
And I got something usable
and then was able to pass off instructions to people
who still had to do their own merging.
And just out of curiosity,
is this something that your tools would handle differently, or would we be in
the same kind of...
That's one of the things where Semantic Merge will help, right? When you're
actually changing just the indentation, or tabs for spaces, something like that, it's not a semantic
issue at all, I mean, not a semantic merge at all, because the structure of the code doesn't really change. Okay, but it will help,
because it basically knows how to reduce that. I mean, it understands that the only thing that
changed was a space or a tab or something like that, and it will react accordingly. In some languages,
we even have the option, I mean, we have it for C; I have to check if we have it for C++ at this point,
but we have a way that when it's going to merge a method,
we have the option to reformat it before merging,
which means...
Oh, okay.
I mean, if you are...
I mean, the thing is that
suppose you simply split a line in three lines,
like you have a call to a function and you split it in three lines
to make the line not that wide, right?
To make it thinner or something like that.
And then someone else added something in the middle or something like that.
So you can really have, well, a conflict in there.
But if you reformat the code to a given standard or something, you can get rid of some
of these reformatting things, and then you merge it and reformat again. That's something we provide
for some of the languages, so sometimes you can solve conflicts that simply
could not be solved automatically before, right? So that's one of the things we do.
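A rough way to see why reformatting or whitespace normalization removes these conflicts: if every run of whitespace is collapsed to one space before comparing, a tabs-to-spaces change or a call split across three lines compares equal to the original. The sketch below is a minimal C++ illustration under that assumption; the function names are hypothetical, and SemanticMerge's real approach (reformatting parsed code to a standard) is more thorough than this, since it does not handle, for example, a line break inserted right after an opening parenthesis.

```cpp
#include <cctype>
#include <string>

// Collapse every run of whitespace (spaces, tabs, newlines) into one space and trim both ends.
std::string normalizeWhitespace(const std::string& text) {
    std::string out;
    bool pendingSpace = false;
    for (unsigned char c : text) {
        if (std::isspace(c)) {
            pendingSpace = !out.empty();  // leading whitespace is dropped entirely
        } else {
            if (pendingSpace) out += ' ';
            pendingSpace = false;
            out += static_cast<char>(c);
        }
    }
    return out;  // trailing whitespace never gets emitted
}

// Two versions are "the same change" for merge purposes if they differ only in whitespace.
bool sameModuloWhitespace(const std::string& a, const std::string& b) {
    return normalizeWhitespace(a) == normalizeWhitespace(b);
}
```

With this, `"call(a,\n     b,\n     c);"` and `"call(a, b, c);"` normalize to the same text, so a merge that compares normalized content no longer sees the line split as an edit.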
And for instance, some of the things,
not necessarily your case, but suppose
a typical case where you add an include
at the beginning of the file,
and then, five lines later,
someone else added the same include
on a different branch.
Semantic is smart enough.
I mean, it's not AI or anything like that, right?
But it's smart enough to actually know
that the same include was added twice,
so it will only keep one.
Automatically; it won't even ask you.
Otherwise, you would get the two.
Like, okay, you add, I mean, stupid thing,
but you add, I don't know, whatever,
the same include in the first line and then in line 10 of the file,
because it's like 10 lines of includes, right?
Well, you just solve it automatically.
And that's interesting, right?
That's one of the things it does.
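The duplicate-include case can be expressed as a small three-way merge over the include list: an include survives if one side added it (it is absent from the common base) or if both sides kept it, and it is emitted only once. This is a minimal C++ sketch under those assumptions, using a hypothetical `mergeIncludes` function over already-extracted include names; it is not SemanticMerge's actual algorithm, which works on its own parse tree.

```cpp
#include <set>
#include <string>
#include <vector>

// Three-way merge of an include list (e.g. "<vector>"), keeping each include once.
// An include survives if a side added it (absent from base) or if both sides kept it.
std::vector<std::string> mergeIncludes(const std::vector<std::string>& base,
                                       const std::vector<std::string>& ours,
                                       const std::vector<std::string>& theirs) {
    const std::set<std::string> inBase(base.begin(), base.end());
    const std::set<std::string> inOurs(ours.begin(), ours.end());
    const std::set<std::string> inTheirs(theirs.begin(), theirs.end());

    std::vector<std::string> merged;
    std::set<std::string> emitted;  // guards against the "same include added twice" case
    auto consider = [&](const std::string& inc) {
        const bool addedBySomeone = inBase.count(inc) == 0;
        const bool keptByBoth = inOurs.count(inc) != 0 && inTheirs.count(inc) != 0;
        if ((addedBySomeone || keptByBoth) && emitted.insert(inc).second)
            merged.push_back(inc);
    };
    for (const auto& inc : ours) consider(inc);    // our order first
    for (const auto& inc : theirs) consider(inc);  // then anything only they have
    return merged;
}
```

If both branches add `<map>` at different positions, a line-based merge keeps two copies, while this merge keeps the first one it sees. An include that one side deleted and the other left untouched is dropped, which matches the usual three-way rule.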
There are a few others, like, for instance,
it can warn you if there are changes in the same function, even if they don't collide.
So you have an option to say, okay, warn me about that.
Suppose you have a function with 10 lines.
You modify something in line 1, I modify something in line 10.
Any merge tool is going to merge that automatically,
because there's no conflict; it's not the same line.
We have an option to say, okay, even if that is not in the same line,
since it's in the same method or the same function, let me know,
because maybe I want to take a look,
because maybe I'm changing the logic in some way
that I'd like to review manually or something.
Okay, we can do this kind of stuff
because it understands it's in the same context,
in the same, you know, it's in the same function.
If you are not parsing functions, you don't know.
I mean, when you work in text mode,
it's the same for you
to merge a love letter as a C++ file, because
you don't know what's in there, right? So, basically.
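Once a parser has mapped each function to a line range, the "warn me about same-function edits" option reduces to an interval check: flag any function that contains at least one changed line from each branch, even when the changed lines never overlap. This is a minimal C++ sketch, assuming a hypothetical `FunctionSpan` produced by some parser and changed lines expressed as base-file line numbers; SemanticMerge's internal representation is not described in the conversation.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical parse result: a function's inclusive line range in the base file.
struct FunctionSpan {
    std::string name;
    int firstLine;
    int lastLine;
};

// True if any changed base-file line falls inside the function's range.
static bool touches(const FunctionSpan& f, const std::vector<int>& changedLines) {
    return std::any_of(changedLines.begin(), changedLines.end(),
                       [&](int line) { return line >= f.firstLine && line <= f.lastLine; });
}

// Functions edited on both branches, even when the edited lines never overlap.
std::vector<std::string> functionsEditedByBoth(const std::vector<FunctionSpan>& functions,
                                               const std::vector<int>& oursChanged,
                                               const std::vector<int>& theirsChanged) {
    std::vector<std::string> flagged;
    for (const auto& f : functions)
        if (touches(f, oursChanged) && touches(f, theirsChanged))
            flagged.push_back(f.name);
    return flagged;
}
```

In Pablo's example, one side edits line 1 and the other edits line 10 of a ten-line function: a text merge sees no conflict, but this check reports the function for manual review.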
Right. Okay, now I'm
totally going to be going out
on a limb here...
But I'm thinking about all the capabilities you've
mentioned, and I would like
a tool that if I go to make, you know,
there's a conflict, like, or, you know, go to make
a merge, you said, if two
people edited the same function,
you could warn on that.
Is it possible with your semantic abilities to say,
well, programmer A actually did a copy and paste of this code
into another chunk where really like that's like
some sort of like smart copy paste detection on merge.
That would be awesome.
We don't do it right now,
but it's not the first time I am asked for this.
Okay, cool.
And it wouldn't be that difficult,
because, in fact, we're doing that
when we diff the code.
We are already matching
similar text blocks, right? We already do that. In fact, before
launching Semantic, we had something that we still have, called Xdiff and Xmerge,
and the X stands for cross or move, right? So we have move detection for
text blocks even if we don't parse the language. So Semantic can do that. Suppose you launch Semantic with a file it doesn't parse.
I mean, I don't know.
Rust, okay?
We don't have a parser for Rust.
So you start it with Rust.
You move a block of code.
Our system is able to figure out that this text block was moved
because it knows it has a similar...
Okay, if what you move is exactly the same,
and you just move it to a different location, it's able to
diff it and merge it correctly and apply
exactly the same thing as I said with functions,
but to text blocks. If you make
small modifications after you move the block,
it's still able to find
it's the same up to a certain
similar...
Similarity.
Similarity, okay.
This percentage, right? So it's able to do that.
So there's nothing preventing us from actually saying,
okay, it's the same thing up to some percentage or something,
because it's something we do, right?
I mean, it's already doing that.
The diff is already using that for the calculation.
The thing is that we are not providing the info per se,
like saying, hey, you just copied the file,
or you just copied the method from this other place,
and you shouldn't do that, or anything.
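The "same up to a certain percentage" idea behind move detection can be illustrated with a line-based similarity score. The conversation does not say which metric Xdiff/Xmerge actually uses, so this sketch swaps in a common stand-in: the ratio 2 × LCS / (|a| + |b|), where LCS is the longest common subsequence of lines. `blockSimilarity` and `findMovedBlock` are hypothetical names, and the threshold is a free parameter.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Length of the longest common subsequence of two line sequences (classic DP table).
std::size_t lcsLength(const std::vector<std::string>& a, const std::vector<std::string>& b) {
    std::vector<std::vector<std::size_t>> dp(a.size() + 1,
                                             std::vector<std::size_t>(b.size() + 1, 0));
    for (std::size_t i = 1; i <= a.size(); ++i)
        for (std::size_t j = 1; j <= b.size(); ++j)
            dp[i][j] = (a[i - 1] == b[j - 1]) ? dp[i - 1][j - 1] + 1
                                              : std::max(dp[i - 1][j], dp[i][j - 1]);
    return dp[a.size()][b.size()];
}

// Similarity in [0, 1]; 1.0 means the blocks are identical line for line.
double blockSimilarity(const std::vector<std::string>& a, const std::vector<std::string>& b) {
    if (a.empty() && b.empty()) return 1.0;
    return 2.0 * static_cast<double>(lcsLength(a, b)) / static_cast<double>(a.size() + b.size());
}

// Index of the added block that best matches a removed block, or -1 if none reaches threshold.
int findMovedBlock(const std::vector<std::string>& removed,
                   const std::vector<std::vector<std::string>>& addedBlocks,
                   double threshold) {
    int best = -1;
    double bestScore = 0.0;
    for (std::size_t i = 0; i < addedBlocks.size(); ++i) {
        const double score = blockSimilarity(removed, addedBlocks[i]);
        if (score >= threshold && (best < 0 || score > bestScore)) {
            best = static_cast<int>(i);
            bestScore = score;
        }
    }
    return best;
}
```

A three-line block moved elsewhere with one line edited keeps two of its three lines, scoring about 0.67. It would therefore be reported as a move at a 60% threshold but not at 80%, which is the kind of knob a copy-and-paste warning on merge could expose.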
Okay.
Okay.
Okay.
That sounds really powerful.
Yeah.
Is there anything else you wanted to go over that we haven't brought up yet with either Plastic SCM or Semantic Merge?
Well, I think basically we covered all the essential concepts in here.
The only thing I will say is that, well, if anyone wants to give it a try, they can go to semanticmerge.com,
and they have a free trial for 30 days.
So they just can download it and use it
and see if it's really as good as I said
and check it for themselves.
And if they want to use or take a glance of,
get a better understanding of the entire technology we developed,
including this multi-file semantic thing and so on,
they can go to gmaster.io,
and then they have a free Git client
which comes with all the merge technology and so on.
And of course, if they want to switch to a different version control,
we have Plastic SCM, which is our main product
and the thing that makes our hearts beat every day.
So that's the big thing for us.
These are all, well, the Git merge tool, sorry, Gmaster.
Gmaster, you said, is free.
The other ones are commercial.
Do you have options for open source or students or that kind of thing?
Absolutely, and thank you for saying that, because I always forget it.
Yeah, it's free for open source.
All of them are free for open source.
And we have discounts for institutions.
And we have free versions for individuals.
So any individual can use it for free.
No issues in there.
Just have to, you know, in some of the sites,
they can directly register and request the license
and so on.
Otherwise they can always contact us.
So for individuals,
hobbies and stuff like that,
open source projects,
it's completely free.
And then we have discounts for educational purposes and all that.
So, yeah, we are pretty flexible in that, right?
I mean, we are a very small team, a very small company,
and we are pretty friendly with all the licensing stuff.
Awesome. Well, it's been great having you on the show today, Pablo.
Thank you very much. Thank you very much for inviting me. It was a pleasure.
And Rob, if I may, before we leave, there is a news item we forgot to go over.
Oh, what's that? The C++ on Sea call for volunteer and student submissions is currently up, and it closes on January 1st.
So we'll get this news out just barely in time.
Okay, well, I'll put that link in the show notes as well.
Okay, thanks again, Pablo.
Thanks.
Thank you.
Thanks so much for listening in as we chat about C++.
We'd love to hear what you think of the podcast.
Please let us know if we're discussing the stuff you're interested in,
or if you have a suggestion for a topic, we'd love to hear about that too.
You can email all your thoughts to feedback at cppcast.com.
We'd also appreciate if you can like CppCast on Facebook
and follow CppCast on Twitter.
You can also follow me at @robwirving and Jason at @lefticus on Twitter.
We'd also like to thank all our patrons who helped support the show through
Patreon.
If you'd like to support us on Patreon,
you can do so at patreon.com/cppcast.
And of course you can find all that info and the show notes on the podcast
website at cppcast.com.
Theme music for this episode was provided by podcastthemes.com.