Software at Scale 35 - Maintaining Git with Johannes Schindelin
Episode Date: October 20, 2021

Johannes Schindelin is the maintainer (BDFL) of Git for Windows.

Git is a fundamental piece of the software community, and we get to learn the history and inner workings of the project in this episode. Maintaining a widely-used open source project involves a ton of expected complexity around handling bug reports, deprecations, and inclusive culture, but also requires management of inter-personal relationships, ease of contribution, and other aspects that are fascinating to learn about.

Highlights
00:06 - How did Johannes end up as the maintainer of Git for Windows?
06:30 - The Git community in the early days. Fun fact: Git used to be called `dircache`
08:30 - How many downloads does Git for Windows get today?
10:15 - Why is Git for Windows a separate project? Why not make improvements to Git itself?
24:00 - How do you deprecate functionality when there are millions of users of your product and you have no telemetry?
30:00 - What does being the BDFL of a project mean? What does Johannes's day-to-day look like?
33:00 - What is GitGitGadget? How does it make contributions easier?
41:00 - How do you foster an inclusive community in an open-source project?
50:00 - What’s next for Git?
Transcript
Welcome to Software at Scale, a podcast where we discuss the technical stories behind large software applications.
I'm your host, Utsav Shah, and thank you for listening.
Hey, Johannes, welcome to the Software at Scale podcast. Thank you for being a guest.
Thank you for having me, Utsav.
So you are the maintainer for Git for Windows, right? So my first question to you is, what got you into that? How did you get interested in being
a maintainer for Git?
I wasn't. I stumbled into it. This is one of those things where I
said, how hard could this be? And instead of running away, saying too hard, too hard, like a normal person would,
I just got stuck with it. I thought that I could basically hand it off to somebody else
who was interested, and that really didn't work out. In the end, it was good for me, because I got a nice
job at Microsoft because I'm the Git for Windows maintainer. But for seven years, I just worked on
it off and on in my free time, and basically only when I felt guilty enough
for not having taken care of it.
But I was actually an open source software maintainer
before.
I started in 2000 when I got interested
in that protocol VNC,
which is a little bit like remote desktop.
And there was this program
where you could connect to another computer and see the screen and remote control it.
So that was all fine and dandy, but I actually wanted to export something that did not exist as a desktop.
So I turned that software into a library where you could implement your own VNC server.
And that was what I maintained for, I don't know,
I think I only gave up maintainership in something like 2012.
So that was my first maintainership.
But that was only after I had spent quite a couple of years
just benefiting from open source,
which back then was not even called open source.
It was basically just free source code on the university's networks,
and I enjoyed it and wanted to give something back, essentially. There was no GitHub or
anything like that. It wasn't easy to make pull requests; there were FTP servers, of all things.
Yeah, it's much easier these days to get involved in open source than back then.
So you were doing a PhD in genetics when you started maintaining Git,
or started contributing to Git?
Yeah.
So the story was that originally I'm a mathematician,
and I worked with prime numbers, and I love prime numbers to this day,
but I couldn't get anything work-wise done.
And at the end of the day, I really like to have a warm meal and a dry bed.
So I had to make some money.
And so I started a software company writing logistics software, basically sending trucks down the right roads where they could drive.
But I got this incredible chance in 2001, when somebody said,
hey, somebody at the university is looking for somebody who knows something about computers to do image processing.
And I said, image processing, okay.
Yeah, and they would give you a PhD for it.
Oh, yeah, come in.
So I started that.
And in 2005, exactly around the time when Git started,
when this BitKeeper thing blew up,
which I followed really anxiously every day,
checking multiple times: is there any news?
So Git was started, and exactly at that time,
my supervisor told me, Johannes,
you know, it really would be time to write up a thesis now,
right? I had just subscribed to the Git mailing list, and a week after the Git mailing list was
formed, I had to unsubscribe and write up a thesis. For three months straight, I had
this regimen, this very strict regimen of kicking my butt into writing
paragraph after paragraph and then revising. At first, it was four days writing, one day revising,
and toward the end, it was four days revising and one day writing. In the end, I had it. And
the first thing I did after handing in the first draft to my supervisor was to immediately resubscribe
to the Git mailing list. And I think the
first contribution I made was
only after following for
a month or two.
And it was something to do
with getting Git compiled
on the IRIX machine that we had at
university that I wanted to use.
And that was actually a big driver
for me because I wanted to use Git
to develop the software for the PhD and then the postdoc.
And that's how I got involved in Git.
But I did not maintain anything Git-related for years after that.
So in 2005, I started contributing.
And in 2007, I started Git for Windows, maintaining Git for Windows. I started Git for Windows basically while freelancing at an insurance company,
where I was tasked with writing stuff using Subversion.
And Subversion is slow, I think.
At least it's slower than Git.
And I had to use it on Windows, which makes it doubly slow, right?
But then Git was also super slow, because it only ran on Cygwin.
And Cygwin is this really cool thing
where you don't have to change your code much,
but it comes at the price of running relatively slowly,
because it emulates all these POSIX functionalities.
And I started porting Git to the pure Win32 API.
Win32, sorry; it's not about 32-bit. Win32 is the term, even if it's 64-bit.
But I started porting it, and then my gig was up at that company, and I didn't really know what to
do with those patches. It was not really finished, so I put it on the Git mailing list, and I went on
happily with my university stuff and other freelancing stuff.
And then a contributor actually came back.
That contributor, also called Johannes, another Johannes, came back and said, oh, I finished it, by the way.
And then somehow they went on vacation, and somebody else wanted a way to compile it, because it was not really clear
how to build this from scratch. So I said, okay, I'll give that to you, but then you maintain it.
Okay. So they took it and they didn't maintain it. I got stuck with maintaining it. I tried to
really hand it off multiple times over the years, but in the end it worked out: in 2015,
I got hired by Microsoft, and it's really been quite an interesting and fun ride.
So clearly PhDs are useful so that you can procrastinate on them and do useful work.
Okay, that's a fascinating story.
What was the Git community like? I guess it had just started, right?
As you said, in 2005.
How big was the Git community?
And do you know approximately how many users of Git there were?
Or what was it like at that time?
So in the very beginning, there was only one user.
That was Linus Torvalds.
And he still called it Dircache and not Git.
So it started out really, really tiny, and the first thing that was maintained with
Git was Git. And then, I mean, people got really interested, because back then
I was a Linux fanboy, and Linux fanboys were prone to be interested in this because it was good,
right? It was open source, it was good for Linux. You could support it.
You could give something back.
And so I think in the first couple of months,
there were already something like a dozen contributors,
regular contributors.
And a lot of really fundamental things were still missing,
like pack files.
There were no pack files in the initial version of Git.
It was all individual files. Every blob and every tree was an individual file. The performance
was horrible if you put something like a thousand files into that system. But back then, it wasn't a thousand
files; it was just something like 50.
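To make "every blob and every tree was an individual file" concrete, here is a minimal Python sketch of a loose object as Git still stores it today: a `<type> <size>\0` header plus the content, hashed with SHA-1 and written zlib-compressed to `objects/<first two hex digits>/<remaining 38>`. The function names are made up for illustration, and the very first versions of Git differed in some details. Pack files later bundled many such objects into a single file, with delta compression.

```python
import hashlib
import zlib
from pathlib import Path


def object_id(kind: str, data: bytes) -> str:
    """SHA-1 object ID of a blob, tree, or commit (hypothetical helper name)."""
    # Git hashes a "<type> <size>\0" header followed by the raw content.
    header = f"{kind} {len(data)}\0".encode()
    return hashlib.sha1(header + data).hexdigest()


def loose_object_path(oid: str) -> str:
    """Relative path of a loose object inside the .git directory."""
    # The first two hex digits name a directory, the other 38 name the file.
    return f"objects/{oid[:2]}/{oid[2:]}"


def write_loose_object(git_dir: Path, kind: str, data: bytes) -> str:
    """Store one object as its own zlib-compressed file and return its ID."""
    oid = object_id(kind, data)
    target = git_dir / loose_object_path(oid)
    target.parent.mkdir(parents=True, exist_ok=True)
    # The stored bytes are the same header + content, compressed.
    target.write_bytes(zlib.compress(f"{kind} {len(data)}\0".encode() + data))
    return oid


if __name__ == "__main__":
    oid = object_id("blob", b"hello world\n")
    print(oid, loose_object_path(oid))
```

Every new version of every file or directory costs one more such file on disk, and one more file-system lookup to read it back, which helps explain why large trees performed so badly before pack files existed.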
Yeah, it was really tiny. And even after Linux officially started using Git, it
took a while until it really felt to me like this had taken off. I mean, the real
point when I think even industry switched to Git was when Microsoft announced, look, we develop Windows on top of Git. Because if it's good enough for Microsoft, it's good
enough for everybody, especially for anybody else with a smaller repository on Windows.
Yeah. So Git is open source; anybody can download it. Do you have any idea of how many people are using Git on Windows?
Do you have any download numbers or any metrics at all?
Or is it easy to get any of those numbers?
I do have download numbers,
but those download numbers have to be taken with a grain of salt
because obviously there are downstream users of Git for Windows
that I don't have any insight into.
But GitHub Releases, where Git for Windows is published,
publishes download numbers.
I don't have statistics over time.
I would really like to see that,
but I have the impression that something like
a quarter million downloads happen per week.
And that's pretty consistent across versions.
And then there are things like Visual Studio, which packages its own MinGit.
So they bundle MinGit and ship it, and then obviously they don't download it from GitHub releases.
So I don't know how often
they are used. And also, downloads are not users, right? There are many software packages that are downloaded, tried once, and then nobody uses them anymore. But with Git, I have the impression
that with every new version, there are consistently a quarter million downloads per week, so that's probably roughly
a number to orient yourself by.
Yeah. So maybe you can share with us why Git for
Windows needs a separate maintainer from regular Git. As a layman, I'm like, oh, it's Git, it
should just work, right? What is all of the complexity? Maybe you can share some of those details.
Right. So the first thing is, Git was born in the Linux
ecosystem, right? And every Linux user worth their salt will say, I hate Microsoft, right? I hate Windows,
I will never touch it, you know? I was exactly in that same camp.
Fun fact, until I started work at Microsoft,
I maintained Git for Windows,
not owning any machine that ran Windows.
I always had a VM,
usually in university with a campus license,
which is probably not completely correct... I mean, Microsoft will probably not do anything about it
now that I'm one of their own,
but that's how much I despised working on Windows,
that I wouldn't...
The first thing I did with a laptop after buying it
was deleting the Windows partition
and putting Linux on it.
So this is...
I would say if you frequent the Git mailing list,
that's basically the sentiment that you will still feel there.
People think about free software.
It should all be free.
And maybe sometimes not realizing that free software is really only free
if your time isn't worth anything.
Because time is really essential if you want to get all this running.
Now, the Git maintainer really doesn't want to have much to do with Windows.
Not necessarily because they hate Windows,
but I think because it's too tricky to get things working on Windows.
Because in the Linux world,
basically what you do is you rely on POSIX standards.
You look in that long list of available functions, syscalls, and commands on the shell, and then
you just make sure that you only use what all the flavors speak, not anything that only the GNU flavors speak,
and then you're set.
But those syscalls and those tools, they don't exist on Windows.
That's the real big thing.
For example, if you want to spawn a new process from your process on Linux and Unix, what
you do is you first call the fork syscall, which makes a complete copy of everything:
of the memory that you have, of all the file handles
you have open, of all the sockets that you have open, of all the file mappings you have
open. It does that transparently.
It's copy-on-write.
It's not really a copy unless you change it in one of the processes.
But then, right after that fork, you call an exec syscall, which replaces the entire thing with the executable
that you wanted to execute.
And for the first step, the fork, if you want to do it on Windows, there is no equivalent.
You have to do it all by hand.
And because the fork call does not know that there will be an exec call, it has to do all
that laboriously.
So that's what Cygwin does, right?
And if you want to avoid that, you have to go into Git's source code.
And what we should have done but didn't do, and now the ship has sailed,
is imitate what Subversion did: an abstraction layer.
Like, what do we want to do?
Do we want to spawn a new process?
Then we don't call the syscall directly.
We call a helper function.
And this helper function can be implemented
on Windows, on Linux.
It can be implemented on iOS, whatever you want, right?
And that's what we should have done, but we didn't.
So it's basically my fault. When I
started with the Windows port, I tried to do the minimal thing, where I tried to re-implement
syscalls where possible, and where not possible, as in the case of fork and spawn, I would go to the
function that called fork and spawn and would then add an #ifdef.
So you end up with a large part that's Windows specific and a large part that's not Windows specific, but
more POSIX.
And for this, you basically have to have a Windows machine in order to test it.
And you have to be willing to sacrifice loads of nights in order to make this work.
And the Git maintainer probably is not willing to do that.
So Junio Hamano is really happy to leave all that to somebody else,
anybody else, as long as he doesn't have to change anything.
And of course, we try to be very gentle with the source code when we do Windows specific things.
We try very, very hard not to change the source code of Git too much, even if it comes at the price of some other things.
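The contrast between calling fork and exec directly and going through a single "start this program" helper can be sketched in Python. Git itself is written in C; this is only an illustration, and `spawn_with_fork` and `run_program` are hypothetical names, not Git functions. Callers only say what they want (a new process running a given program), and each platform implements that in whatever way is cheapest for it.

```python
import os
import subprocess
import sys


def spawn_with_fork(argv: list[str]) -> int:
    """Unix style: fork() a copy-on-write clone, then exec the new program in it."""
    pid = os.fork()
    if pid == 0:  # child process
        try:
            os.execvp(argv[0], argv)  # replaces the child's memory image entirely
        finally:
            os._exit(127)  # only reached if exec failed
    _, status = os.waitpid(pid, 0)
    return os.waitstatus_to_exitcode(status)


def run_program(argv: list[str]) -> int:
    """Abstraction layer: start argv[0] with its arguments, wait, return the exit code."""
    if sys.platform != "win32" and hasattr(os, "fork"):
        return spawn_with_fork(argv)
    # Windows has no fork(); create the new process directly instead.
    # On Windows, subprocess does this through the CreateProcess API.
    return subprocess.run(argv).returncode


if __name__ == "__main__":
    print(run_program([sys.executable, "-c", "print('hello from a child')"]))
```

Because callers never see fork or exec, a POSIX build of such a helper could equally use `os.posix_spawn` (or C's `posix_spawn`), which starts a program without the caller doing an explicit fork-then-exec.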
So is Git for Windows basically like a fork of Git that is rebased often enough that it doesn't have that many changes?
That's exactly how I try to do it.
We do develop patches over the course of...
So between releases, we develop new patches.
We accept new pull requests, new contributions.
So we accumulated something like 70 topic branches on top
of Git. And those are rebased
with every release. So
the next one, 2.34,
will come out soon,
in November. And with the
release candidates,
I start the process of rebasing.
And then we can
test it. We also
do pre-releases.
And if anybody out there wants to help with the project and doesn't really know how to develop in C, testing is a huge help.
Just test it with all your non-critical things, and then, when you encounter a bug,
give a really good bug report. That really is helpful.
So that's how we maintain it. Yes, it's a friendly fork. It's rebased with every new release.
But in between, we accumulate new stuff. And then, of course, we try to upstream the things that
have proved robust enough.
As a user, or just in general as a developer, you don't hear of people
running into Git bugs that often. You hear of people running into all sorts of
JavaScript, npm, and Node bugs, but you don't hear that much about people running into Git bugs. So
how do you hear about them? Do you just hear from the mailing list? And
what would you say the nature of Git bugs is? Because
it seems like the standard commands are mostly stable. So are they mostly on the periphery,
with all of the new features that come out, like the commit graph and stuff? Or is it that
there are so many different variations of Git, and so many different versions of OSs
that you have to support, that there are always bugs that come out on different OSs for some
reason? What are the kinds of
bugs that you see?
That's a really interesting question
because there is a forest of bugs
and they all look
different.
I would actually contest that. There are not very
many operating systems that
we support. It used to be,
when I started with open source,
a really diverse landscape. There was HP-UX, there was IRIX, there was OSF/1,
all those different Unix flavors with their unique strengths and weaknesses, and they
basically are no longer relevant. Today, when we talk about Git bugs, I would say something like 98% of the Git bugs that
we encounter are reported for Windows, for macOS, or for Linux.
And Linux only in the versions that target Intel or AMD processors,
so the x86_64 architecture.
The bugs that we encounter there are often regressions,
where, I mean, with every fast-moving project,
you do have to expect that there are bugs.
Our test suite is large, but not completely comprehensive,
and it's also a little bit heterogeneous.
We don't have this thing where we have objects
and then unit tests for those objects
and then integration tests.
It's basically the contributors decide
what they want to test,
and some are really overzealous.
One contributor,
one very frequent contributor of the GitHub project
implemented a test that tests all the axes
of a huge matrix and it runs,
I think it takes 11 minutes on Windows to run it.
And they don't care because it's Windows.
It's much faster in Linux.
And then they say, so that's your problem,
which, yeah, I mean, in open source,
we should collaborate a little bit more,
but you can't say much about a very prolific contributor, right?
So the regression tests are not comprehensive enough to catch all those.
We did have a problem, for example, when we introduced protocol v2,
which is a really, really big step up from protocol v1, where you can basically implement more capabilities
and have a more flexible back and forth with the server. This enabled us to do partial clone,
for example. But there was a bug, a performance bug, and it was rather bad. So we rolled it back for one major version and then really
banged the hell out of it in order to fix this bug, and then switched protocol v2 back on.
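On the wire, Git frames protocol messages as "pkt-lines": four hex digits giving the packet length (counting those four bytes), followed by the payload. `0000` is a flush packet, and protocol v2 adds a `0001` delimiter packet that separates a command's capabilities from its arguments. A minimal Python sketch of that framing, using a v2 `ls-refs` request as the example; the function and constant names are illustrative, not Git's.

```python
FLUSH_PKT = b"0000"  # end of a message
DELIM_PKT = b"0001"  # protocol v2: separates sections of a request
MAX_PKT_LEN = 65520  # largest allowed packet, including the 4 length bytes


def pkt_line(payload: bytes) -> bytes:
    """Frame one payload as a pkt-line."""
    length = len(payload) + 4
    if length > MAX_PKT_LEN:
        raise ValueError("payload too large for one pkt-line")
    return f"{length:04x}".encode() + payload


def parse_pkt_lines(stream: bytes) -> list[tuple[str, bytes]]:
    """Split a stream into ("data", payload), ("flush", b"") or ("delim", b"")."""
    packets: list[tuple[str, bytes]] = []
    pos = 0
    while pos < len(stream):
        if pos + 4 > len(stream):
            raise ValueError("truncated length header")
        length = int(stream[pos:pos + 4], 16)
        if length == 0:
            packets.append(("flush", b""))
            pos += 4
        elif length == 1:
            packets.append(("delim", b""))
            pos += 4
        elif length < 4:
            raise ValueError(f"unsupported special packet {length:04x}")
        else:
            if pos + length > len(stream):
                raise ValueError("truncated packet")
            packets.append(("data", stream[pos + 4:pos + length]))
            pos += length
    return packets


# A v2 request: command, (no capabilities here), delimiter, arguments, flush.
ls_refs_request = (
    pkt_line(b"command=ls-refs\n")
    + DELIM_PKT
    + pkt_line(b"ref-prefix refs/heads/\n")
    + FLUSH_PKT
)
```

The framing itself is shared with the older protocol. What v2 changes is the conversation on top of it: instead of the server advertising all of its refs up front, the client picks a command such as `ls-refs` and can narrow it with arguments like `ref-prefix`.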
So those are the normal bugs that we hear about. Sometimes the
really tedious ones come. Well, tedious for those people who have to deal with them,
like me: the security vulnerabilities.
And especially on Windows, there are so many ways
you can exploit things because of the peculiarities.
The file names are case insensitive.
But not only that, there are multiple ways
to refer to the same file.
There are the 8.3 names that you may remember from DOS times.
And they're turned off on the D drive and E drive, if you have them. But on the C drive, which is where most people work, they're still turned on by default.
And for backwards compatibility, they cannot be turned off.
And there were so many enterprising ways you could try to get to the .git directory
and write into it without spelling out .git/.
And of course, if you can say that in a different way,
then you might be able to trick Git into tracking that file and overwriting, for example, the hooks.
And then during the clone, you can execute the code that you deliver without the user
having a chance to inspect it.
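The kind of check this calls for can be sketched in Python. This is a simplified, hypothetical helper, not Git's actual implementation (which is C and handles more cases). It treats a path component as an alias of `.git` if it matches case-insensitively after the trailing dots and spaces that Windows silently drops are removed, or if it looks like the 8.3 short name `GIT~1`.

```python
import re


def is_windows_dotgit_alias(component: str) -> bool:
    """True if this path component could open the .git directory on Windows."""
    # Win32 path handling drops trailing dots and spaces: ".git. " means ".git".
    name = component.rstrip(". ").lower()  # names are case-insensitive
    if name == ".git":
        return True
    # 8.3 short name: ".git" typically gets the alias "GIT~1".
    return re.fullmatch(r"git~\d+", name) is not None


def path_reaches_dotgit(path: str) -> bool:
    """Check every component; Windows accepts both / and \\ as separators."""
    return any(is_windows_dotgit_alias(part) for part in re.split(r"[\\/]", path))
```

With a check like this, a tree entry such as `GIT~1/hooks/post-checkout` can be refused at clone or checkout time instead of silently landing inside `.git`, where the hook would run during the clone.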
So those are the, I would say those are the most stressful bugs.
The other bugs are not so stressful.
They are basically: you fix it, and then you give the reporter a temporary intermediate version that fixes it.
And then in the next version, you have a full fix.
So who are the security researchers doing this research on Git?
Is it just people on the Internet?
You don't even know who they are?
Or is there like a community of security researchers who look into Git after like a few major releases?
So there's no dedicated team taking care of Git.
It's more that there are security researchers
who basically band together, share knowledge,
and then try to bang on one project at a time
and figure out whether it's broken
in some enterprising ways.
So you've probably heard of Google's Project Zero.
And I don't think that they really looked at Git in detail yet.
But a couple of years ago, Microsoft security research looked into it.
And as it happens, I was on vacation.
And boy, was that stressful coming back.
That was not fun.
It was good in the end to have it all
fixed, but we had nine CVEs.
Nine.
That's horrible.
And I'm
incredibly thankful for the researchers
who are not
really a community, but they
basically know each other.
But they are also bug bounty hunters.
Like with GitHub's bug bounty
program, there are quite a few
people who report bugs
there and then get their bug bounty
and make a good, decent living.
But you have to have the talent to
figure out those bugs.
I wouldn't be able to. I can fix them, but I can't figure
them out.
Yeah, I still remember as kids, we
tried to name directories with a word like
con, and Windows would just not let you do that. It seems like those bugs are related
to all of these backwards compatibility things.
Yes, or aux.c, that's the famous one,
because in the Linux kernel there are two files named aux.c, and you can't clone a Linux kernel repository on Windows because of that.
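Names like `con` and `aux.c` fail because Windows reserves certain device names, and the reservation also applies when an extension follows, so `aux.c` is treated like `AUX`. A minimal Python sketch of flagging such paths, assuming the core list CON, PRN, AUX, NUL, COM1 to COM9, and LPT1 to LPT9 (Microsoft's file-naming documentation is the authority on the full list). The helper names and example paths are hypothetical.

```python
RESERVED_NAMES = (
    {"con", "prn", "aux", "nul"}
    | {f"com{i}" for i in range(1, 10)}
    | {f"lpt{i}" for i in range(1, 10)}
)


def is_reserved_on_windows(component: str) -> bool:
    """True if a file or directory name maps to a reserved device name."""
    # Only the part before the first dot matters: "aux.c" is still AUX.
    base = component.split(".", 1)[0]
    return base.lower() in RESERVED_NAMES


def unclonable_paths(paths: list[str]) -> list[str]:
    """Return the paths in a tree listing that Windows would refuse to create."""
    return [p for p in paths if any(is_reserved_on_windows(c) for c in p.split("/"))]
```

Run over a made-up listing such as `["drivers/x/aux.c", "README", "con/notes.txt"]`, this flags the first and last entries, which is the same reason a checkout containing `aux.c` fails on Windows.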
Maybe you can share with us one thing.
I was reading through recent commits on Git for Windows.
I noticed that you said that there might be some users of this certain command.
We don't know for sure.
How do you find out that there are people using a deprecated command?
How do you deprecate something?
How do you decide there aren't enough users of it,
so that you can basically start phasing it out?
When you don't have any telemetry from users,
how do you make decisions that can reduce tech debt
or reduce baggage from the past?
Right.
Without telemetry, it's extremely hard.
In another project I maintained, at some stage, we tried to
have opt-in telemetry. And oh my, the backlash was so much. Nobody was willing to give me
anything, even when I said, let's only do the telemetry that proves to funding agencies that my software should be funded.
They were happy to use it, but not to do anything to help me make money doing it.
So telemetry is really a tricky, tricky thing.
Most users just go from their own egotistical point of view and don't really look at the
benefits, especially when you try to make it opt-in
as opposed to opt-out.
Opt-out, I can understand.
But we do have telemetry,
just not of all users.
In my team, we support the Windows developers
to a certain extent.
They basically started with the VFS for Git fork,
and a lot of things they do themselves,
but the Git version they use,
we still have to maintain.
And for those users, we do have telemetry,
because they're in-house users.
There is no problem with GDPR,
because this is very confined.
What they do is basically intellectual property of the company anyway.
And so there we can actually look at things, how they are used.
But we basically use that data, that evidence, mostly to figure out where to focus our efforts,
not where to deprecate things.
So the deprecation is a really interesting thing.
In the Git project, basically,
this is not the Git for Windows
maintainer's decision because
I inherit
the deprecations from the Git maintainer.
And if I want to
deprecate something, I go through Git.
I don't want to go alone. I want to go
far. So I do it together.
For some things, it's really unclear, so we don't deprecate them. That's very simple.
But for other things, it's clear that we have to deprecate them at some stage.
My favorite example there is the --preserve-merges option of git rebase. That one is my fault.
So it's definitely, I'm the guilty party here.
I introduced this and I thought it was a cool design
and it worked, but the limits of the design
became very, very obvious very quickly
in that you can't even reorder commits
if you use that option.
So I tried to implement it in a better way,
design it in a better way, in a backwards incompatible way.
So it's a new option now.
It's called --rebase-merges, because everything about it is new.
And it's also much faster, because it is not implemented as a shell script.
It's implemented in pure C.
So --preserve-merges now needs to go, right?
And I tried to deprecate it, I think, one and a half years ago.
And there was an immediate outcry.
No, you can't do that.
--rebase-merges has only been stable for two major versions.
And in git svn, we have this other option,
so you can't.
Now I'm trying it again.
I don't think it will be deprecated in 2.34,
but maybe in 2.35.
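Deprecating an option like `--preserve-merges` in favor of `--rebase-merges` means a long phase in which the old option keeps working but warns every time and names its replacement, followed by removal. That cycle can be sketched with Python's `argparse`. This is a hypothetical parser for illustration only; Git's own option parsing is in C, and the message wording here is made up.

```python
import argparse
import sys
from typing import Callable


def warn_to_stderr(message: str) -> None:
    print(message, file=sys.stderr)


def parse_rebase_args(
    argv: list[str],
    *,
    removed: bool = False,
    warn: Callable[[str], None] = warn_to_stderr,
) -> argparse.Namespace:
    """Parse options for a rebase-like command during a deprecation period."""
    parser = argparse.ArgumentParser(prog="rebase")
    parser.add_argument("--rebase-merges", action="store_true")
    parser.add_argument("--preserve-merges", action="store_true")
    args = parser.parse_args(argv)
    if args.preserve_merges:
        if removed:
            # Final phase: the option is gone; fail and point to the replacement.
            parser.error("--preserve-merges has been removed; use --rebase-merges")
        # Transition phase: keep the old behavior available, but warn every time.
        warn("warning: --preserve-merges is deprecated; use --rebase-merges instead")
    return args
```

The two options are deliberately kept separate rather than aliased, since the new one is a backwards-incompatible redesign. Flipping `removed` to `True` corresponds to the release, perhaps a year or more later, in which the old option finally goes away.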
So it's an incredibly long process,
and it has to be announced. There have
to be these messages, so that when you use that option, you get a message like: this is now deprecated,
please don't use this anymore, please use this instead. And then, I think the feeling
is more that you have to have that in there for something like a year, because there are people who don't update all that often, right? So a year is nothing for them, whereas for Git, it's something
like six major versions.
Yeah, so that actually kind of answers the question, in the sense that you
first need to make the command print out all of these warnings, and you need to wait for a really
long time to see who's
complaining. You can see if there's a way to override that and give them an
alternative, and finally, at some point, you may be able to bite the bullet and say, okay, bye-bye, this
command is going away in this future version, right? Absent telemetry, you basically have complaints-driven development, right?
That's a good way to put it.
So what does your life look like, or what does your day-to-day job look like, now that you're hired by
Microsoft and working on Git? Do you split your time half and half between
being a maintainer and building new features, or is your time full-time focused on just
maintenance of Git? And what do you do as the BDFL for
Git for Windows? What does that mean?
Yeah, I don't really like the term BDFL. I would like
to think that I'm neither benevolent nor a dictator, but basically, I want to work together
with people, because for me, it's all about communication, and that's why I like it so
much, because it's about good communication anyway. But back to your question, what does my day-to-day
life look like? Most of the time, I spend reading emails and drinking coffee or a black
tea. I basically have to keep track of so many things.
You also earlier asked me how I get aware of bug reports.
I did monitor Twitter for a while because for a while,
people just crammed their bug reports into 140 characters.
That's, of course, not really enough to put enough information in, but sometimes it was actually critical enough
that I then tried to contact the person
and get more information out of them.
Stack Overflow also used to be a big source of bug reports,
but these days I think most people actually got the memo
that the quickest way to get help
in the Git for Windows project, especially since 2015,
since I am employed to maintain the software, is to go to the Git for Windows bug tracker
on GitHub and report the bug.
There's a bug reporting template that helps you fill out hopefully all the relevant information,
and if not, then there will be a back and forth.
By now, it's not only me; there are also a few other volunteers who tend to the bug tracker. So it's becoming less of a bumpy
procedure and more of a smooth ride: I have a problem here, I get help, done, excellent.
So this is what I spend a large part of my time on. I do develop some new features from time to time.
Right now I'm super excited about upstreaming Scalar. You recently talked to Stolee, who basically
started that project, and I had this clever idea, how hard can it be to port this to C and
contribute it to the Git project? And now it's already the fifth iteration of the patch series,
and I hope that this will get into Git.
And from there, I will probably spend some real quality time
on taking those features and figuring out
how core Git can actually benefit from that
so that you don't have to ask scalar clone
to basically have all this opinionated setup
of your huge repository that you want to clone,
but that you can ask git clone with a special option
to do exactly the same.
That's the dream of the team,
that we basically get rid of maintaining Scalar
and that at the same time,
all the Git users benefit from our hard work.
That is the ultimate goal there.
So I can't really say it's a 50-50 split
because there are times when I spend
basically all the time
re-implementing something in C.
And there are other times
when I just scour the Git mailing list,
the Git for Windows bug tracker,
or sometimes Stack Overflow
to see is there anything relevant
that I should take care of
as Git for Windows maintainer.
Okay.
And can you tell us a little bit
about GitGitGadget?
Yeah.
Yeah, GitGitGadget.
It's probably not really a big secret
that I find a mailing list really challenging to work with.
Because, yeah, it is a free medium.
But if I'm contributing something,
I have to work so hard to know
what is the current state of my contribution.
Where is it in the upstream code review? Where is it in the integration
branches? Is anything left for me to do? Or is there just a tangent that people talked about
that I can basically ignore, because it is not relevant to my contribution? So what I wanted to do with GitGitGadget was basically to have a
way to deal with the Git project that relieves me of so much mental
work, extra work, like walking somewhere when I could take the bike.
So my approach there originally was a shell script called MailPatchSeries.
It's still available on GitHub in my personal org.
But eventually I figured out that pull requests are, despite the many flaws that you see in
them, they are basically the easiest way to contribute code.
Could we improve them?
Yes.
No question about that.
Is it better to contribute via mailing list?
Hands down, no.
It's just horrible to have everything mixed with everybody talking about everything.
It's like if you're at a bazaar and at some stage your ears are ringing and you just want
to go on vacation far away where there's nobody who talks to you, right?
So I tried to have that with GitGitGadget. I really had the idea for more than a year
and tried to interest somebody else in doing it. And then I eventually posed myself that famous question:
how hard could it be? Right. And it took half a year to get it into a really decent state. But at the
same time, it was also a perfect excuse to learn TypeScript.
So GitGitGadget is a system that connects GitHub pull requests with contributions on
the Git mailing list.
The idea is that you open your pull request on the Git repository on GitHub, and then you
are talked through the next steps that you need to take in order to have it land on the
Git mailing list.
The reviews on the Git mailing list are even copied back into the PR as comments.
So you get notified by your GitHub notifications that there's something that you probably want to take care of,
or at least want to know about.
These days, it's not only review comments;
it's also when it has been integrated into a branch,
and also when there has been a status update.
Because the status updates happen centrally
for all the topics, not in your mail thread,
this way you know: hey, this is headed for that and that branch.
Or: no, there has been
a discussion, and a so-called re-roll, which is the next iteration, is expected to land,
so the ball is in your court, so to say. Unfortunately, the system is a bit complex, because I need an Azure Pipeline.
I need an Azure Function,
which is basically a really small serverless
piece of code that can be accessed via a URL.
And of course, there is the GitGitGadget mail address.
I registered a Google Mail address for that
so that the mails can be sent.
And then there's maintaining it, of course;
it's a TypeScript project.
So basically, I get updates.
I get pull requests.
Well, I shouldn't say I. We, because another contributor
now thankfully takes care of that: updating
all the component versions.
For example, when the TypeScript definitions for Node are updated, or when there's a new Jest version,
which is what we use for automated tests.
So it's a little bit of a project on its own, and it's so complex that recently, when somebody asked me,
could we take GitGitGadget
and adapt it to this subsystem
of the Linux kernel
so that we can do the same,
I said, oh gosh, okay, okay.
Let me take an hour here,
actually, to write up
what it would take to adapt this.
And unfortunately, it's really hard.
It's very, very focused on Git
and the processes on the Git mailing list. Unfortunately, I don't see
much of a chance without spending two months really revamping the code base
to modularize it even more, to allow for that. And then you still need to set up
a Google Mail address, an Azure Pipeline,
an Azure Function, and maintain all of them
so that they work together.
A little bit of a challenge.
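As an illustration of the mirroring described above, here is a minimal TypeScript sketch (TypeScript because GitGitGadget is a TypeScript project) of how events on a patch series' mail thread might be folded into PR-style comments plus a "whose turn is it" flag. The event kinds, field names, and the `summarize` function are hypothetical, invented for this example; this is not GitGitGadget's actual code or data model, and it does no networking (no GitHub, Azure, or mail access).

```typescript
// Hypothetical event model for one patch series; NOT GitGitGadget's real types.
type ThreadEvent =
  | { kind: "review"; author: string; excerpt: string }
  | { kind: "integrated"; branch: string }
  | { kind: "status"; rerollExpected: boolean; note: string };

interface Summary {
  comments: string[]; // text that would be mirrored into the PR as comments
  branches: string[]; // integration branches the series has reached, in order
  ballInYourCourt: boolean; // true if the latest status expects a re-roll
}

// Fold the events (oldest first) into what the contributor needs to know.
function summarize(events: ThreadEvent[]): Summary {
  const summary: Summary = { comments: [], branches: [], ballInYourCourt: false };
  for (const event of events) {
    switch (event.kind) {
      case "review":
        // Mailing-list replies are copied back as PR comments.
        summary.comments.push(event.author + " replied on the mailing list: " + event.excerpt);
        break;
      case "integrated":
        // Record each integration branch once, in the order it was reached.
        if (summary.branches.indexOf(event.branch) === -1) {
          summary.branches.push(event.branch);
        }
        summary.comments.push("This patch series was integrated into " + event.branch + ".");
        break;
      case "status":
        // Status updates are centralized; the latest one decides whose turn it is.
        summary.comments.push("Status update: " + event.note);
        summary.ballInYourCourt = event.rerollExpected;
        break;
    }
  }
  return summary;
}

const example = summarize([
  { kind: "review", author: "A Reviewer", excerpt: "Please split this patch." },
  { kind: "integrated", branch: "seen" },
  { kind: "status", rerollExpected: true, note: "Expecting a re-roll." },
]);
console.log(example.ballInYourCourt); // true
```

In this sketch only the most recent status update decides whether the ball is in the contributor's court; a real system would presumably weigh more signals than that.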
But I think this speaks to the overall theme, right?
It's like you want to make open source contributions
to Git easier, and that's kind of what GitGitGadget is.
And even though it's work to build,
it's immensely useful.
I've used it and I think it's one of those things
that makes your life easier.
It runs tests for you.
All of that stuff that you kind of expect
in an open source repository today,
it gives you that.
And it's not wrong to expect that Git
generally works through a mailing list, because otherwise there's this weird circular dependency, right?
Depending on GitHub to make sure you have new versions of Git is just strange, ultimately.
Actually, I don't really think so.
If you run Git in place, which I did for quite a while, it's very fragile.
All of a sudden you can't run Git anymore, and you can't go back to the previous version.
But if you build Git and then install it (in double quotes) into your home directory,
and then run it from there,
that Git works, and you know it works,
and you can develop Git.
Then if you break something,
you still have that working Git
that you built from a previous revision.
And in that respect, I would say it's pretty much the same.
Also, because Git is distributed,
should GitHub be down at some stage,
you can always go to other options, right?
So I don't really see a circular dependency here.
It's quite beautiful, the system that we've set up.
The only thing that you cannot really rely on
is the Git mailing list.
As a colleague of mine recently realized:
hey, I sent a mail to the Git mailing list and it didn't arrive. Why? It bounced. It was an outlook.com mail address. So
it's open to everybody except for you outlook.com users, right? And oh, by the way, if
you want to send this via a mail client that automatically adds an HTML part,
it will also not come through.
So this pretense that this is an open system
when it just isn't: not a good thing.
Yeah, the whole requirement for having plain text emails
is very, very challenging.
It's so easy to copy and paste something and not realize, oh, it's no longer plain text,
because in most email clients, you don't even see that anymore.
I didn't even know how to send a plain text email until I had to send something to Git and got a bounced email.
Yeah, that's a good point.
And just in general, how do you foster an inclusive and
productive open source community? There are clearly a lot of people who want to contribute
to Git, and now it's your job to make sure that, you know, people are nice to each other, in a sense,
and that everyone can contribute easily and well, and GitGitGadget is part of that. What have you learned over the years?
What have I learned?
Yeah, it's really, it's a challenging subject
because nobody wants to change.
Really, nobody wants to change,
especially those who benefit.
Those with privilege are the last to acknowledge
that they actually benefit from privilege.
They say, no, no, it's meritocracy, right?
And I was actually surprised when I looked up the word and where it came from.
Meritocracy was really introduced as a racist tool to keep non-white males out of a certain circle.
So it was specifically designed that way, and these days people still use it as if it were something
positive.
If you want to contribute to Git, you have to have a huge amount of privilege.
You have to have the free time to contribute to an open source project.
I would say that 80% of the population of this
lovely planet do not have that time. They don't have that luxury. So the rest has a privilege.
You do have to have a computer. Granted, there are more and more people who have access to a
computer, but you also have to have some sort of training in programming. Who has that? I had the privilege that my brother taught me
how to program when I was nine.
And that's
just an incredible
stroke of luck. So that's
privilege. And I
learned that most people who
enjoy the same privilege as I do
are fully unaware how
much privilege they have and enjoy
and take for granted.
One time when I saw how even I take this for granted
was when I worked with a Google Summer of Code student,
and they went silent for a couple of days.
And I thought, that's not right,
because we wanted to work on a certain thing and chat back and forth. Eventually I heard back: there had been a power outage for three days. They didn't have power. I take for granted that where I live, power outages are basically a microsecond thing; I don't even notice when the power is out. But in other countries,
it's normal. My student, now friend, told me, yeah, that happens; there is a season for it
because of the weather conditions. And if you have that, all of a sudden it's not so easy to
contribute. I had not even thought about the challenges there.
I took it for granted that this person has a computer and internet access and has to come online
at that point because I want to talk to them, right? So it was quite hard to acknowledge that I myself still have tons to learn. And I would
like to believe that I'm more aware of this than most people on the Git mailing list.
So this is something. Another thing that I realized over the course of the years is
how much diversity brings. There are obvious examples from my old job in science,
where I did image processing. To this day, it is hard: the diagnosis of liver
cancer in non-white people, or even non-male people, is more flaky than in white male people. It's so biased, right?
So these are the obvious cases
where you would benefit from a more diverse developer base,
with developers and contributors who say,
you know what, I'm doing more of this.
How can we do it?
How can we change the software so that it supports that better?
And at some stage, you also realize that when you work, for example, with people who have only one hand,
and you want to optimize for typing with one hand,
all of a sudden you are in the same situation yourself, because either one of your arms is broken
because you went skiing, thanks to your privilege, or, even more likely, you have offspring
and you have to nurture them, right?
So one arm is occupied.
There's a sleeping baby there.
And so you only have one hand.
And then all of a sudden, you're really, really happy about that super annoying sticky keys feature,
where you can press Shift for a long time and then still do what you want to do with one hand.
So diversity is something that's really useful, and it comes back to you as a benefit.
That's why I'm really, really trying my best to foster diversity and inclusion where I can. Unfortunately,
there are people who take this as an attack on themselves. There was an episode last year where
I really had to go offline for three days, essentially, just to recover from a super
toxic, nasty Git for Windows ticket where people just wanted to harm other people
by writing the nastiest things.
And I don't know why they waste their time inflicting harm.
Why would you do that?
I mean, what's the use for you?
You don't get anything out of it.
You could instead spend that time
doing something that's collaborative,
that benefits people, including yourself, right?
So that's where I come from with diversity and inclusion.
Thank you for sharing that.
A simple example of diversity that keeps coming to my mind is, you know,
when YouTube was not designed for left-handed people:
their videos would always upload upside down,
because the designers didn't think that people
could be holding phones the other way.
And clearly, when you have people
who are slightly different from you,
that's when you can design better software,
because a lot of different kinds of people
are going to be using the software
that you are writing, right?
Yeah. But it's also
unfortunate that there's a lot of toxicity in open source, because pretty much
anybody can, and will, say anything. So, yeah. Thank you for all of your work on clearly
really important projects like Git, which is pretty much, I think, the most commonly used tool.
It is, yeah, it's pretty rewarding, I have to say,
because as a maintainer, I do see what's coming out of it
and how many people use it.
Last week, when I was at a scientific conference,
I asked the audience, how many of you have used Git? And I think there were
five whose hands didn't go up. And of course, those five were professors who don't have the
time anymore to play with computers; they have to secure funding. So it was really gratifying
to see. And then I also asked who had used rebase, and I had to immediately apologize
for the interactive rebase,
because that's my fault.
But it's really
nice to see that what you do
has an impact, and a positive one.
Yeah.
And maybe just as a wrap-up question,
what's next for Git?
So you're excited about Scalar,
as you said,
and about adding it to Git core.
In general, are there any broad themes
or specific projects that you're excited about?
So right now, I'm really excited about that scaling idea
where you can basically maintain bigger and bigger repositories
or access them in more hybrid ways.
Like we are going away a little bit
from this distributedness of Git
because with partial clone,
you definitely need to have access to a server
that has all the objects.
And if they go away, basically you have a problem.
But I really think that having the ability to work in a monorepo, which for companies,
from my perspective, makes a lot of sense, because then you have your code in one place
and you can take one bit and move it to another part and still use it from both sides.
This is something I find exciting.
What I do not quite see right now, but what I would be really excited about, is
more research, more effort, and more love
toward the user interface.
Because let's face it,
everybody who starts to use Git struggles. And it's not the fault of
the people who struggle; it's the fault of the software, right? It's just like the famous story
of the first Xerox machines, which had tons of options, right? And the thing that
made them so much more usable, eminently more usable, was the green button, with which you can make that one copy
that you most frequently need.
So I would like to figure out
how to attract people who are into user interface design,
and not so much the graphical kind,
but more the processes.
How can I take the processes
that people have to deal with all day long,
find the common threads,
and then amalgamate them
into a user interface on the command line
that makes sense?
Because, yeah, I mean,
we recently had this attempt with
git checkout, which does two things that are completely separate from each other.
One is switching branches and the other one is taking revisions of certain files from a different branch.
Right. Both were git checkout.
Now somebody implemented git restore, which gets files from somewhere else,
and git switch, which switches branches.
But I'm still not happy with the user interface.
The names, at least, are now intuitive,
but the command-line options, still:
I mistype the options to git switch so many times
because they don't make sense.
So if somebody with a usability background
came in and said, look, we need to do this,
I would have a couple of ideas how we could implement it in a backwards-compatible way via config options,
where we just leave things alone for scripts, and for interactive users
we change certain things so that things flow.
The workflow should model how the user interface looks,
not the other way around.
Yeah. I think git switch is a perfect example:
to create a new branch with git switch, it's -c,
but creating a new branch with checkout is -b.
And that always trips me up. Yeah.
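For readers who trip over the same spellings, the correspondence between the old and new commands can be written down as a small lookup. The TypeScript sketch below is a hypothetical helper (the name `modernize` is made up; it is not part of Git) that rewrites a few simple `git checkout` argument shapes into `git switch` or `git restore`. It assumes a lone argument is a branch name and returns `null` for anything it does not recognize; it is an illustration, not a complete option parser.

```typescript
// Hypothetical helper, not part of Git: map a few `git checkout` argument
// shapes to the newer `git switch` / `git restore` commands.
// Assumption: a lone non-option argument names a branch (not a commit).
function modernize(args: string[]): string[] | null {
  const dashDash = args.indexOf("--");
  if (dashDash !== -1) {
    // File restoration: `git checkout [<rev>] -- <paths...>`
    const before = args.slice(0, dashDash);
    const paths = args.slice(dashDash + 1);
    if (paths.length === 0 || before.length > 1) return null;
    if (before.length === 0) {
      // Restore working-tree files from the index.
      return ["git", "restore", ...paths];
    }
    if (before[0].charAt(0) === "-") return null; // options not handled here
    // `checkout <rev> -- <paths>` updates both the index and the working tree,
    // so `restore` needs both --staged and --worktree to match it.
    return ["git", "restore", "--source=" + before[0], "--staged", "--worktree", ...paths];
  }
  if (args[0] === "-b" && (args.length === 2 || args.length === 3)) {
    // Create a branch and switch to it: -b for checkout, but -c for switch.
    return ["git", "switch", "-c", ...args.slice(1)];
  }
  if (args.length === 1 && args[0].charAt(0) !== "-") {
    // Plain branch switch.
    return ["git", "switch", args[0]];
  }
  return null; // Anything else is outside the scope of this sketch.
}

console.log(modernize(["-b", "topic"])?.join(" ")); // git switch -c topic
```

One design choice worth noting: `git checkout <rev> -- <paths>` overwrites both the index and the working tree, while `git restore` touches only the working tree by default, which is why the sketch passes both `--staged` and `--worktree`.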
But I think it's also good that there is some progress.
I was so happy to see that Git has added two new commands
specifically to improve the user interface, right? So just the fact that that got done was
pretty exciting to me. Yeah, it's a beginning, but now we need experts, not software developers.
Yeah, beginners. I think beginners are what we need:
people running into Git for the first time
who say, wait, this is very confusing,
this should be something else.
Yeah, I thought so at some stage,
but see, I have some musical training,
and there are people who play wonderful music
and are the worst teachers,
because they don't know how they do it.
The
same thing can be said about beginners. Beginners have no
idea what the user interface should look like to make it easy for them. You need
an expert who assists beginners or looks over their shoulders. What do they do? How
do they do it? What would be the thing? What would be the minimal finger
movement for the beginner to actually get what they need right now, with minimal effort?
And then I think it really takes experts in usability, who know how to design things so that they flow more easily and smoothly.
I'm really excited about this idea, though. If there are any listeners who are interested
in contributing to Git, I will leave Johannes' email in the show notes. And
I'm sure you can also find him on the github.com/git-for-windows/git repository.
Thank you, Johannes, for being a guest. I think this was great. Thank you so much for having me.