Software at Scale 35 - Maintaining Git with Johannes Schindelin

Episode Date: October 20, 2021

Johannes Schindelin is the maintainer (BDFL) of Git for Windows.

Git is a fundamental piece of the software community, and in this episode we get to learn the history and inner workings of the project. Maintaining a widely used open source project involves a ton of expected complexity around handling bug reports, deprecations, and inclusive culture, but it also requires managing interpersonal relationships, ease of contribution, and other aspects that are fascinating to learn about.

Highlights

00:06 - How did Johannes end up as the maintainer of Git for Windows?
06:30 - The Git community in the early days. Fun fact: Git used to be called `dircache`
08:30 - How many downloads does Git for Windows get today?
10:15 - Why is Git for Windows a separate project? Why not make improvements to Git itself?
24:00 - How do you deprecate functionality when there are millions of users of your product and you have no telemetry?
30:00 - What does being the BDFL of a project mean? What does Johannes's day-to-day look like?
33:00 - What is GitGitGadget? How does it make contributions easier?
41:00 - How do you foster an inclusive community in an open-source project?
50:00 - What’s next for Git?

This is a public episode. If you would like to discuss this with other subscribers or get access to bonus episodes, visit www.softwareatscale.dev

Transcript
Starting point is 00:00:00 Welcome to Software at Scale, a podcast where we discuss the technical stories behind large software applications. I'm your host, Utsav Shah, and thank you for listening. Hey, Johannes, welcome to the Software at Scale podcast. Thank you for being a guest. Thank you for having me, Utsav. So you are the maintainer of Git for Windows, right? My first question to you is, what got you into that? How did you get interested in being a maintainer for Git? I wasn't. I stumbled into it. This is one of those things where I said, how hard could this be? And instead of running away, saying too hard, too hard, like a normal person,
Starting point is 00:00:45 I just got stuck with it. I thought that I could basically hand it off to somebody else who was interested, and that really didn't work out. In the end, it was good for me, because I got a nice job at Microsoft because I'm the Git for Windows maintainer. But for seven years, I just worked on it off and on in my free time, basically only when I felt guilty enough for not having taken care of it. But I was actually an open source software maintainer before. I started in 2000, when I got interested
Starting point is 00:01:15 in the VNC protocol, which is a little bit like Remote Desktop. There was this program where you could connect to another computer, see its screen, and remote-control it. That was fine and dandy, but I actually wanted to export something that did not exist as a desktop. So I turned that software into a library with which you could implement your own VNC server. That is what I maintained for, I don't know, I think I only gave up maintainership in something like 2012.
Starting point is 00:01:51 So that was my first maintainership. But that was only after I had spent quite a few years just benefiting from open source, which back then was not even called open source. It was basically just free source code on the university networks, and I enjoyed it and wanted to give something back. Yeah, there was no GitHub or anything like that. It wasn't easy to make pull requests; there were FTP servers, of all things. It's much easier these days to get involved in open source than back then. So you were doing a PhD in genetics when you started maintaining Git
Starting point is 00:02:29 or started contributing to Git? Yeah. So the story was that originally I'm a mathematician. I work with prime numbers, and I love prime numbers to this day, but I couldn't get anything work-wise done. And at the end of the day, I really like to have a warm meal and a dry bed, so I had to make some money. So I started a software company writing logistics software, basically sending trucks down the right roads where they could drive.
Starting point is 00:03:09 But I got this incredible chance in 2001 when somebody said, hey, somebody at the university is looking for somebody who knows something about computers to do image processing. And I said, image processing, okay. Yeah, and they would give you a PhD for it. Oh, yeah, come in. So I started that. And in 2005, exactly around the time when Git started, when this BitKeeper thing blew up,
Starting point is 00:03:32 which I followed really anxiously every day, checking multiple times whether there was news. So Git was started, and exactly at that time, my supervisor told me, Johannes, you know, it really would be time to write up a thesis now, right? I had just subscribed to the Git mailing list, and a week after the Git mailing list was formed, I had to unsubscribe and write up a thesis. For three months straight, I had this very strict regimen of kicking my butt into writing
Starting point is 00:04:06 paragraph after paragraph and then revising. At first it was four days writing and one day revising, and in the end it was four days revising and one day writing. In the end, I had it. And the first thing I did after handing in the first draft to my supervisor was to immediately resubscribe to the Git mailing list. I think I made my first contribution only after following along for a month or two. It had something to do
Starting point is 00:04:34 with getting Git compiled on the IRIX machine that we had at the university, which I wanted to use. That was actually a big driver for me, because I wanted to use Git to develop the software for the PhD and then the postdoc. And that's how I got involved in Git. But I did not maintain anything in Git for years after that.
Starting point is 00:04:56 So in 2005, I started contributing, and in 2007, I started Git for Windows, maintaining Git for Windows. I started Git for Windows basically while freelancing at an insurance company, where I was tasked with writing stuff using Subversion. And Subversion is slow, I think. At least it's slower than Git. And I had to use it on Windows, which makes it doubly slow, right? But Git was also super slow there, because it only ran on Cygwin. And Cygwin is this really cool thing
Starting point is 00:05:28 where you don't have to change your code much, but it comes at the price of running relatively slowly, because it emulates all these POSIX functionalities. So I started porting Git to the pure Win32 API. Win32, sorry, that doesn't mean 32-bit; it's Win32, that's the term, even if it's 64-bit. But I started porting it, and then my gig at that company was up, and I didn't really know what to do with those patches. It was not really finished, so I put it on the Git mailing list and went on happily with my university stuff and other freelancing stuff.
Starting point is 00:06:07 And then a contributor actually came back. That contributor, also called Johannes, another Johannes, came back and said, oh, I finished it, by the way. And then somehow they went on vacation, and somebody else wanted a way to compile it, because it was not really clear how to build this from scratch. So I said, okay, I'll give that to you, but then you maintain it. Okay. So they took it, and they didn't maintain it. I got stuck with maintaining it. I tried to really hand it off multiple times over the years, but in the end it worked out: in 2015, I got hired by Microsoft, and it's really been quite an interesting and fun ride. So clearly PhDs are useful so that you can procrastinate on them and do useful work.
Starting point is 00:06:57 Okay, that's a fascinating story. How was the Git community when you joined? I guess it had just started, right? As you said, in 2005. How big was the Git community, and do you know approximately how many users of Git there were? What was it like at that time? So in the very beginning, there was only one user. That was Linus Torvalds.
Starting point is 00:07:18 And he still called it dircache, not Git. So it started out really, really tiny, and the first thing that was maintained with Git was Git itself. And then, I mean, people got really interested because, just like me (back then I was a Linux fanboy), Linux fanboys were prone to be interested in this, because it was good, right? It was open source, it was good for Linux, you could support it, you could give something back. And so in the first couple of months, I would say there were already something like a dozen contributors,
Starting point is 00:07:57 regular contributors. And a lot of things were really fundamental, like the pack files. There were no pack files in the initial version of Git. It was all individual files: every blob and every tree was an individual file. The performance was horrible if you put something like a thousand files into that system. But Git itself wasn't a thousand files; it was just something like 50. Yeah, it was really tiny. And it took a while, even after Linux officially started using Git,
Starting point is 00:08:40 it took a while until it really felt to me like this had taken off. I mean, the real point when I think even industry switched to Git was when Microsoft announced, look, we develop Windows on top of Git. Because if it's good enough for Microsoft, it's good enough for everybody, especially anybody else with a smaller repository on Windows. Yeah, so Git is open source, and anybody can download it. Do you have any idea of how many people are using Git on Windows? Do you have any download numbers or any metrics at all? Or is it easy to get any of those numbers? I do have download numbers, but those download numbers have to be taken with a grain of salt
Starting point is 00:09:18 because obviously there are downstream users of Git for Windows that I don't have any insight into. But GitHub Releases, where Git for Windows is published, shows download numbers. I don't have statistics over time. I would really like to see that, but I have the impression that something like a quarter million downloads happen per week.
Starting point is 00:09:50 And that's pretty consistent across versions. Even then, there are others like Visual Studio, which packages its own MinGit: they bundle MinGit and package it, and then obviously they don't download it from GitHub Releases. So I don't know how often those are used. Also, downloads are not users, right? There are many software packages that are downloaded, tried once, and then nobody uses them anymore. But with Git, I have the impression that with every new version there are consistently a quarter million downloads per week, so that's probably a rough number to orient yourself by. Yeah. So maybe you can share with us why Git for Windows needs a separate maintainer from regular Git. As a layman, I'm like, oh, it's Git, it
Starting point is 00:10:41 should just work, right? What is all of the complexity? Maybe you can share some of those details. Right. So the first thing is, Git was born in the Linux ecosystem, right? And every Linux user worth their salt will say, I hate Microsoft, right? I hate Windows, I will never touch it. And you know what? I was exactly in that same camp. Fun fact: until I started working at Microsoft, I maintained Git for Windows without owning any machine that ran Windows. I always had a VM, usually at the university with a campus license,
Starting point is 00:11:23 which is probably not completely correct... I mean, Microsoft will probably not do anything about it now that I'm one of their own. But that's how much I despised working on Windows. The first thing I did with a laptop after buying it was delete the Windows partition and put Linux on it. So this is...
Starting point is 00:11:41 I would say if you follow the Git mailing list, that's basically the sentiment that you will still feel there. People think about free software: it should all be free. And maybe sometimes they don't realize that free software is really only free if your time isn't worth anything, because time is really essential if you want to get all this running. Now, the Git maintainer really doesn't want to have much to do with Windows.
Starting point is 00:12:09 Not necessarily because they hate Windows, but I think because it's too tricky to get things working on Windows. Because in the Linux world, basically what you do is rely on POSIX standards. You look in that long list of available functions, syscalls, and shell commands, and then you just make sure that you don't use anything that only the GNU flavors speak, only what all the flavors speak, and then you're set.
Starting point is 00:12:43 But those syscalls and those tools don't exist on Windows. That's the really big thing. For example, if you want to spawn a new process from your process on Linux and Unix, what you do is first call the fork syscall, which makes a complete copy of everything: of the memory that you have, of all the file handles you have open, of all the sockets that you have open, of all the file mappings you have open. It makes them transparent, so it's copy-on-write.
Starting point is 00:13:14 It's not really a copy unless you change it in one of the processes. But then after that fork, you immediately call an exec syscall, which replaces the entire thing with the executable that you wanted to execute. And the first step, the fork, has no equivalent on Windows. You have to do it all by hand. And because the fork call does not know that there will be an exec call, it has to do all of that laboriously. So that's what Cygwin does, right?
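For readers who have not used these calls, the fork-then-exec pattern described above looks roughly like this on a POSIX system. This is a minimal sketch, not code from Git, and the helper name `run_and_wait` is hypothetical. The child starts as a copy-on-write duplicate of the parent and immediately replaces itself with the requested program via `execvp()`; it is the `fork()` half that has no direct Windows equivalent.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run argv[0] (looked up on PATH) and wait for it.
 * Returns the child's exit code, or -1 if it could not be started/waited on. */
int run_and_wait(char *const argv[])
{
    assert(argv != NULL && argv[0] != NULL);

    pid_t pid = fork();            /* duplicate the calling process (copy-on-write) */
    if (pid < 0) {
        perror("fork");
        return -1;
    }
    if (pid == 0) {
        /* Child: throw away the copied image and load the requested program. */
        execvp(argv[0], argv);
        /* execvp() only returns on failure. */
        perror("execvp");
        _exit(127);
    }

    /* Parent: wait for the child and decode how it terminated. */
    int status;
    if (waitpid(pid, &status, 0) < 0) {
        perror("waitpid");
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, `run_and_wait((char *[]){"sh", "-c", "exit 3", NULL})` returns 3 once the child finishes.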
Starting point is 00:13:48 And if you want to avoid that, you have to go into the Git source code. What we should have done, but didn't do, and now the ship has sailed, is imitate what Subversion did: an abstraction layer. Like, what do we want to do? Do we want to spawn a new process? Then we don't call the syscall directly; we call a helper function. And this helper function can be implemented
Starting point is 00:14:13 on Windows and on Linux. It can be implemented on iOS, whatever you want, right? That's what we should have done, but we didn't. So it's basically my fault. When I started with the Windows port, I tried to do the minimal thing: I tried to re-implement syscalls where possible, and where that wasn't possible, as in the case of fork and spawn, I would go to the function that called fork and spawn and then add an #ifdef. So there's a large part that's Windows-specific and a large part that's not Windows-specific, but
Starting point is 00:14:49 more POSIX. And for this, you basically have to have a machine in order to test it, and you have to be willing to sacrifice loads of nights in order to make it work. The Git maintainer is probably not willing to do that. So Junio Hamano is really happy to leave all that to somebody else, anybody else, as long as he doesn't have to change anything. And of course, we try to be very gentle with the source code when we go into Windows-specific things. We try very, very hard not to change the source code of Git too much, even if it comes at the price of some other things.
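The abstraction layer described here, where callers ask a helper to spawn a process instead of calling `fork()` themselves, could be sketched as below. This is a hypothetical helper (`spawn_and_wait` is not a Git function) and not Git's code; it assumes the POSIX `posix_spawnp()` call on Unix-like systems and the Microsoft C runtime's `_spawnvp()` on Windows.

```c
#include <assert.h>
#include <stdio.h>

#ifdef _WIN32
#include <process.h>
#include <stdint.h>
#else
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>
extern char **environ;
#endif

/*
 * Hypothetical portable helper: callers say *what* they want (run a
 * program and wait for it) and never call fork()/exec() directly.
 * Returns the child's exit code, or -1 on failure.
 */
int spawn_and_wait(char *const argv[])
{
    assert(argv != NULL && argv[0] != NULL);
#ifdef _WIN32
    /* No fork() here: the C runtime creates and waits for the process in
     * one call. Caveat: _spawnvp() joins arguments with spaces without
     * quoting them, so arguments containing spaces need extra handling. */
    intptr_t rc = _spawnvp(_P_WAIT, argv[0], (const char *const *)argv);
    return rc < 0 ? -1 : (int)rc;
#else
    pid_t pid;
    /* posix_spawnp() creates the child and starts the program in one call,
     * which lets the C library avoid duplicating the whole parent. */
    int err = posix_spawnp(&pid, argv[0], NULL, NULL, argv, environ);
    if (err != 0) {
        fprintf(stderr, "posix_spawnp failed: error %d\n", err);
        return -1;
    }
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
#endif
}
```

The design point is that the `#ifdef` lives in exactly one function; every caller stays platform-neutral, which is what the #ifdef-at-each-call-site approach gives up.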
Starting point is 00:15:37 So is Git for Windows basically a fork of Git that is rebased often enough that it doesn't have that many changes? That's exactly how I try to do it. Between releases, we develop new patches. We accept new pull requests, new contributions. So we have accumulated something like 70 topic branches on top of Git, and those are rebased with every release. So
Starting point is 00:16:09 the next one, 2.34, will come out soon, in November. With the release candidates, I start the process of rebasing. And then we can test it. We also do pre-releases.
Starting point is 00:16:32 And if anybody out there wants to help with the project and doesn't really know how to develop in C, testing is a huge help. Just test it with all your non-critical things, and when you encounter a bug, give a really good bug report. That really is helpful. So that's how we maintain it. Yes, it's a friendly fork. It's rebased with every new release, but in between, we accumulate new stuff. And then, of course, we try to upstream the things that have proved robust enough. As a user, or just in general as a developer, you don't hear of people running into Git bugs that often. You hear of people running into all sorts of
Starting point is 00:17:11 JavaScript, npm, and Node bugs, but you don't hear that much about people running into Git bugs. How do you hear about them? Do you just hear from the mailing list? And what would you say is the nature of Git bugs? Because it seems like the standard commands are mostly stable. So are they mostly on the periphery, with all of the new features that come out, like the commit graph and stuff? Or is it that there are so many different variations of Git and so many different OS versions you have to support that there are always bugs that come out on different OSes for some reason? What kind of
Starting point is 00:17:47 bugs do you see? That's a really interesting question, because there is a forest of bugs and they all look different. I would actually contest that premise: there are not very many operating systems that we support. It used to be
Starting point is 00:18:04 when I started with open source, it used to be a really diverse landscape. There was HP-UX, there was IRIX, there was OSF/1, all those different Unix flavors with their unique strengths and weaknesses, and they are basically no longer relevant. Today, when we talk about Git bugs, I would say something like 98% of the Git bugs that we encounter are reported for Windows, for macOS, or for Linux, and Linux only in the versions that target Intel or AMD processors, so the x86_64 architecture. So that's not very diverse, right?
Starting point is 00:18:46 The bugs that we encounter there are often regressions. I mean, with every fast-moving project, you do have to expect that there are bugs. Our test suite is large, but not completely comprehensive, and it's also a little bit heterogeneous. We don't have this thing where we have objects, then unit tests for those objects, and then integration tests.
Starting point is 00:19:12 It's basically the contributors who decide what they want to test, and some are really overzealous. One contributor, a very frequent contributor to the Git project, implemented a test that tests all the axes of a huge matrix, and I think it takes 11 minutes to run on Windows.
Starting point is 00:19:33 And they don't care, because it's Windows. It's much faster on Linux. And then they say, so that's your problem. Which, yeah, I mean, in open source we should collaborate a little bit more, but you can't say much to a very prolific contributor, right? So the regression tests are not comprehensive enough to catch everything. We did have a problem, for example, when we introduced protocol v2,
Starting point is 00:19:59 which is a really, really big step up from protocol v1, because you can basically implement more capabilities and have a more flexible back-and-forth with the server. This enabled us to do partial clone, for example. But there was a bug, a performance bug, and it was rather bad. So we rolled it back for one major version and then really banged the hell out of it in order to fix this bug, and then switched protocol v2 back on. So those are the normal bugs that we hear about. Then sometimes come the really tedious ones. Well, tedious for those people who have to deal with them, like me: the security vulnerabilities. And especially on Windows, there are so many ways
Starting point is 00:20:52 you can exploit things because of its peculiarities. File names are case-insensitive. But not only that, there are multiple ways to refer to the same file. There are the 8.3 names that you may remember from DOS times. They're turned off on the D: and E: drives, if you have them, but on the C: drive, which is where most people work, they're still turned on by default, and for backwards compatibility they cannot be turned off. And there were so many enterprising ways to try to get to the .git directory
Starting point is 00:21:35 and write into it without saying .git/. And of course, if you can say that in a different way, then you might be able to trick Git into tracking that file and overwriting, for example, the hooks. And then during the clone, you can execute the code that you deliver without the user having a chance to inspect it. So those are, I would say, the most stressful bugs. The other bugs are not so stressful. Basically, you fix it, you give the reporter a temporary intermediate version that fixes it, and then in the next official version, you have a full fix.
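To make the aliasing problem concrete, here is a simplified sketch of checking whether a single path component names the .git directory under Windows file name rules. The function name is hypothetical and this is not Git's implementation. It covers case-insensitivity and the `GIT~1` short name mentioned above, plus trailing dots and spaces, which Windows also strips from file names; a real check has to handle more, such as other `~N` suffixes and NTFS alternate data stream syntax.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* True if the first n bytes of a equal b (case-insensitively) and b is
 * exactly n bytes long. */
static int eq_ci(const char *a, size_t n, const char *b)
{
    if (strlen(b) != n)
        return 0;
    for (size_t i = 0; i < n; i++)
        if (tolower((unsigned char)a[i]) != tolower((unsigned char)b[i]))
            return 0;
    return 1;
}

/* Hypothetical, simplified check: would Windows resolve this single path
 * component to the .git directory? */
int refers_to_dotgit(const char *component)
{
    assert(component != NULL);
    size_t n = strlen(component);

    /* Windows ignores trailing dots and spaces: ".git. " opens ".git". */
    while (n > 0 && (component[n - 1] == '.' || component[n - 1] == ' '))
        n--;

    /* Names are case-insensitive: ".GIT" and ".Git" are the same entry. */
    if (eq_ci(component, n, ".git"))
        return 1;

    /* The usual 8.3 short name generated for ".git". */
    if (eq_ci(component, n, "git~1"))
        return 1;

    return 0;
}
```

A tree entry for which such a check returns true would have to be rejected before checkout. Otherwise an entry named, say, `GIT~1/hooks/post-checkout` could write into the real `.git/hooks`.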
Starting point is 00:22:10 And then in the next native version, you have a full fix. So who are the security researchers doing this research on Git? Is it just people on the Internet? You don't even know who they are? Or is there like a community of security researchers who look into Git after like a few major releases? So there's no dedicated team taking care of Git. It's more, they're security researchers who basically band together, share knowledge,
Starting point is 00:22:37 and then try to bang on one project at a time and figure out whether it's broken in some enterprising ways. You've probably heard of Google Project Zero. I don't think that they have really looked at Git in detail yet. But a couple of years ago, Microsoft security researchers looked into it, and as it happened, I was on vacation. And boy, was it stressful coming back.
Starting point is 00:23:01 That was not fun. It was good in the end to have it all fixed, but we had nine CVEs. Nine. That's horrible. And I'm incredibly thankful to the researchers, who are not
Starting point is 00:23:18 really a community, but they basically know each other. They are also bug bounty hunters. With GitHub's bug bounty program, for example, there are quite a few people who report bugs there, get their bug bounty, and make a good, decent living.
Starting point is 00:23:34 But you have to have the talent to figure out those bugs. I wouldn't be able to. I can fix them, but I can't figure them out. Yeah, I still remember, as kids, we tried to name directories with words like con, and Windows would just not let you do that. It seems like those bugs are related to all of these backwards compatibility things. Yes, or aux.c, that's the famous one,
Starting point is 00:23:58 because in the Linux kernel there are two files named aux.c, and you can't clone a Linux kernel repository on Windows because of that. Maybe you can share one thing with us. I was reading through recent commits on Git for Windows, and I noticed that you said there might be some users of a certain command, and we don't know for sure. How do you find out that there are people using a deprecated command? How do you deprecate something? How do you decide there aren't enough users using it?
Starting point is 00:24:27 At what point can you basically start phasing it out? When you don't have any telemetry from users, how do you make decisions that reduce tech debt or reduce baggage from the past? Right. Without telemetry, it's extremely hard. In another project I maintained, at some stage we tried to have opt-in telemetry. And oh my, the backlash was so much. Nobody was willing to give me
Starting point is 00:24:57 anything, even when I said, let's only do the telemetry that proves to funding agencies that my software should be funded. They were happy to use it, but not to do anything to help me make money doing it. So telemetry is really a tricky, tricky thing. Most users just go from their own egotistical point of view and don't really look at the benefits, especially when you try to make it opt-in as opposed to opt-out. Opt-out, I can understand. But we do have telemetry,
Starting point is 00:25:32 just not from all users. In my team, we support the Windows developers to a certain extent. They basically started with the VFS for Git fork, and they do a lot of things themselves, but the Git version itself we still have to maintain. And for those users, we do have telemetry,
Starting point is 00:25:54 because they're in-house users. There is no problem with GDPR, because this is very confined. Basically, what they do is intellectual property of the company anyway. So there we can actually look at how things are used. But we use that data, that evidence, mostly to figure out where to focus our efforts, not what to deprecate. So deprecation is a really interesting thing.
Starting point is 00:26:28 In the Git project, I think... basically, this is not the Git for Windows maintainer's decision, because I inherit the deprecations from the Git maintainer. And if I want to deprecate something, I go through Git. I don't want to go alone. I want to go
Starting point is 00:26:43 far. So I do it together. For some things, it's really unclear, so we don't deprecate them. That's very simple. But for other things, it's clear that we have to deprecate them at some stage. My favorite example there is the --preserve-merges option of git rebase. That was my fault; I'm definitely the guilty party here. I introduced it, and I thought it was a cool design, and it worked, but the limits of the design became very, very obvious very quickly,
Starting point is 00:27:19 in that you can't even reorder commits if you use that option. So I tried to implement it in a better way, design it in a better way, in a backwards-incompatible way. So it's a new option now, called --rebase-merges, because everything about it is new. And it's also much faster, because it is not implemented as a shell script; it's implemented in pure C.
Starting point is 00:27:48 So --preserve-merges now needs to go, right? I tried to deprecate it, I think, a year and a half ago, and there was an immediate outcry: no, you can't do that. --rebase-merges has only been stable for two major versions. And in git svn, we have this other option, and you can't... Now I'm trying it again.
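As a rough illustration of the warning phase that precedes removing an option, a parser can keep accepting the old flag, keep its old behavior, and point users to the replacement. This is a hypothetical sketch, not Git's option-parsing code: the enum, the function name, and the warning text are all illustrative. It deliberately does not map `--preserve-merges` onto `--rebase-merges`, since the new option is backwards incompatible.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Illustrative modes; the deprecated one keeps its old behavior for now. */
enum merge_mode { MERGES_NONE, MERGES_PRESERVE_DEPRECATED, MERGES_REBASE };

/*
 * Hypothetical sketch: parse one flag into *mode. Returns 0 on success,
 * -1 for an unrecognized flag. The deprecated flag still works but warns.
 */
int parse_merge_flag(const char *arg, enum merge_mode *mode)
{
    assert(arg != NULL && mode != NULL);

    if (strcmp(arg, "--rebase-merges") == 0) {
        *mode = MERGES_REBASE;
        return 0;
    }
    if (strcmp(arg, "--preserve-merges") == 0) {
        /* Warning phase: old behavior is kept, users are told what to use. */
        fprintf(stderr, "warning: --preserve-merges is deprecated; "
                        "use --rebase-merges instead\n");
        *mode = MERGES_PRESERVE_DEPRECATED;
        return 0;
    }
    return -1;
}
```

Once the warning has been in place long enough for infrequent updaters to see it, the second branch would change from a warning to an error, and eventually the old mode would disappear.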
Starting point is 00:28:15 Maybe... I don't think it will be deprecated in 2.34, but maybe in 2.35. So it's an incredibly long process, and it has to be announced. There have to be messages when you use that option, so that you get a message like, this is now deprecated, please don't use this anymore, please use this instead. And then, I think the feeling is more like a year that you have to have that in there, because there are people who don't update all that often, right? So a year is nothing for them, whereas for Git it's something like six major versions. Yeah, so that actually kind of answers the question, in the sense that you
Starting point is 00:28:58 first need to make the command print out all of these warnings, then wait a really long time to see who's complaining, see if there's a way to override that and give them an alternative, and finally at some point you may be able to bite the bullet and say, okay, bye-bye, this command is going away in this future version, right? Absent telemetry, you basically have complaint-driven development. Right, that's a good way to put it. So what does your life look like, or what does your day-to-day job look like now? You're hired by Microsoft, you're working on Git. Do you kind of split your time half and half between
Starting point is 00:29:37 being a maintainer and building new features? Or is your time fully focused on just maintenance of Git? And what do you do as the BDFL of Git for Windows? What does that mean? Yeah, I don't really like the term BDFL. I would like to think that I'm neither benevolent nor a dictator. Basically, I want to work together with people, because for me it's all about communication, and that's why I like it so much: it's about good communication anyway. But back to your question, what does my day-to-day life look like? Most of the time, I spend reading emails and drinking coffee, or black tea. Basically, I have to keep track of so many things.
Starting point is 00:30:29 You also asked me earlier how I become aware of bug reports. I did monitor Twitter for a while, because for a while people just crammed their bug reports into 140 characters. That's, of course, not really enough to include enough information, but sometimes it was actually critical enough that I tried to contact the person and get more information out of them. Stack Overflow also used to be a big source of bug reports, but these days I think most people have actually got the memo
Starting point is 00:30:59 that the quickest way to get help in the Git for Windows project, especially since 2015, when I was hired to maintain the software, is to go to the Git for Windows bug tracker on GitHub and report the bug. There's a bug reporting template that hopefully helps you fill out all the relevant information, and if not, then there will be a back-and-forth. In the meantime, it's not only me; there are also a few other volunteers who tend to the bug tracker. So it's becoming less of a hiccup-y procedure and more of a smooth ride: I have a problem here, I get help, done, excellent.
Starting point is 00:31:38 So this is what I spend a large part of my time on. I do develop some new features from time to time. Right now I'm super excited about upstreaming Scalar. You recently talked to Stolee, who basically started that project, and I had this clever idea: how hard can it be to port this to C and contribute it to the Git project? And now it's already the fifth iteration of the patch series, and I hope that this will get into Git. From there, I will probably spend some real quality time taking those features and figuring out how core Git can actually benefit from them,
Starting point is 00:32:26 so that you don't have to ask scalar clone for all this opinionated setup of the huge repository that you want to clone, but can instead ask git clone with a special option to do exactly the same. That's the dream of the team: that we basically get rid of maintaining Scalar, and at the same time,
Starting point is 00:32:49 all Git users benefit from our hard work. That is the ultimate goal there. So I can't really say it's a 50-50 split, because there are times when I spend basically all my time re-implementing something in C, and there are other times when I just scour the Git mailing list,
Starting point is 00:33:08 the Git for Windows bug tracker, or sometimes Stack Overflow to see whether there is anything relevant that I should take care of as the Git for Windows maintainer. Okay. And can you tell us a little bit about GitGitGadget?
Starting point is 00:33:22 Yeah, GitGitGadget. It's probably not a big secret that I find a mailing list really challenging to work with. Because, yeah, it is a free medium, but if I'm contributing something, I have to work so hard to know the current state of my contribution.
Starting point is 00:33:46 So where did it make it in the upstream code review? Where did it make it in the integration branches? Is anything left for me to do? Or is there just a tangent that people talked about and that I basically can't ignore because it is not relevant for my contribution so what I want what I wanted to do with GitGitGadget was basically to have a way anyway to deal with the Git project that relieves me of so much mental yeah mental work extra work like walking walking some somewhere when I could take the bike. So my approach there originally was a shell script called MailPatchSeries. It's still available on GitHub in my personal org. But eventually I figured out that pull requests are, despite the many flaws that you see in
Starting point is 00:34:48 them, they are basically the easiest way to contribute code. Could we improve them? Yes. No question about that. Is it better to contribute via mailing list? Hands down, no. It's just horrible to have everything mixed with everybody talking about everything. It's like if you're at a bazaar and at some stage your ears are ringing and you just want
Starting point is 00:35:15 to go on vacation far away where there's nobody who talks to you, right? So I tried to have that with GitGitGadget. I really had the idea for more than a year and tried to interest somebody else to do it. And then eventually posed myself that famous question, how hard could it be? Right. And it took half a year to get into really decent state. But at the same time, it was also a perfect excuse to learn TypeScript. So GitGitGadget is a system that connects GitHub pull requests with contributions on a Git mailing list. The idea is that you open your pull requests on the git repository in GitHub and then you
Starting point is 00:35:59 are talked through the next steps that you need to do in order to have this land on the Git mailing list. The reviews on the Git mailing list are even copied back into the PR as comments. So you get notified by your GitHub notifications that there's something that you probably want to take care of, or at least want to know about. These days, it's not only review comments, it's also when it has been integrated into a branch, and also when there has been a status update.
Starting point is 00:36:32 Because the status updates happen centrally for all the topics, not in your mail thread, you get to know: hey, this is headed for that branch. Or: no, there has been a discussion, and a so-called re-roll, which is the next iteration, is expected, so the ball is in your court, so to say. Unfortunately, the system is a bit complex, because I need an Azure Pipeline. I need an Azure Function, which is basically a really small serverless
Starting point is 00:37:12 piece of code that can be accessed via a URL. And of course, there is the GitGitGadget mail address: I registered a Google Mail address so that the mails can be sent. And then there is maintaining it; of course, it's a TypeScript project. So basically, I get updates. I get pull requests.
Starting point is 00:37:39 Well, I shouldn't say I. We, because another contributor now thankfully takes care of that, to update all the component versions. Like if the TypeScript definitions for Node are updated, or if there's a new Jest version, which is what we use for automated tests. So it's a little bit of a project on its own, and it's so complex that recently, when somebody asked me whether we could take GitGitGadget and adapt it to this subsystem
Starting point is 00:38:09 of the Linux kernel so that they can do the same, I said, oh gosh, okay, okay. It actually took me an hour just to write up what it would take to adapt this. And unfortunately, it's really hard. It's very, very focused on Git
Starting point is 00:38:26 and the processes on the Git mailing list. Unfortunately, I don't see much of a chance without spending two months to really revamp the code base and modularize it even more to allow for that. And then you still need to set up a Google Mail address, an Azure Pipeline, and an Azure Function, and maintain all of those so that they work together. A little bit of a challenge. But I think this speaks to the overall theme, right?
Starting point is 00:38:59 It's like you want to make open source contributions to Git easier, and that's kind of what GitGitGadget is. And even though it's work to build, it's immensely useful. I've used it and I think it's one of those things that makes your life easier. It runs tests for you. All of that stuff that you expect
Starting point is 00:39:18 kind of in an open source repository today, it kind of gives you that. And it's not wrong to expect that Git generally works through a mailing list, because there's just this weird circular dependency, right? Like, depending on GitHub to make sure you have new versions of Git is just strange, ultimately. Um, so actually, I don't really think so. If you run Git in place, which I did for quite a while, then it's very fragile. All of a sudden you can't run Git anymore and you can't go back to the previous version.
Starting point is 00:39:57 But if you build Git and then install it, quote-unquote, into your home directory, and run it from there, then that Git works, and you know it works, and you can develop Git. And if you break something, you still have that working Git that you built from a previous revision. And in that respect, I would say it's pretty much the same.
Starting point is 00:40:22 And also, because Git is distributed, should GitHub be down at some stage, you can always go to other options, right? So I don't really see the problem here. It's quite beautiful, the system that we've set up. The only thing that you cannot really rely on is the Git mailing list. As a colleague of mine recently realized,
Starting point is 00:40:44 hey, I sent a mail to the Git mailing list and it didn't arrive. Why? It bounced. It was an outlook.com mail address. So it's open to everybody except for you outlook.com users, right? And, oh by the way, if you send it via a mail client that automatically adds an HTML part, it will also not come through. So this pretense that this is an open system, when it just isn't, is not a good thing. Yeah, the whole requirement for having plain text emails
Starting point is 00:41:23 is very, very challenging. It's so easy to copy-paste something and not realize, oh, it's no longer plain text, because in most email clients you don't even see that anymore. I didn't even know how to send a plain text email until I had to send something to Git and got a bounced email. Yeah, that's a good point. And just in general, how do you foster an inclusive and productive open source community, right? There are clearly a lot of people who want to contribute to Git, and now it's your job to make sure that, you know, people are nice to each other, in a sense,
Starting point is 00:41:59 and everyone can contribute easily and well, and GitGitGadget is part of that. What have you learned over the years? What have I learned? Yeah, it's a challenging subject, because nobody wants to change. Really, nobody wants to change, especially those who benefit. Those with privilege are the last to acknowledge that they actually benefit from privilege.
Starting point is 00:42:27 They say, no, no, it's meritocracy, right? And I was actually surprised when I looked up the word and where it came from. Meritocracy was really introduced as a racist tool to keep non-white males out of a certain circle. So it was specifically designed that way, and these days people still use it as if it were anything positive. If you want to contribute to Git, you need a huge amount of privilege. You need free time to contribute to an open source project. I would say that 80% of the population of this
Starting point is 00:43:05 lovely planet do not have that time. They don't have that luxury. So the rest have a privilege. You do need a computer. Granted, more and more people have access to a computer, but you also need some sort of training in programming. Who has that? I had the privilege that my brother taught me how to program when I was nine. And that's just an incredible stroke of luck. So that's privilege. And I
Starting point is 00:43:35 learned that most people who enjoy the same privilege as I do are fully unaware of how much privilege they have, enjoy, and take for granted. One time I saw how much even I take this for granted was when I worked with a Google Summer of Code student, and they went silent for a couple of days.
Starting point is 00:44:00 And I thought, that's not right, because we wanted to work on a certain thing and chat back and forth. Eventually I heard back, and there had been a power outage for three days. They didn't have power. I take for granted that where I live, power outages are basically a microsecond thing. I don't even notice when the power is out. But in other countries, it's normal. My student, now friend, told me, yeah, that happens; there is a season for it because of weather conditions. And if you have that, all of a sudden it's not so easy to contribute. And I did not even think about the challenges there. I took it for granted that this person has a computer, has internet, and has to come online at that point because I want to talk to them, right? So it was quite hard to acknowledge that I myself still have tons to learn. And I would like to believe that I'm more aware of this than most people on the Git mailing list.
Starting point is 00:45:14 So this is something. Another thing that I realized over the years is how much diversity brings. There are obvious examples from my old job in science, where I did image processing. To this day, the diagnosis of liver cancer in non-white people, or even non-male people, is flakier than in white male people. It's so biased, right? So these are the obvious cases where you would benefit from a more diverse developer base, where there are developers and contributors who say, you know what, I'm doing more of this.
Starting point is 00:45:58 How can we do it? How can we change the software so that it supports that better? And at some stage, you also realize that when you work, for example, with people who have only one hand, and you optimize the usage so that it works when typing with one hand, all of a sudden you are in the same situation yourself: either one of your arms is broken because you went skiing thanks to your privilege, or, even more likely, you have offspring and you have to nurture them, right? So one arm is occupied.
Starting point is 00:46:35 There's a sleeping baby there. And so you only have one hand. And then all of a sudden, you're really, really happy about that super annoying sticky keys feature, where you can press Shift for a long time and then still do what you want to do with one hand. So diversity is something that's really useful, and it comes back to you as a benefit. And that's why I'm really trying my best to foster diversity and inclusion where I can. Unfortunately, there are people who take this as an attack on themselves. There was an episode last year where I really had to go offline for three days, essentially, just to recover from a super
Starting point is 00:47:20 toxic, nasty Git for Windows ticket where people just wanted to harm other people by writing the nastiest things. And I don't know why they waste their time inflicting harm. Why would you do that? I mean, what's the use for you? You don't get anything out of it. You could instead spend that time doing something collaborative,
Starting point is 00:47:45 that benefits people, including yourself, right? So that's where I come from with diversity and inclusion. Thank you for sharing that. A simple example of diversity that keeps coming to my mind is, you know, when YouTube was not designed for left-handed people: their videos would always upload upside down because the designers didn't think people
Starting point is 00:48:09 could be holding phones the other way. And clearly, when you have people who are slightly different from you, that's when you can design better software, because a lot of different kinds of people are going to be using the software that you are writing, right? Yeah. But it's also
Starting point is 00:48:26 unfortunate that there's a lot of toxicity in open source, because pretty much anybody can and will say anything. So yeah, thank you for all of your work on clearly really important projects like Git, which is pretty much, I think, the most commonly used tool. It is, yeah, it's pretty rewarding, I have to say, because as a maintainer, I do see what's coming out of it and how many people use it. So last week, when I was at a scientific conference, I asked the audience, how many of you have used Git? And I think there were
Starting point is 00:49:06 five whose hands didn't go up. And of course, those five were professors who don't have the time anymore to play with computers. They have to secure funding. And so it was really gratifying to see. And then I also asked who had used rebase, and I had to immediately apologize for the interactive rebase, because that's my fault. But it's really nice to see that what you do has an impact, and a positive one.
Starting point is 00:49:39 Yeah. And maybe just as a wrap-up question, what's next for Git? You're excited about Scalar, as you said, and adding that to core Git. In a sense, are there any general themes
Starting point is 00:49:50 or specific projects that you're excited about? So right now, I'm really excited about that scaling idea, where you can basically maintain bigger and bigger repositories or access them in more hybrid ways. We are moving away a little bit from the distributedness of Git, because with partial clone you definitely need access to a server
Starting point is 00:50:16 that has all the objects. And if it goes away, basically you have a problem. But I really think that having the ability to work in a monorepo makes a lot of sense for companies, from my perspective, because then you have your code in one place and you can take one bit, move it to another part, and still use it from both sides. This is something I find exciting. What I do not quite see right now, but what I would be really excited about, is more research, more effort, and more love
Starting point is 00:50:58 toward the user interface. Because let's face it, everybody who starts to use Git struggles. And it's not the fault of the people who struggle. It's the fault of the software, right? It's just like the famous story of the first Xerox machines, which had tons of options, right? And the thing that made them so much more usable, eminently more usable, was the green button that makes the one copy you most frequently need. So I would like to figure out
Starting point is 00:51:33 how to attract people who are into user interface design, and not so much the graphical kind, but more the processes. How can I take processes that people have to deal with all day long, find the common threads, and distill them into a user interface on the command line
Starting point is 00:51:58 that makes sense? Because, yeah, we recently had this attempt with git checkout, which does two things that are completely separate from each other. One is switching branches, and the other one is taking revisions of certain files from a different branch. Right. Both were git checkout. Now somebody implemented git restore, which gets files from somewhere else, and git switch, which switches branches.
Starting point is 00:52:25 But I'm still not happy with the user interface. The names, at least, are now intuitive, but the command line options still aren't: I mistype the options to git switch so many times because they don't make sense. So if somebody with a usability background came in and said, look, we need to do this, I would have a couple of ideas for how we could implement it in a backwards-compatible way via config options,
Starting point is 00:52:54 where we just leave things alone for scripts, and for interactive users we change certain things so that things flow. The workflow should shape how the user interface looks, not the other way around. Yeah. I think git switch is a perfect example: to create a new branch, git switch uses -c, but creating a new branch with git checkout uses -b. And that always trips me up. Yeah.
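To make the checkout split and the -b versus -c mismatch concrete, here is a minimal Python sketch. It is not part of Git, Git for Windows, or any tool mentioned in the episode; the helper name `modernize_checkout` is hypothetical, and it only covers a few common `git checkout` forms. It relies on the mapping that came with `git switch` and `git restore` (introduced in Git 2.23): `git checkout -b` becomes `git switch -c`, and `git checkout <rev> -- <paths>` becomes `git restore --source=<rev> --staged --worktree <paths>`, since checking out paths from a revision updates both the index and the working tree.

```python
def modernize_checkout(args: list[str]) -> list[str]:
    """Hypothetical helper: map a few `git checkout` forms to switch/restore.

    Handles only: checkout <branch>, checkout -b <new> [<start>],
    checkout -- <paths>, checkout <rev> -- <paths>. Anything else raises.
    """
    if not args or args[0] != "checkout":
        raise ValueError("expected arguments starting with 'checkout'")
    rest = args[1:]

    if "--" in rest:
        sep = rest.index("--")
        before, paths = rest[:sep], rest[sep + 1:]
        if not paths or len(before) > 1:
            raise ValueError("unsupported path form")
        if not before:
            # `checkout -- <paths>`: working tree restored from the index.
            return ["restore", *paths]
        # `checkout <rev> -- <paths>` updates both index and working tree.
        return ["restore", f"--source={before[0]}", "--staged", "--worktree", *paths]

    if rest[:1] == ["-b"] and len(rest) in (2, 3):
        # The "create a new branch" flag is -b for checkout but -c for switch.
        return ["switch", "-c", *rest[1:]]

    if len(rest) == 1 and not rest[0].startswith("-"):
        # Assumption: the single argument names a branch (not a commit or path).
        return ["switch", rest[0]]

    raise ValueError(f"unsupported form: {' '.join(args)}")
```

For example, `" ".join(["git", *modernize_checkout(["checkout", "-b", "topic"])])` evaluates to `"git switch -c topic"`. Detached-HEAD checkouts (`git switch --detach`) are deliberately left out, because telling a branch name from a commit would require inspecting a real repository.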
Starting point is 00:53:24 But I think it's also good that there is some progress. I was so happy to see that Git has added two new commands specifically to improve the user interface, right? So just the fact that that got done was pretty exciting to me. Yeah, it's a beginning, but now we need experts, not software developers. Yeah, beginners. I think beginners are what we need. People running into Git for the first time who say, wait, this is very confusing. This should be something else.
Starting point is 00:53:52 Yeah, I thought so at some stage, but see, I have some musical training, and there are people who play wonderful music and are the worst teachers, because they don't know how they do it. The same thing can be said about beginners. Beginners have no idea what the user interface should look like to make it easy for them. You need
Starting point is 00:54:13 an expert who assists beginners or looks over their shoulders. What do they do? How do they do it? What would be the minimal finger movement for the beginner to actually get what they need right now with minimal effort? And then I think it takes real experts in usability who know just how to design things to make them flow more smoothly. I'm really excited about this idea, though. If there are any listeners who are interested in contributing to Git, I will leave Johannes' email in the show notes. And I'm sure you can always find him on the github.com/git-for-windows/git repository as well. And thank you, Johannes, for being a guest. I think this was great. Thank you so much for having me.