The Pragmatic Engineer - How Linux is built with Greg Kroah-Hartman

Starting point is 00:00:00 There's a nine week release. Every nine weeks, there's a new release going out, right? So, Linus does a release, just point in time. And then the merge window is considered open. And then for two weeks, all the maintainers send Linus, all the stuff they've had pending from the last release. We have two weeks to add all new features. And then he does release candidate one.

Starting point is 00:00:18 From there on, it's bug fixes only for the next seven weeks. So it's bug fixes only, bug fixes only, it's regression fixes, we'll revert things. No new features. Do I understand correctly that in the case of Linux, this is a thing where every nine weeks there will be a release? It's time-based. So we have that two-week window of merging all the new features to Linus that have been in our tree and accepted already and proven to work.

Starting point is 00:00:39 And the window is short, nine weeks. We used to have three-year-long development cycles. And the problem there is, even if you have six-month development cycles, there's that fear of you have a feature. I want to take your feature, but it's not quite ready. Do I want to wait and things like that? But if you know that you can get your feature in in nine weeks from now, and it's just not ready, it's not ready.

Starting point is 00:00:55 The pressure is off me as a maintainer to take your new feature until it's ready. Linux is the world's most widely used operating system thanks to powering most Android devices, servers, smart TVs and embedded systems. But how is it actually built? Today we sat down with Greg Kroah-Hartman, a Linux kernel maintainer for 13 years, who is one of the three Linux Foundation Fellows. In today's conversation we cover details on how widespread Linux is and why mobile versions of Linux have three times the lines of code as server versions. What exactly it takes to get a change accepted into the Linux kernel and merged by Linus Torvalds himself. How Linux manages to have 4,000 contributors per year yet have no product managers or project

Starting point is 00:01:36 managers. And many more details. If you're a software engineer, you will use Linux directly or indirectly, and this episode will help you understand why it's so widespread and how it's a lot easier to contribute than most people would assume. If you enjoy this show, please subscribe to the podcast on any podcast platform and on YouTube. Thank you. So, Greg, it's really just nice to have you here because you're one of the most well-known Linux contributors, one of the longest standing ones as well.

Starting point is 00:02:05 So just welcome to the podcast. Thanks for having me. I think as software engineers, we know Linux is important in the sense of it's running on most web servers that we use and run. It's a desktop OS that some people use. And it's, of course, you know, powering, a fork of it is powering Android. But what is there to know about Linux? How big is this thing? How complex is this thing?

Starting point is 00:02:28 Well, it's, yeah, it's an operating system. So it's a kernel. We took over the world without anybody noticing. I joke, Android devices are 4 billion Linux users out there and they don't realize it. Everything else is a rounding error, which doesn't make the server people happy with me. But it's true. It's in everything. It's in all the embedded devices.

Starting point is 00:02:48 It's in the air conditioning units, car electric charging ports, satellites. It runs the International Space Station. Really? Yeah. Air traffic control for Europe and probably the U.S. All the financial markets. I don't think. It's in the cameras that we're using.

Starting point is 00:03:06 So it's, yeah, I don't know of any place it hasn't taken over. The number of the top five selling laptops for the past 15, 10, 15 years, Chromebooks. Those are all Linux-based. Not Apple, but Chromebooks are. Oh, iPhones. So every 5G modem out there is running a copy of Linux. Really? Yeah.

Starting point is 00:03:26 Wow. So now with Apple doing their new chip, I don't know if it's the new one, but Qualcomm, all the 5G modems, probably the 4G, I'm not sure, but I know all the 5G modems have Linux inside it. This episode was brought to you by WorkOS. If you're building a SaaS app, at some point your customers will start asking for enterprise features like SAML authentication, SCIM provisioning, and fine-grained authorization. That's where WorkOS comes in, making it fast and painless to add enterprise features to your app. Their APIs are easy to understand, and you can ship quickly and get back to building other features. WorkOS also provides a free user management solution called AuthKit for up to 1 million monthly active users. It's a drop-in replacement for Auth0 and comes standard with useful features like domain verification, role-based access control, bot protection, and MFA.

Starting point is 00:04:11 It's powered by Radix components, which means zero compromises in design. You get limitless customizations as well as modular templates designed for quick integrations. Today, hundreds of fast-growing startups are powered by WorkOS, including ones you probably know, like Cursor, Vercel, and Perplexity. Check it out at WorkOS.com to learn more. That is WorkOS.com. First of all, I'm just reflecting on why I never kind of, you know, like thought about it like this. Because in my mind, it was always like, you know, Debian, Red Hat, it's on the server side.

Starting point is 00:04:45 And maybe that's because that's where I actually see it. Of course, you know, there's the, I'm a, right now, I'm a Mac user and there's the Unix influence, which is an influence, you know, it gets pretty close. I think it's a good time to reflect on how many things it actually runs. In terms of the kernel itself, like how large is it? I know, you know, for different devices, it'll be spread differently for server-side Linux, for an Android device, it'll use different parts of the kernel. How big is this in terms of contributors, lines of code. I know lines of code is not a great measure,

Starting point is 00:05:21 but it is a measure. So we have just under 40 million lines of code right now. That's a lot. That's all the kernel. That's the kernel. The core part is like 5% of that that everybody runs. And then everybody, the rest of it is hardware support. Different drivers, different devices,

Starting point is 00:05:36 different architectures, different chips. So your laptop runs about two, two and a half million lines of code. Your server runs about one and a half million. Servers are really easy. Those are very simple. Wow. Your phone runs about 4 million.

Starting point is 00:05:50 Your phone SoCs are these complex pieces of CPU and interaction out there. They're just crazy complex chips. Why is it? Can we just pause for a second? So, again, lines of code we know is not a perfect measure of complexity, but in this sense, comparing it between the two of them with the same code base is somewhat. So you said roughly, give or take, a server is one and a half million. A phone is four million.

Starting point is 00:06:13 Like three times the lines of code for a phone. Why the difference? Even though I would think that the server, you know, does all this mission critical stuff. A server is really simple. CPU and a network card and storage. And storage. That's it. So, an SoC on a phone has, you have power control, you have clocks, you have five different buses on there talking to different types of devices.

Starting point is 00:06:35 You have battery control. You have talking to your modem. You have another version of Linux in the modem. You got USB out the back. You got USB bypass to talk to the audio side. You have audio drivers. You have a zillion different clocks and PHYs and all sorts of stuff in there. And it's an eight core machine.

Starting point is 00:06:53 There's eight processors in that thing. Those are not trivial things. And sometimes those processors are different sizes. So you have big and little sizes, which adds complexity just for some control, for some power management. But they all run the same core of Linux, but it's the drivers and the devices and things like that. So your Pixel phone. I look at the Pixel phone. Google ships a core kernel that all Android

Starting point is 00:07:15 devices pick, not hardware specific, just says ARM64. Pixel has 300 other drivers they add to get the Pixel phone working. I mean, some of these are tiny. This is for this tiny chip, this is this tiny. But your phone is really one of the most complex beasts out there for software. Is it safe to say that, you know, the complexity and, you know, the lines of code will to some extent scale with what it has to do with the hardware, the capabilities, and, you know, not about, you know, like how mission critical? Because, of course, the phone needs to be stable. The server needs to be stable. My TV needs to be stable. So, you know, that's just kind of a given, right? Yeah. Oh, and all TVs for the past 15 years are all running Linux.

Starting point is 00:07:53 Oh, so my Samsung TV is running Linux. Oh, yeah. Same, my Samsung washer and dryer are running Linux. So your Samsung watch is running Linux. So Samsung has their own, yeah, distro. All works really nice. It all comes down to the complexity of the hardware. So the kernel controls the hardware. The job of Linux is to make all the hardware look agnostic to programs. So you can write the same user space program and run it on different hardware, and it just works. A kernel's job is to manage memory and devices in a common way and provide that to

Starting point is 00:08:29 user space. We joke in the kernel that user space is just a test load. But, I mean, it's a tool there for you to actually solve your problems. So when you're running servers, you're wanting to push messages through the network and storage and stuff. That's your load. And that's what they're there for. For a phone, you want to control a display, you want to talk out the modem, you want to talk on the thing, you want to listen to audio.

Starting point is 00:08:50 Yeah. Lots of different things there. And I just want to touch back on the kernel because, like, I'm not a Linux developer. Like, I know, you know, I've heard of the kernel. In my assumption, it is the critical part, as you said, the, you know, the thing that runs immediately and then, you know, the user space will run on top of it. But what is the difference, or what makes a kernel? And you said it's about 5% of all of these things.

Starting point is 00:09:17 How do you split this? Or is there a definition of, again, you're a kernel developer. So I'm trying to get a sense. Let's say I'd like to contribute to Linux and understand how it is. Eventually I'm going to figure out what this kernel is. But what is it? What makes kernel and non-kernel? So kernel versus user space.

Starting point is 00:09:38 So there's an idea. Chips have a protected mode and a not-protected mode. In a very simplified way. There's different levels of protection. So the protected mode is where the operating system runs, the kernel. And that is where we share all the resources. It's one flat address space.

Starting point is 00:09:53 Got it. And we are not isolating processes. So a user space process then runs on top of that and we isolate them. And they all individually think they have the whole machine, but they don't. So it's multitasking. You can run multiple programs at the same time. And the kernel is there to give you memory, to give access to storage, to give access in a common way, to give access to the network in a common way, to give, or provide the pipes to go around the network stack in the user space.

Starting point is 00:10:18 Some people don't like using Linux's network stack. They have their own. To provide a way for all your different mice to show up to user space in a common way. We know all the different mice. USB storage devices, your graphics controller. We provide a way to make it so that user space can talk to the kernel in an agnostic way. And their stuff will just work because all the graphics work through the same interface. We talk to keyboards all the same way, things like that.

Starting point is 00:10:43 So it's a commonality of providing a shim layer above the hardware. And then, for example, drivers, do they always live in the user space? No, all the drivers live in the kernel. So the kernel and drivers are all. Linux is not a microkernel architecture. It's monolithic. So the code is all in the same address space. So a bug in any one of them has a chance to take any part of the kernel down.

Starting point is 00:11:04 So Linux ships all the drivers for all the architectures in one big tarball. That's 40 million lines of code. Other operating systems try to go out there and split things off. So the core of Windows is their kernel, and then you can put additional drivers on top. We tie everything together in one big giant blob. Theirs is still monolithic. Any driver in theirs can crash the kernel within reason. In that way, we can refactor the way the interfaces between drivers and the kernel are.

Starting point is 00:11:35 Linux drivers are on average one-third smaller than other operating system drivers. because we can see the commonalities. If you send three different drivers for three kind of same hardware, well, let's combine them all, make it smaller, and refactor things and make it easier. And, oh, let's change this API. And this has to do with the open source approach, right? That you see it like that.

Starting point is 00:11:54 So we see all this common code, and we can refactor it, and we can make it better and cleaner. And we're not tied to any fixed interface. Our fixed interface is between the user space and the kernel. We will not break that. That's our guarantee. We've guaranteed it for a long, long time. And so we always want you to be able to upgrade your kernel and not feel worried that your old programs are going to crash.

Starting point is 00:12:14 So you should always be able to upgrade. That's our guarantee to you. If it does break, then it's our fault. We'll progress. There are some exceptions. There's some gray areas. There's some really low-level parts between the user space and kernel that we kind of work around. And we argue about these all the time.

Starting point is 00:12:28 But we never try and break user space on purpose. A lot of times we do accidentally, and we'll fix it up. That's our only real rule of kernel development. Don't break user space on purpose. And so when we're talking about the roughly 1.5 million lines for a server, we're talking about the kernel. Yes, kernel plus drivers. Kernel plus drivers, because it is part of the kernel. And then you have this 40 million line tarball, and then every platform will kind of take their parts of it.

Starting point is 00:12:54 They'll take what is relevant for their use case capabilities, drivers, you know, other parts. And then this is why, I guess, you know, Raspberry Pi, you're going to say it's going to run. It's running on Linux, right? Of course. Oh, yeah, yeah, Raspberry Pi, yeah. Those things are everywhere. That's what's in all the electric charging stations. Those are Raspberry Pis.

Starting point is 00:13:12 Really? Yeah, where you plug your car into? Yeah, it's Raspberry Pi. Yeah, those are all Raspberry Pis. Wow. Because it's a really cheap industrial thing. Lots of signage now. Those are all running Raspberry Pis.

Starting point is 00:13:22 I guess it was safe to assume they're not running Windows to be fair. No, yeah. So the Dutch signage for the trains, those are all running Linux. Sometimes you'll see a crashed Linux machine up there. Can we look at a specific example of how development actually flows through with a specific patch? Before I show a specific example, so we had 4,000 developers last year. So they make a change.

Starting point is 00:13:43 So those 4,000 developers will send an email to a maintainer, and a maintainer maintains a subset of the kernel. Every part of the kernel is owned by somebody. And then you are one of these maintainers as well. Yeah, I'll maintain some drivers and things like that, but then those maintainers send things off up the tree to a subsystem maintainer. So like USB serial, then we'll get sent up to USB, and then USB will go to Linus. So it's kind of a pyramid scheme that way.

Starting point is 00:14:07 We have that. So we have like 800 maintainers, and we have the middle section. We maybe have about 200 different trees there. And then in our testing environment, all those trees are tested every day. They're all merged together and things that happen, whatnot. So we have this kind of hierarchy of developers and maintainers that way. And part of the hierarchy is the human aspect. So if I take code from you as a maintainer, I'm now responsible for it because my name's on it.

Starting point is 00:14:35 So if it's a simple one-off or it's a simple driver that nobody cares about, except you, great. I know you're the only one that's going to be affected by it. It's fine. But if it's the core part of the kernel and I take changes from you, now I'm responsible for it if you disappear. So I have to trust that either you're going to be here or that I understand it good enough that I can maintain it. So part of the Linux development model is trust, and it's trust in human interaction. Like I will take stuff from people, whatever they send me, because I trust, not that they got it

Starting point is 00:15:05 right, but they'll be there to fix it when they get it wrong, because we all get it wrong. And that's the part. So that's the trust model we have. And we've been burned in the past by some major features where it landed in the networking core subsystem a long time ago. And then once they landed and were merged and taken, the email address behind it disappeared. And the network developers took six months to unwind the mess. So it's hard to change the core part of the kernel for a good reason, because it affects everybody. And also a good reason in that we want to make sure that you are going to be there to fix it if it breaks. Yeah.

Starting point is 00:15:38 But for drivers and things like that, we'll take anything. Drive-by. It's really simple. Yeah. It's very simple thing that way. But that's the hierarchy. So a change flows up the tree that way. Yeah.

Starting point is 00:15:48 So I can show that. All right. So what are we going to see? So here is a change. So this was written by somebody named Chester. He made a change to the USB serial driver. It's an option. The chip is called option.

Starting point is 00:16:05 These are USB-to-serial devices. They're in modems, they're in lots of different things. There's a ton of different ones. And there's no standard for these types of devices. So you have to add a custom device ID for every single one that you want to use. It's just the way they work. So here's the patch. And here's the description of it.

Starting point is 00:16:23 This is just an email. The description here is the subject line. Yeah, USB serial option, adding whatever. Adding that device. And then here's, so the hardest part about, the hardest part about writing a kernel change is the description of what's going on. Really?

Starting point is 00:16:38 Yeah, I mean, the code is easy. It's the description explaining it is hard. You don't explain usually what it's doing. You have to explain why you're doing it. For something as simple as this, it's like, it's really easy. So this person says this driver is part of a cat six modem. The product ID is shared across multiple modems. It gives them a little dump of what it looks like in the device.

Starting point is 00:16:58 And then there's some more information. There's a Signed-off-by line. And Signed-off-by is what we created a long time ago that shows that I have the proper authorship of this and ownership and I give it to this project under the license by which the project is run. So it's saying I'm licensing this thing under the GPL. And then way down below is probably the one line patch. This is all just. So this is all the description.
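The Signed-off-by line described here is an ordinary trailer in the commit message, so it is easy to check mechanically. The following is only a minimal sketch in Python, not a kernel tool: the function name `find_signoffs`, the regular expression, and the example message (including the name and address in it) are all assumptions made for illustration.

```python
import re

# Matches a trailer such as "Signed-off-by: Jane Developer <jane@example.com>".
SIGNOFF_RE = re.compile(
    r"^Signed-off-by: (?P<name>.+) <(?P<email>[^<>\s]+@[^<>\s]+)>$"
)


def find_signoffs(commit_message: str) -> list[tuple[str, str]]:
    """Return (name, email) pairs for every Signed-off-by line in the message."""
    signoffs = []
    for line in commit_message.splitlines():
        match = SIGNOFF_RE.match(line.strip())
        if match:
            signoffs.append((match.group("name"), match.group("email")))
    return signoffs


# Hypothetical commit message, invented for illustration only.
EXAMPLE_MESSAGE = """\
USB: serial: option: add a hypothetical modem

Explain here why the change is needed, not just what it does.

Signed-off-by: Jane Developer <jane@example.com>
"""

if __name__ == "__main__":
    print(find_signoffs(EXAMPLE_MESSAGE))
```

A patch with an empty result from a check like this would be missing the authorship statement the conversation describes.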

Starting point is 00:17:24 This person is giving context on like here's what you need to know about the modem, the different specifications or what. And here's the change. So somebody changed this, removed that line. The red is removed, the green is added. Yeah. So somebody added and had to reformat the lines based on some new ones they added. And that's it.

Starting point is 00:17:42 So they added a few new device IDs. And then there's a device ID and then we see a few hex numbers. Those are like some IDs here. So USB devices have a vendor ID and a product ID. That's how they identify the vendor and the product. And then there's some sub device and subproduct IDs. Got it. So that's what this is for.

Starting point is 00:18:04 Okay. So they're saying, we're just adding support for some, the driver already works for these chips, this chip, but we just have a new ID because a new vendor came along and they wanted to put their own vendor ID on it. Very common. So it sounds like this change is as simple as they get in terms of the code change, but still the description was very extensive, right? So very extensive. Part of the description was also just here is a dump of the description of the hardware, just so that we can verify, yeah, that is going to match with this. Got it.
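The idea behind this patch, a driver that already works and only needs a new vendor and product ID added to its table, can be sketched as a simple lookup. The real driver is written in C; this Python version is only an analogy, and every ID value and name in it (`DeviceId`, `SUPPORTED_IDS`, `driver_matches`) is hypothetical rather than taken from the actual driver.

```python
from typing import NamedTuple


class DeviceId(NamedTuple):
    """A USB device is identified by a vendor ID and a product ID."""
    vendor_id: int
    product_id: int


# Hypothetical ID table; these values are not from any real driver.
SUPPORTED_IDS = frozenset({
    DeviceId(0x1234, 0x0001),
    DeviceId(0x1234, 0x0002),
})


def driver_matches(vendor_id: int, product_id: int, table=SUPPORTED_IDS) -> bool:
    """Return True if the driver's table lists this vendor/product pair."""
    return DeviceId(vendor_id, product_id) in table


# "Adding support" for a new vendor is just one more entry in the table.
# The driver logic itself does not change.
PATCHED_IDS = SUPPORTED_IDS | {DeviceId(0xABCD, 0x0001)}

if __name__ == "__main__":
    print(driver_matches(0xABCD, 0x0001))               # before the patch
    print(driver_matches(0xABCD, 0x0001, PATCHED_IDS))  # after the patch
```

This is why the code change can be a line or two while the description carries most of the effort: the reviewer mainly needs to confirm that the new ID really belongs to hardware the driver already handles.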

Starting point is 00:18:33 So just it's a, we have tools that create those things. But yeah, it's a lot of work for a four-line change. But this is like if we talk GitHub language, although I'm not sure. This will be a PR, right? This is, this should be in the patch itself, not the PR. So a PR would be, so say you have 10 changes you want to make. So a PR would be the patch 0 out of 10. Got it, got it, got it.

Starting point is 00:18:59 Yeah, this is the commit. This is the commit itself, which is a big problem of why I don't like the GitHub model, because people don't put the changes in the GitHub, in the Git commit. No. Because the Git repo. We don't. So there's a problem.

Starting point is 00:19:10 When you commit the, when you're looking at the repo later and you look at the change, you don't see the pull, you can't see the pull request information. Yeah. And it's gone. And that's a big problem I feel with the GitHub. Well, I feel this goes back to, you know, like you built the tool or, you know, Linux group built a tool for your use case and you're using it the way you intend to use it. Whereas GitHub built, the pull request workflows is built on top of this.

Starting point is 00:19:31 And it is not part of Git for whatever reason. Maybe GitHub could have made it part of Git whatnot, but it's not, right? Well, no, so we have pull requests. We created pull requests in Linux. We email, there's git request-pull. Oh, okay. That's a Git command.

Starting point is 00:19:49 It makes an email. Is that part of the Git? Yeah, it makes an email that says pull from this repo, and here's everything that's in this repo. And when we do a merge of that, that merge commit has all that information in it. Okay. And then so you'll, if you look at the Linux kernel and you see when you merge, when Linus merges in the USB tree, he sees my little message at the top saying,

Starting point is 00:20:11 here's everything that's going to be in this pull request. You got it. And because Git is the source and that's where all the data is, right? And so you can see that we don't have pull requests. It's not external. GitHub could change that model and put that in the merge request, but it doesn't. I was about to say that, like, because you did it, they could. But it's a matter of, yeah, I guess preferences and, yeah.

Starting point is 00:20:31 That's fine. Anyway. But the good thing about this is you can track every single line of code back to who made it and what they did and what was the change. What was the changelog? What was the reasoning behind this? Yeah. Which is great. Okay.

Starting point is 00:20:44 So this comes into, this person sends it to the module. So he sent it to the, yeah. So the owner to this is Johan. There's a script we have that says take any patch and give me the people who are responsible for this and the mailing list. So Johan and me. I picked this and sent it to the USB list and copies a bunch of other people that I guess they worked with

Starting point is 00:21:05 and that have changed this driver in the past. And then the copy is also done with the tool. As you said, it kind of looks to who touched this code or who might. Exactly. All nice. This is all automated. Yeah, all automated.
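The lookup described here, "take any patch and give me the people who are responsible for this and the mailing list", is essentially pattern matching from file paths to owners. The kernel tree does this with its own script and ownership file; the sketch below is not that script, just a hedged Python illustration of the idea. The table `OWNERS`, the function `recipients_for`, and every address and pattern in it are invented.

```python
from fnmatch import fnmatch

# Hypothetical ownership table; addresses and patterns are invented.
OWNERS = [
    {
        "pattern": "drivers/usb/serial/*",
        "people": ["serial-maintainer@example.com"],
        "list": "usb-list@example.com",
    },
    {
        "pattern": "drivers/usb/*",
        "people": ["usb-maintainer@example.com"],
        "list": "usb-list@example.com",
    },
]


def recipients_for(paths):
    """Return the de-duplicated people and lists to email for the touched files."""
    recipients = []
    for path in paths:
        for entry in OWNERS:
            # fnmatch's "*" also matches "/" so a parent pattern covers subdirectories.
            if fnmatch(path, entry["pattern"]):
                for address in entry["people"] + [entry["list"]]:
                    if address not in recipients:
                        recipients.append(address)
    return recipients


if __name__ == "__main__":
    print(recipients_for(["drivers/usb/serial/option.c"]))
```

In this sketch a change to the serial driver reaches both the driver's maintainer and the broader USB maintainer, which mirrors the "Johan and me" outcome mentioned in the conversation.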

Starting point is 00:21:16 We do that in the soon as well. So that was great. He sent it and just the mailing list has two copies of it. That's just because they wanted two different mail lists. But then they said, oops, I messed up. There's an email from the person instantly.

Starting point is 00:21:31 You have to re-send it off. Oh. It said, I messed up. There's an interface. Maybe it'd be a good idea to change this comment. So they go and change the comment. And then they resend it. And you just send a new version.

Starting point is 00:21:43 And then in this case, they send a new patch? Or do they just add one more? No, you want to have a clean commit, right? Yeah, we don't do. So here they sent a version 2 patch. If you can see that. It says version 2, right? Oh, there.

Starting point is 00:22:01 Version two. Got it. And then here's the same information. And then there should be some comments, but what changed between the two versions? Hopefully. Yes, changes in version two from the previous one. And there's a link back to the first one. Nice.
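The version marker and the "patch 0 out of 10" numbering mentioned in this conversation both live in the email subject line, which makes them easy to read mechanically. A minimal sketch, assuming subjects shaped like `[PATCH v2 03/10] title`; the regular expression, the function name `parse_subject`, and the example subjects are assumptions for illustration, not a description of any kernel tooling.

```python
import re

# Assumed subject shape: "[PATCH]", "[PATCH v2]", "[PATCH 0/10]" or "[PATCH v2 03/10]".
SUBJECT_RE = re.compile(
    r"^\[PATCH(?: v(?P<version>\d+))?(?: (?P<index>\d+)/(?P<total>\d+))?\]\s*(?P<title>.+)$"
)


def parse_subject(subject: str):
    """Split a patch email subject into version, series position, and title."""
    match = SUBJECT_RE.match(subject.strip())
    if match is None:
        return None
    index = match.group("index")
    total = match.group("total")
    return {
        # A subject without an explicit "vN" is treated as version 1.
        "version": int(match.group("version") or 1),
        "index": int(index) if index is not None else None,
        "total": int(total) if total is not None else None,
        "title": match.group("title"),
    }


if __name__ == "__main__":
    print(parse_subject("[PATCH v2] USB: serial: option: add a hypothetical modem"))
    print(parse_subject("[PATCH 0/10] hypothetical cover letter"))
```

A reviewer handling a large volume of mail can use the version number to find the previous round, which is why the "changes in version two" note with a link back matters.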

Starting point is 00:22:16 Very nice. So we want to see the changes because, I mean, I get a thousand emails a day. Yeah. And when I review patches and stuff, I'll review them. And then they're gone because I'm reviewing the next one. Yeah. But when you send a next version, I want to remember, see what changed from the previous one because I don't want to go back and dig through all

Starting point is 00:22:32 old stuff. Okay. So they added some information to it. Wonderful. And then what happened? Johan, who is the maintainer of the subsystem. Yep. Wrote and said, hey, thanks for the patch and for documenting it.

Starting point is 00:22:47 Oh, he did something else. I forget the order. First, Chester wrote, hey, please apply this, after two weeks, after a week. Because after a week or after two weeks, it's nice to say, hey, what happened to this? What's going on? Johan said, you submitted this patch during the merge window. I'll talk about how we do our development model, but there's a two-week merge window when we do releases that Linus takes all the changes from all the maintainers that have been in their development trees.

Starting point is 00:23:17 We can't add new changes at that point in time. So there's a two-week kind of blackout for new development. But this is where all the stuff is flowing into Linus for the next release. So during that time, if you send me a patch, I can't really do anything. with it, but it'll stick in my mailbox until then. So this happened to hit that little window of time. Just to understand, there's a nine-week release, every nine weeks there's a new release going out, right?

Starting point is 00:23:40 And then there's a window where patches are gathered. So yeah, here, let's talk about that. So there's, Linus does a release. Yeah. This point in time. And then the merge window is considered open. And then for two weeks, all the maintainers, send Linus all the stuff they've had pending from the last

Starting point is 00:23:58 release. Yeah. We have two weeks to add all new features. Yeah. And then he does release candidate one and then from there on it's bug fixes only for the next seven weeks. Mm-hmm. So it's bug fixes only, bug fixes only, it's regression fixes, we'll revert things and so it's no new features. Yeah. But during that seven weeks, people are sending me new features. Yeah. So I have a separate tree which is, which is my next. You're not batching it for when the window will open. Yeah. So we call it next, linux-next. So we have a next tree where all these are merged together on a daily basis to make sure they work. Yeah. Be prepared for Linus's next one. And then when he does a release,

Starting point is 00:24:36 after everything's good, we all throw things at him again in another two weeks. Now, Linus doesn't pull automatically from all those merged trees. We have to explicitly ask him. Yeah. Because sometimes our trees aren't good. Yeah. So sometimes, like, I maintain the TTY and serial tree, and one time famously, it was a mess. Our tree, it just wasn't working. There were new features added. So I'm like, I'm skipping this release cycle. I'm going to pull out some of these bug fixes and send it to you off the side and then go. But if it was like automatically being merged in, we'd have to deal with that mess.
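The time-based cycle described here (a two-week merge window, then seven weeks of fixes only, nine weeks in total) can be written down as a small calculation. This is only a sketch under the simplifying assumption that every cycle is exactly 63 days; the constants and the function `phase_for_day` are made up for illustration, and real releases need not land on such exact boundaries.

```python
MERGE_WINDOW_DAYS = 14    # two weeks: maintainers send pending features to Linus
STABILIZATION_DAYS = 49   # seven weeks: bug fixes, regression fixes, reverts only
CYCLE_DAYS = MERGE_WINDOW_DAYS + STABILIZATION_DAYS  # nine weeks


def phase_for_day(days_since_release: int) -> str:
    """Return which phase of the cycle a given day falls into.

    Day 0 is the day of a release, which is also when the merge window opens.
    """
    if days_since_release < 0:
        raise ValueError("days_since_release must be non-negative")
    day_in_cycle = days_since_release % CYCLE_DAYS
    if day_in_cycle < MERGE_WINDOW_DAYS:
        return "merge window"
    return "fixes only"


if __name__ == "__main__":
    for day in (0, 13, 14, 62, 63):
        print(day, phase_for_day(day))
```

Under this model a feature that misses one merge window waits at most one cycle for the next, which is the reason given in the conversation for why maintainers feel little pressure to take work that is not ready.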

Starting point is 00:25:09 the companies that use Git, large tech companies, they often have, let's say, let's talk about native mobile development where there is a concept of releasing every week or two weeks because of the app store review process or same with like desktop apps that you can't really just continue to release. There's usually an aim for something, but it's not as strict. So every now and then it would also happen that it's just not stable enough, we'll push it back. But there is not this rigid, like, clockwork. Like I think, you know, most companies that I've seen, they just treat it a bit more flexible

Starting point is 00:25:41 because, again, you know, they come up with the thing that they're in charge with it. Your feature you want to have at it. Yeah. And then as we know, when you have a milestone, you know, like features might be cut. Deadline might be moved. you know, like companies. But do I understand correctly that in the case of Linux, like is this a thing where every nine weeks there will be a release?

Starting point is 00:26:01 It's time-based. So we have that two-week window of merging all the new features to Linus that have been in our tree and accepted already and proven to work. And the window is short, nine weeks. Yeah. And that's good because we used to have two-year-long development cycles, three-year-long development cycles. And the problem there is if you have,

Starting point is 00:26:18 even if you have six-month development cycles, there's that fear of you have a feature. I want to take your feature, but it's not quite ready. Do I want to wait? I know what you're doing. But if you know that you can get your feature in in nine weeks from now, and it's just not ready, it's not ready. And it's much more like, okay, the pressure is off me as a maintainer

Starting point is 00:26:36 to take your new feature until it's ready. You can say like, look, if it'll make it into the next one or let's make sure it's going to work properly if it's a more complex one. Yeah, we have lots of features. I mean, famously there's a USB feature that's on patch version number 35. This 25 patch series, it's on the 35th version, and it's just not ready. And I just got email today saying, well, maybe we need to change this to this other way. I mean, so I feel so bad for that developer.

Starting point is 00:27:01 But he's been working hard and it's a complicated feature, and it's taken him a year and a half to get there. I have other patches that are in version three, but that's version three, and it's been two years. Because the developer just took a lot of time in between. Okay. So in this case, this is a good example that, you know, the person, the contributor, said like, hey, a reminder, I'd like this patch applied. And then Johan replied, reminding of the timeline on how it works, right? Yeah, exactly.

Starting point is 00:27:29 And Chester wrote back. And then really friendly, it'll be in the next one, don't worry. Which is nice, very positive. Yeah, we're not mean people. And reminding, don't ever feel bad about reminding me that I haven't reviewed your patch in two weeks. Now, if I haven't reviewed it in two days, yeah, I'll be a little testy. But two weeks is a good idea. And then Chester wrote back, thanks a lot for keeping an eye on it.

Starting point is 00:27:50 Keep up the good work. And that's it. So then Johan has it. Yeah. Johan applied it to his tree because he then wrote saying, hey, and Johan is very nice here. He said, you kind of didn't do the comments in the proper format. I fixed it up for you. Oh, nice.

Starting point is 00:28:03 So for drive-by changes like that, we want to make it really easy. And we're not mean people. I mean, clearly, this feels like it's a person who is unlikely to become a regular contributor. They're getting their work done, right? They're adding. They have a device that they have to share. Yeah, pretty much. But we want to be friendly and open and easy to everybody

Starting point is 00:28:22 because everybody submits their first patch at one time, right? Famously, when I did my very first patch, I wrote an email saying, how do I make a patch? Because we didn't have good documentation. Somebody wrote back and said, hey, here's how you do it. He became my boss eight years later. It was just funny. It's just like a small world and whatnot. But yeah, and we want to make it easy.

Starting point is 00:28:42 So Johan takes this, and he's got the patch, and it's in his tree now, which is great. But that's in his local little tree. Then he has to get it off somewhere else. Johan then makes a pull request to me. So this is an output of the Git request. Make pull request. I don't know what the actual command is. And this is what a pull request from Git will look like.

Starting point is 00:29:01 And this is because Johan is a subsystem maintainer. Yes, Johan maintains the USB to serial drivers. There's a bunch of drivers for these types of things. And then he sends it off to me, the USB maintainer. Got it. And he says, take this patch or pull from this tree of this tag. And it's a signed tag. So it's signed with his GPG key, so I can verify that.

Starting point is 00:29:20 It's really him when I pull from it and says, take these patches and here's the information. It's going to be some USB device IDs, and they have all been in Linux Next with no reported issues. So they've been tested. In our integrated testing, we test all this stuff every day. And what does testing mean? Is it automated testing? Is it pushing it out on devices and device labs? Is it a mix?

Starting point is 00:29:41 Yes, it's all of that. So Linux Next gets merged every day by a developer in Australia. He merges all the trees together and builds them and boots them. In virtual machines. That's a non-trivial thing for a kernel to do. If it can boot, it's usually a, things are going well. It isn't testing on real devices. Now, there are other labs out there with KernelCI, which is our CI infrastructure,

Starting point is 00:30:04 that can run on all individual labs. And we do push things out there and there are people testing Linux Next on their real hardware, sending us reports back in an automatic fashion. Those are more rare. Linus's tree gets tested more on that. Stable trees. I can talk about stable trees in a little bit. I mean, tested more on the real hardware. Linux Next gets build and boot tested pretty well.

Starting point is 00:30:24 I don't run Linux Next. I run my development trees online, so I don't run all the mix of them all. Sometimes they interact because we don't have any fiefdoms. If I have a USB change that needs to actually go through the networking stuff, I can change the networking code and whatnot like that. And they can say, hey, maybe you shouldn't do that. And we try and get approval. You review my patches. But it's now we can touch any part,

Starting point is 00:30:47 but he can touch any part of the kernel in a way. But he sent me a pull request. And a pull request is that I don't actually review the changes in it. I'm not reviewing each individual patch through email. I'm trusting that he sent me four patches here and that they're good. Yeah. And I have known Johan and I know that he will be there if something goes wrong. Yeah.

Starting point is 00:31:05 And like you, will you read the kind of the description and then every now and then you might decide, for example, to like deep dive into a change? Totally. I mean, for USB device IDs, it's like, okay, yeah, they're all touched in the same driver. Yeah, these are common. They're nothing simple. Sometimes they're a little more complex.

Starting point is 00:31:22 I don't pull from a lot of different trees, but I pull from some that I trust. Some subsystems that I don't necessarily trust as well, I will make them send me email. And I'll actually review them, and then when I review them, I add my Signed-off-by to it. And I guess part of trust will be here. I'm just going to assume that since you and Johan know each other well and you've worked together for a while, he will probably also every now and then give a comment saying, hey, Greg, there's this change. Can you take an extra look on this thing, et cetera? Yeah.

Starting point is 00:31:52 So sometimes Johan makes changes to the code himself. Or I make changes to the code myself. I put it out for review. And I have other people review my changes. This is just fascinating for me to hear you explain how trust between people, maintainers, is so important for efficient development. Yeah, it's all, it's, yeah, it's, and then also the trust is somebody once told me that Linux development was the scariest thing they ever did, not because it was like difficult or whatnot, it's because my name is on this change and it's public. That makes you as an engineer do really, really good work.

Starting point is 00:32:31 I mean, so much so that this person who sent this patch went back and looked at it instantly and said, oh wait, the comment could be made a little bit better. And they're like, oh, yeah. So, I mean, that's not a normal development process in a company that I commit to go. It makes me wonder about a few things that I kind of took for granted. For example, could this mean that closed source software where the outside world does not know how it's done? Maybe there's just a bit less incentive to do such great work. And actually, it's just a reflection. Like I do remember when I worked at a company and when we actually my team, we open sourced a component that we built.

Starting point is 00:33:08 And I just remember how I put in way more work into that to make it look good to have the document, not just look good, but make it clean. We cleaned up actual tech debt before we published it. And we didn't do that with our stuff. It was... So open source development, by virtue of just human pressure, makes a better engineering product. It's a better engineering. And we've kind of shown that through the years that this development model creates a better

Starting point is 00:33:38 software. I'm kind of revisiting some of my, like, not assumptions, but I never thought of it like this, but it's just, it's awesome to see this. So, so then what, what, what happens next after, after Johan sends it to you? So Johan sends it to me. And then I take it, and I put it in my tree. I think I send, I sent him an email saying I took it. And then if you take it, you're responsible, right? I pulled it and pushed it out. Yes. And there's my email that says that. So now I'm responsible. It's in my tree. So now the, um, since this is a device ID, these can go to Linus at any time. We can add bug fixes or new device IDs. These are true. Yeah. So

Starting point is 00:34:13 then a few days later, I send this change off to Linus. So I send Linus. I said, hey, Linus, take all these following changes, these changes, and here's a whole bunch of USB fixes. So here's some small driver fixes, some new device IDs. And so I summarize it all. I say these are all the things in here. Yeah. And these are going to be like a few dozen patches, something like that. Yeah. But here's the list of the patches down below. And here's the diff stat of them. Here's the diff of them to make sure that this diff matches what he pulls from. This is signed with my key. I do say almost all of these have been in Linux Next. I guess some of them slipped in. But we also have other testing. When you send patches to the mailing list, what we call the zero-day bot will go

Starting point is 00:34:55 through and start applying them and build testing them. And that's run. And then our own trees that we create also do verification that they did build and boot. And it will run some benchmarks. For drivers, it doesn't really run benchmarks. And so then Linus takes this and he puts it in his tree. So then it got picked up. So it got picked up another day later. And then let's talk about how we do our model. So Linus does a release every nine weeks. Yep. Bug fixes come in during those nine weeks for the last release. You're running the last release, right? You want those bug fixes. You had a device that's running those bug fixes. A long time ago, we realized that people don't want to wait eight weeks. So let's create a model of we have a development tree and we have a stable tree.
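
The cadence described here can be sketched as a small date calculation. This is only an illustration, assuming the split mentioned earlier in the conversation (a two-week merge window followed by about seven weeks of fixes) and assuming stable point releases land exactly one week apart; the function name and the example date are hypothetical.

```python
from datetime import date, timedelta

MERGE_WINDOW_WEEKS = 2    # new features are merged into Linus's tree
STABILIZATION_WEEKS = 7   # bug and regression fixes only


def release_cycle(previous_release: date) -> dict:
    """Return the key dates of one cycle, counted from the previous release."""
    rc1 = previous_release + timedelta(weeks=MERGE_WINDOW_WEEKS)
    next_release = rc1 + timedelta(weeks=STABILIZATION_WEEKS)
    # Assumption: one stable point release per week until the next release.
    total_weeks = MERGE_WINDOW_WEEKS + STABILIZATION_WEEKS
    stable = [previous_release + timedelta(weeks=n) for n in range(1, total_weeks)]
    return {
        "rc1": rc1,
        "next_release": next_release,
        "stable_point_releases": stable,
    }


if __name__ == "__main__":
    cycle = release_cycle(date(2025, 1, 6))  # hypothetical release date
    print("rc1:", cycle["rc1"])
    print("next release:", cycle["next_release"])
    print("stable point releases in between:", len(cycle["stable_point_releases"]))
```

The point of the sketch is that both schedules run off the calendar, not off feature readiness: a feature that misses one merge window simply waits for the next one.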

Starting point is 00:35:36 So when Linus does a release, I fork off Linus's branch. And I say, this is a stable branch. So 6.4, I do 6.4.1, 6.4.2, 3, 4, 5. And our release numbers are just numbers. They mean nothing. They're not semantic versioning. We were around way before that happened. They're just meaning this number is later than that number. That's all. When we switched from 4.x to 5.x, it's just because the X got too big. Yeah. And in your brain, when you see a number between like 14 and 18, it looks smaller than 4 to 8. Yeah. So we just bump it up every couple years. So then we, so we take stable. We have stable releases. I do a release every week. And what I do is the patches have to be in Linus's tree first. We can't diverge. So if it's in

Starting point is 00:36:20 Linus's tree, it's a bug fix and meets this criteria, I put it in the stable tree and I do a release. And so we do new releases every week for that. So during those nine weeks, I'll take new device IDs. I'll take bug fixes and whatnot. And then you can tag the fixes that are going into the tree with a special way that I'll automatically take them. I know to look at them. The other stable tree maintainer with me, Sasha, he runs through them and runs a whole bunch of fuzzing. He's been doing AI before it was ever called AI. It's just pattern matching. And we have a whole body of here's a whole bunch of bug fixes.
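
The two conditions mentioned here (the fix must already be in Linus's tree, and it is tagged so the stable maintainers know to pick it up) can be sketched as a simple filter. The conversation does not name the tag; this sketch assumes the `Cc: stable@vger.kernel.org` trailer described in the kernel's stable rules documentation. The `Commit` class, its fields, and the sample data are hypothetical.

```python
from dataclasses import dataclass

STABLE_ADDRESS = "stable@vger.kernel.org"  # assumed tag, see the note above


@dataclass
class Commit:
    sha: str
    message: str
    in_mainline: bool  # True once the commit is in Linus's tree


def has_stable_tag(message: str) -> bool:
    """True if any trailer line is a Cc: to the stable list."""
    for line in message.splitlines():
        lowered = line.strip().lower()
        if lowered.startswith("cc:") and STABLE_ADDRESS in lowered:
            return True
    return False


def is_stable_candidate(commit: Commit) -> bool:
    # Stable never diverges: a fix must be in mainline before it is considered.
    return commit.in_mainline and has_stable_tag(commit.message)


if __name__ == "__main__":
    fix = Commit(
        sha="abc123",  # made-up identifier
        message="USB: serial: add a device ID\n\nCc: stable@vger.kernel.org\n",
        in_mainline=True,
    )
    print(is_stable_candidate(fix))
```

Untagged fixes are not lost in the real process: as described next, pattern matching over past bug fixes is used to catch changes that should have been tagged.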

Starting point is 00:36:51 Here's a whole bunch of changes. Did anybody do these kind of match? Oh yeah. Some people don't realize that, oh, this was a bug fix. It should go into the stable tree. They've written academic papers on it for years. It's fun stuff. So just pattern matching, right?

Starting point is 00:37:05 So they'll pick up a whole bunch of stuff that, hey, maybe you forgot about that. And it'll give you a chance to respond before it goes into the stable tree. And we do those releases. When Linus does a new release, then I throw that stable tree away and I make a new stable tree. That's great for things

Starting point is 00:37:19 that I can update more often. People want to make a device. You want to make it something that's going to last a long time. So what we've come up with, the idea is long-term stable trees. And there I pick one kernel a year and I maintain it for,

Starting point is 00:37:31 to start with two years, sometimes six years. So your Android phone is running off a kernel that's five years old, but it's still getting bug fixes back to it. So I maintain like four, or long-term stable trees at the same time. And we backported all these fixes to all the different branches,

Starting point is 00:37:45 and then we pick one a year, and we maintain these. So there's six of them going at a time. And in this case, like, is it you, like, there's one maintainer for each of these long-term? No, it's just me. Oh, you? Wow. Okay.

Starting point is 00:37:59 Yeah, it's the two of us. The longer, the interesting thing is the older the code is, the harder it is to maintain. And the companies are like, oh, I'll put a junior developer to maintain old code. That's harder because it's more divergent from what the latest developers are using. Can you tell me a little bit more about this? Because the older the code, the harder it is to maintain.

Starting point is 00:38:18 I think it feels true. But why is this the case? Is it just lost context? So development moves on, goes forward, right? So say a change I make today to the code base. It fixes the bug that affects the code that I look back. It's affected the code for the past 10 years. Yeah.

Starting point is 00:38:37 All right. If I try to start backporting this change to code that's 10 years old, code has evolved in that time. Yeah. And making that change to older code is harder. And the more I have to change it, the more it diverges from the original fix. So the more context and skill you have to have to make the change to the older code base than even the developer who made the first change.
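
A toy illustration of why this gets harder: a patch is located by the lines around it, and in older code those surrounding lines have often changed. The sketch below is not how Git or `patch` work internally; it is a deliberately simplified, hypothetical context match, and all the code strings in it are invented.

```python
def apply_fix(source_lines, context, old_line, new_line):
    """Replace old_line with new_line, but only where it directly follows context."""
    for i in range(len(source_lines) - 1):
        if source_lines[i] == context and source_lines[i + 1] == old_line:
            return source_lines[: i + 1] + [new_line] + source_lines[i + 2 :]
    # The surrounding code no longer looks like what the fix was written against.
    raise ValueError("context not found: the fix needs a manual backport")


CURRENT_CODE = [
    "buf = alloc_buffer(dev, len)",
    "copy_data(buf, src, len)",
]

# Ten years ago the allocation helper had a different name and signature.
OLD_CODE = [
    "buf = old_alloc(len)",
    "copy_data(buf, src, len)",
]

FIX = dict(
    context="buf = alloc_buffer(dev, len)",
    old_line="copy_data(buf, src, len)",
    new_line="copy_data(buf, src, min(len, buf_size))",
)

if __name__ == "__main__":
    print(apply_fix(CURRENT_CODE, **FIX))
    try:
        apply_fix(OLD_CODE, **FIX)
    except ValueError as err:
        print("old tree:", err)
```

When the automatic match fails, someone has to understand both the fix and the old code well enough to rewrite the change by hand, which is the extra skill being described.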

Starting point is 00:38:57 It's not intuitive. Companies make this mistake all the time thinking, oh, I'll just maintain this old code base for a long time. We have major security bugs like Spectre and Meltdown with chips. Some of those Spectre fixes have not been backported to some of the long-term kernels that are still being supported because it was just too hairy of a fix. Anybody who cared moved to a new kernel. Yeah, yeah, yeah. So I look at a lot of these older kernels is, again, if you're using it, you will provide the resources to maintain it.

Starting point is 00:39:23 Google, I'll call out, and Linaro, another group, do a lot of work in testing these old kernels because Google cares a lot about these kernels. So they provide testing infrastructure and merges and reproducibility and running on real devices to make sure that these kernels still work on them, and they work well. And that way I know that if I make a change back there, it'll still work. If I didn't have that resources for them doing that work, I wouldn't be able to maintain these old kernels. Yeah. And then going back to the bug fix,

Starting point is 00:39:49 so like every week there's a new stable branch release. And then when does the big release come? The nine-week release come? That's after this has been kind of baking, right, for the stable branch has been... So the stable's independent of Linus's tree. Oh, stable's independent. So the only tie is it has to be in Linus's tree first.

Starting point is 00:40:07 We do not want... Got it. We don't want you to make a fix to a stable tree only and not Linus's tree. Got it. So sometimes I will have bugs in the stable tree due to other changes I've taken. I mean, fixes need fixes. And I'm like, I can't take the fix for this until you get the fix in Linus's tree. And it's kind of a forcing function on a developer to get a fix to Linus before I'll take

Starting point is 00:40:27 it for the stable tree. Sometimes I'll revert the change in the stable tree. And do I understand the way to get a fix into Linux is a, well, of course, you need to get a fix into Linus's tree, which means you need to go through one of the maintainers who is in, you know, who maintains one of the subsystems. Yeah. So say it's. And you just need to go up the tree, up the pyramid.

Starting point is 00:40:49 Right. So famously, Bluetooth always breaks every other release. Bluetooth is crazy complex. The hardware is horrible. And if you need to get a fix in there, it has to go to the Bluetooth tree and then that gets sucked into the networking tree and then that networking tree goes to Linus. So it's like a two-stage process. Sometimes and then we have somebody tracking regressions.
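
The "up the pyramid" path described in this exchange can be sketched as a lookup from each tree to the tree it gets pulled into. The two routes below are the ones mentioned in the conversation (Bluetooth via networking, USB serial via USB); the dictionary keys are hypothetical labels, not real repository names, and "mainline" stands for Linus's tree.

```python
# Each tree maps to the tree its maintainer sends pull requests to.
UPSTREAM = {
    "bluetooth": "networking",
    "networking": "mainline",
    "usb-serial": "usb",
    "usb": "mainline",
}


def path_to_mainline(tree: str) -> list:
    """Return the chain of trees a change passes through to reach mainline."""
    path = [tree]
    while path[-1] != "mainline":
        path.append(UPSTREAM[path[-1]])
    return path


if __name__ == "__main__":
    print(" -> ".join(path_to_mainline("bluetooth")))
    print(" -> ".join(path_to_mainline("usb-serial")))
```

Each hop is a maintainer who vouches for what they pass along, which is why the number of hops matters less than who sits at each one.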

Starting point is 00:41:07 Regressions are really important. We don't want anything to regress. Sometimes Linus will say, I'll just take these bug fixes or regressions. I'll just take them now. Boom. I'll just take them. So it depends on what they are. If they affect hardware that's really common, we prioritize that over hardware that isn't

Starting point is 00:41:20 as common, just by virtue of, hey, this broke my laptop. Right. I want to keep working. So yeah, it's a little thing that way. So we have two branches going at once, development, and then stable releases happening. So then this went into the Linus's tree. I picked this out as part of the stable trees, and then they ended up in the stable tree somewhere as well.

Starting point is 00:41:41 And then I can give you dates for all this stuff. This whole process took about a week and a half. And that was it. Okay. And then here is, it ended up in the 6.13.4 kernel as well. Yeah. And then as another ones as well. Back to all.
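
As noted earlier in the conversation, kernel release numbers only say that one release is later than another. A minimal sketch of that ordering, assuming plain dotted numeric versions like the 6.13.4 mentioned here (release-candidate suffixes are not handled); the helper name is hypothetical.

```python
def version_key(version: str) -> tuple:
    """Turn '6.13.4' into (6, 13, 4) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


if __name__ == "__main__":
    versions = ["6.13.4", "6.4.10", "6.4.2", "5.19"]
    # Numeric ordering: 5.19, 6.4.2, 6.4.10, 6.13.4
    print(sorted(versions, key=version_key))
    # Plain string sorting gets this wrong, because "6.13" sorts before "6.4".
    print(sorted(versions))
```

Nothing here encodes compatibility or feature scope, which matches the remark that the numbers are not semantic versioning.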

Starting point is 00:41:59 Trust isn't just earned. It's demanded. Whether you're a startup founder navigating your first audit or a seasoned security professional scaling your governance, risk, and compliance program, proving your commitment to security has never been more critical or more complex. That's where Vanta comes in. Vanta can help you start or scale your security program by connecting with auditors and experts to conduct your audit and set up your security program quickly.

Starting point is 00:42:23 Plus, with automation and AI throughout the platform, Vanta gives you time back so you can focus on building your company. Businesses use Vanta to establish trust by automating compliance needs across over 35 frameworks like SOC 2 and ISO 27001. With Vanta, they centralize security workflows, complete questionnaires up to five times faster, and proactively manage vendor risk. Join over 9,000 global companies to manage risk and prove security in real time. For a limited time, my listeners get $1,000 off Vanta at vanta.com slash pragmatic.

Starting point is 00:42:55 That is V-A-N-T-A.com slash pragmatic for $1,000 off. So we saw what it takes to get a fix into Linux, and it actually wasn't that complicated. No, it really isn't. I mean, it's just you email a change off. You email, you use the Git workflow. If you're familiar, it's pretty simple. Obviously, obviously you need to be able to build Linux, test it on, test it yourself, validate it locally that it works, the basic things. And then it's straightforward.

Starting point is 00:43:24 The fun thing is so I can take a change like that without really testing it because it built. It obviously works for your hardware. I didn't test it, but it works, and I assume that it goes. Yeah. And yeah, it's a very fast workflow as far as getting a project. So it was like a two-week window from sending the first change as the merge window to getting it out into stable kernels to the world. That was pretty fast for overall, for a worldwide project that is everywhere.

Starting point is 00:43:50 So I think I understand what it's like to, you know, be someone who contributes to Linux every now and then. But over time, some people start to contribute more. They become more regular contributors. And eventually, you're one of the few people or one of the few or many people who works on Linux full-time. Are there many people working on it full-time? So Linux has almost always been paid to be worked on. So I started keeping the numbers back in, what, 2006 or something? And at that point in time, 80% of the people that contributed were being paid to do it full-time for their employer.

Starting point is 00:44:24 And their employers want people who know how to do Linux because they want to solve their problems. They want Linux to, it's much cheaper to pay a few engineers to add a few new features than it is to write your own operating system. That's the beauty of Linux. That's why IBM put a bunch of money into it. That's why everybody uses it. It's a tool for people to get their work done.

Starting point is 00:44:41 You want to run your battery. You want to run your car charger. You add a little driver for the one device you have. You add an engineer to do that. And it's good to go. And it'll be maintained for forever because we maintain it in the community. It's all good. So it's cheaper.

Starting point is 00:44:53 So we've been doing it. And the joke used to be you get three changes into the kernel. You get a job. It's not really a joke. As long as they aren't spelling fixes. But some people do spelling fixes, which is great. We have people that do janitorial work through the kernel. They sweep the tree for common problems, and they just clean stuff up and keep code alive and

Starting point is 00:45:11 make sure it's fresh, proper coding style. We have coding style issues. We have people just fixing spelling mistakes in comments, which is great because you've got to start somewhere. In fact, spelling mistakes in comments is a great place to start because it makes you get the workflow down. You figure out how to make a patch. You have to send an email,

Starting point is 00:45:26 fix your email client and not send HTML and things like that. And you can't use a web client. It doesn't, web email client to send an email. It just doesn't work. Good email. There's lots of really good email tools out there. Use them. But you're now a full-time.

Starting point is 00:45:41 I'm a full-time maintainer. What does your kind of day-to-day or week-to-week look like? Because I'm going to assume it's going to be a little bit different than most developers who, you know, like write code, review code, do those kind of things. So, yeah, I mean, I've been working for the Linux Foundation for what 13 years now. Before that, I used to work at Novell and SUSE, before that IBM, and then a little startup, all doing Linux stuff all this time. And then before that I did embedded work. When I worked for a company, you end up working on features

Starting point is 00:46:08 that your company wants, or reviewing code from the other developers of your company and then sending off changes. Or if you're a maintainer, a maintainer is, and the networking maintainer said this the best, we're like editors. We used to be a writer. All we do is critique other people's stuff now. But because we were a writer, we have a little side project. So we do have little things that we do dabble and stuff. So like I looked, I did 80 changes, only 80 changes last year because I have a few little things I want to do. That was low. But working for Linux Foundation as a full-time maintainer, that's rare. I think there's only maybe five people, maybe, maybe a handful of people that just work on whatever they want to do. So Linux Foundation

Starting point is 00:46:44 rule is they can't tell me what to do and I can't tell them what to do. Works so great. Me and Linus and Shuah Khan, we're all fellows there and we work on improving Linux for however we feel like it. Me and Linus do a lot of review, a lot of other stuff. Linus still contributes. He famously rewrote the core locking primitives in Linux a couple years ago. I had a Microsoft developer say, there's no way any of us would be even allowed to do that on Windows. You know, you don't just change core bits and pieces. For one of the security features and one of these stable releases,

Starting point is 00:47:15 Linus had to rewrite the call path from how user space calls into the kernel, the core syscall path. Nobody really noticed that it got rewritten, but it did, and he did it in a stable way and then it works like that. So we're also part of the kernel security team. We get security bug fixes all the time. And if they're easy, we'll just fix them ourselves and send out the fixes. So we do security fixes a lot as far as that goes.

Starting point is 00:47:39 So my day-to-day is I read other people's stuff. Like I said, I got 1,000 emails a day to do something with it. You're not like exaggerating. No, yeah. You get a thousand emails a day. Yes. Wow. So I don't have to just file off.

Starting point is 00:47:52 And it's like, oh, this, like I subscribe to a number of kernel subsystem mailing lists to see what's going on. Yeah, yeah, yeah. And I don't have to do something with all of those. Yeah, but some of them, you need to do something. Yeah, some of these I do need to do something with. And some like, so I'll, so say for USB is one of the subsystems I maintain. I shove them all off to a mailbox, and then once a week I'll go through them all and say,

Starting point is 00:48:10 okay, let's review all these. And so I'll look at my inbox. I'll have 200 USB emails to patches to go through. And other people review them and other stuff like that. And okay, this maintainer said this was good, not good, whatnot. And I apply them to my trees, see if they build, if they fail. I'll report those. You know what you're doing reminds me a little bit of when I used to work at Uber.

Starting point is 00:48:31 We had this concept of RFCs, which I think got inspired by the RFC process. So people would just send off a document of here's what I'm planning to do. And there were mailing lists for like backend, mobile, different parts. And I noticed after a while that the more tenured engineers and the more experienced engineers were spending increasingly more of their time reading through these things, critiquing, giving feedback, giving pointers, connecting the dots. Like, it just hit me when one of the first mobile engineers at Uber was telling me that he has one day blocked out per week just to go through all of these things,

Starting point is 00:49:06 which again, it wasn't kind of part of his role, but he felt responsible. He had all the context. He actually helped so many people avoid certain things just by pointing it out. It's the same thing or something similar happening here. It's the same. But we also, the different part of this, and I'll call this out, we don't have grand proposals sent to the kernel list. We don't say, hey, wouldn't it be great if you did this? I don't want to see that. I want to see code that works.

Starting point is 00:49:31 I love it. As proof, then code that works matters is because you've taken the time, you've proved that this can be done. Now, not necessarily that it's done right or done the best way, but it could be done. And that's now you have the skin in the game, and now I'm willing to work with you, and let's go on that. People do send off RFCs of patches. If it's an area I care about, I'll look at it. Sometimes you can get away with this. This is a fun trick with maintainers. If you send me a patch set that solves your problem in such a way that it's horrible, but I hate it so much, that I'll rewrite it myself. Because it'll be like, I can't say no because it's solved the problem, and I want to solve your problem. But if I don't say no,

Starting point is 00:50:09 then I have to take that. So you can do that once a year to a maintainer. I says that you're a little bit busy work because I've seen at different companies when you have the proposal process. Again, a lot of companies for good for, you know, it's, you know, it sounds logical. Instead of starting the work, instead of investing time, maybe we would all save time by doing a little planning up front, right? But then every now and then what happens is you get into this never-ending planning, nothing happens until either the project is abandoned or someone just sits down, writes some code, and kind of, you know, just cuts all the discussions because now it works. Yeah, well, you have to prove that it can work. And so inside companies, I'll say, we do have,

Starting point is 00:50:46 we did like when I work for companies, IBM, it's like we had planning, okay, we need to implement this feature to match this parity with this old version of Unix. How are we going to do that? Let's figure out to do this. Is this going to work? Yada, yada, yada. And we have planning, things like that. One of the fun things is, when you're dealing with open source and this happened at IBM, engineer over here was tasked with fix this problem. Great. He came up with a solution, submitted all the changes upstream. Lots and lots of discussion. Turns out his solution was not very good. Somebody else saw that it was a problem, rewrote it, submitted it, and got it accepted. But it wasn't the original engineer's work.

Starting point is 00:51:21 And so the end of the year came, it was like, how was this person going to be reviewed? And we're like, he caused the feature to get done. It wasn't that his code made it in, but he influenced the community and made, the goal was you wanted to see Linux support this, right? Linux supports this now. And it had a change in mentality of how management had to treat engineers.

Starting point is 00:51:41 And also the same thing with who owns the code. We had people come in and end up becoming maintainers of certain subsystems. And that's great. And they were maintaining this part of the kernel. And then they were reassigned to do something else within the company. It's like, oh, that's great. But you're going to still have to give him time to do that other thing. It's like, no, no, no, no, no. The community gave it to him. It follows him. If he goes to a different company, it follows him. And that's actually why Linux is so good, I think. When you work in a big company, you're forced to work on new things every couple years, right? And that's part of moving up in a company. You get different tasks and whatnot. Famously, Windows has had like eight, no, five different teams work on their USB stack. Linux has had one team working on its USB stack for 20 years. And then we know this stuff and we have this development in depth there. We just keep coming back to like I was kind of expecting a little bit of a discussion about

Starting point is 00:52:32 I came in here just using Linux or indirectly using Linux a lot, but not knowing of the depths. And I kind of thought that we would talk a lot about the tech, the processes. And every time we come back to the people, the trust, I wanted to ask why you think Linux has won so big that it's everywhere, but I'm starting to get the answer to this. Because I was thinking, why Linux? Why not a commercial? If I naively ask myself before this conversation, like we have two teams. One is commercially funded. They're selling their software. They're paying the developers really well. And then the other one is giving it away for free. They figured out a model where people are still paid, but, you know, it's open source.

Starting point is 00:53:19 Anyone can use it. Anyone can contribute, which one will win the long term. I naively would have said maybe the commercial one because they're incentivized. They're going to, you know, create all these professional things where here is more interesting value. But but. But so Linux has been contributed to by companies in their own interest. So it turns out everybody contributes in a selfish way.

Starting point is 00:53:41 I want to solve my selfish problem. But it turns out everybody has the same problems. So your problem being solved is the same problem as their problem. Famously, we had this when it came down to embedded. So embedded happened, and they came up saying, we need to change Linux to make it work better on batteries. You know, power is really important. Power is very, very important.

Starting point is 00:53:59 A lot more efficient. So we wanted this. This was when Linux was first getting into embedded and we needed to make this very efficient. We're like, great, that's a wonderful little solution. Make it work for everybody. They're like, no, no, no, we just care about embedded over here. That's the only person that's going to care about power.

Starting point is 00:54:11 It's like, no, you really, just make it generic. It'll all be good. It turns out data centers save billions of dollars in money because of power management. And it turns out everybody. So the mainframe is more efficient on a mobile phone, suddenly it's a good candidate for it to be a mobile OS. Yes. It works for everything. Same thing for multiprocessors.

Starting point is 00:54:30 Multiprocessor came out. We have two processors in the big data center. Who's going to care about that? In your pocket now you have 16 processors. It just works for everybody now, right? Data centers shrink, go different places. But because we solved it in a generic way, we forced you to solve it in a generic way,

Starting point is 00:54:47 but you contributed it in a selfish manner. And that's a good way. IBM knew they could put money into it, hire developers, and get the money back. So it was cheaper for them in the long run to do that. And they make money selling support and selling hardware. Red Hat makes money selling support. And that's like that.

Starting point is 00:55:04 Intel makes money selling chips. And that's who contributes to Linux: the people who want to sell a different product. Now, one other thing that's interesting about efficiency: we have 4,000 developers contributing, some of them only contribute one change, some of them contribute a bunch, from three to 500 companies per year, we're talking about per year. If you told me, like, this is inside a kind of commercial company, a tech company, you know, I would assume that in order to make this work, oh, for 4,000 developers, you know, we probably need to hire 400 PMs. We'll have, we'll have about, for every 50

Starting point is 00:55:38 developers, we'll have about 80 TPMs. This is how it would run. Like, you're laughing. No, I know, I've been there. You've been there, but I only come from here. Now, one thing you told me, because I was asking how many project managers, or very technical project managers, you have, and you said zero. How? Well, so in a way, the project management already happened on the back end before the patches got to us. So at a company, say, IBM: I want to solve this problem. They've said, how do we solve this problem? Let's put this task, let's figure it out. And then the patches come out to us. So we don't see that. So we just see the feature when it lands on us. That's fair. So they're there, working for the individual companies to get their thing.

Starting point is 00:56:17 Sometimes, sometimes they're not. Sometimes developers are just spitting things out, like this person who needed to get a new device ID. It saves a company time and money if they contribute their changes upstream rather than keep them as a fork, because they have to keep maintaining that fork. So wise companies have realized: let our developers work upstream, do what they need to do there with limited project management. And it just works out better. And again, we're only taking things when they're ready. We're not having to track. We do have tools.

Starting point is 00:56:44 We send everything through email. We have tools, like the networking subsystem has a webpage. You can go to see what the status of your patch is: if it's passed all the CI, if it's been reviewed by the maintainer, and things like that. So we have a bunch of automatic tools based on top of email that will help you out. And those project managers can go look at those if they wonder what the status of their employees' patches is, things like that. But yeah, it's just a different model. But it's not like they're not there.

Starting point is 00:57:06 They're hidden behind the solution for that company. That's fair, and I think it's also good to be reminded that that's the case. But I feel Linux still figured out a way to just focus on ruthless efficiency, with automation, with focusing on the work when it's done. So as you said, all these things do happen, but they happen before. And then you can, you know, like this part of the process will just be more efficient by design. Yeah, and we also, but once a year we get together, the core maintainers, and we talk about not technical things

Starting point is 00:57:36 because we can't have enough technical people in the room for a topic. We talk about process. Is our process working? Is it not working? And we refine it and say, oh, maybe we need to do this a little better. Oh, wouldn't it be nicer to do this?

Starting point is 00:57:45 Hey, we need more testing over here. Hey, can we do this type of stuff? So we do, we talk about our process all the time. Famously, the lead-up to that meeting is a public, another public mailing list where we all talk about process. And that once-a-year bikeshedding of our process in public, it helps shake out a lot of things

Starting point is 00:58:02 and work out. And there are problems. I'm not saying this development model is perfect. It works really well. One thing that's odd about Linux is that we keep going as fast as we are. We're running at 9 to 10 changes an hour. In the stable kernels, we're running 30 changes a day, 30 to 40 changes a day. 10 CVEs a day.

Starting point is 00:58:20 A bug at our level is a CVE almost. Yeah. So the CVEs are the critical. It's a security bug. It's a vulnerability. That can be as stupid as a memory leak somewhere, or I rebooted the machine, or I took over and got permission. I don't know; when I create a CVE, I don't know how you use Linux, so I can't tell the severity of it.

Starting point is 00:58:41 But I can say, here's a bug. You should look at this. So we're responsible for that. So we're running at a huge rate of change. Most large software projects have a huge ramp up, and then they plateau with developers and rate of change and whatnot because they've solved the problem. Linux has never solved the problem. And I used to have, I had a manager at IBM every year he'd come to me and said, hey, is Linux done yet? I was like, no.

Starting point is 00:59:00 It took me 10 years to finally come up with the answer of, it'll be done when you stop making new hardware. And when they stop making new hardware or having different workloads, then we'll stop. But we're one of the few projects that keep having to add new features because of new hardware. We're not doing it just because, I mean, Linux has been working for all of us for 20 years.

Starting point is 00:59:18 We're doing it to support new hardware, to support new use models, to support things. We don't add things for fun, generally. We add it to solve a problem that somebody had. Most of Linux is written using C or C++, right? No C++. Just C. Just C.

Starting point is 00:59:32 And I guess for some hardware drivers, is there assembly ever involved or no? No. Assembly, we'll drop down into it for the early boot of a processor and then some core functionality like locking that drivers or other people will call. Basically, for string functions and whatnot, we'll go down to good assembly language that's tuned for the different processors. Also, when you boot, Linux looks at the processor you're running on, patches itself to pick the best of those assembly functions, and then it continues on. Which is crazy. It patches itself at boot time. So hold on. But some of that is assembly?
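The idea of picking the best routine for the processor once at startup can be sketched in user space. This is only an analogy: the kernel, as described above, patches itself at boot, while the sketch below stores a function pointer on first use. The names `count_zeros`, `count_zeros_chunked`, and `cpu_prefers_chunked` are hypothetical, and the "probe" is a stand-in for a real CPU feature check.

```rust
use std::sync::OnceLock;

/// Signature shared by every implementation of the routine.
type CountFn = fn(&[u8]) -> usize;

/// Portable fallback: count zero bytes one at a time.
fn count_zeros_simple(data: &[u8]) -> usize {
    data.iter().filter(|&&b| b == 0).count()
}

/// "Tuned" variant that walks the input in 8-byte chunks.
/// It stands in for hand-written assembly; results match the fallback.
fn count_zeros_chunked(data: &[u8]) -> usize {
    let mut chunks = data.chunks_exact(8);
    let mut total = 0;
    for chunk in &mut chunks {
        total += chunk.iter().filter(|&&b| b == 0).count();
    }
    total + count_zeros_simple(chunks.remainder())
}

/// Hypothetical CPU probe (assumption): a real one would query the processor.
fn cpu_prefers_chunked() -> bool {
    cfg!(target_pointer_width = "64")
}

/// Chosen once on first use, then reused for every later call.
static COUNT_ZEROS: OnceLock<CountFn> = OnceLock::new();

fn count_zeros(data: &[u8]) -> usize {
    let f = COUNT_ZEROS.get_or_init(|| {
        let chosen: CountFn = if cpu_prefers_chunked() {
            count_zeros_chunked
        } else {
            count_zeros_simple
        };
        chosen
    });
    f(data)
}

fn main() {
    let buf = [0u8, 7, 0, 0, 9, 1, 0, 3, 0, 5];
    println!("zero bytes: {}", count_zeros(&buf));
}
```

Callers only ever see `count_zeros`; which implementation runs is decided once and is invisible to them, which is the property the boot-time patching gives drivers.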

Starting point is 01:00:08 Some of that's assembly in the very beginning. And some of those low-level functions, but drivers never touch that. Okay. So basically, like, from a Linux computer, but it's all C. Now, you know, one thing that actually, the way we started talking is there is a proposal to introduce Rust because it's just more memory safe. It's also a language growing in popularity. And some people would like to do more Rust development. What is your take on this? Do you think Linux at some point might support Rust or, you know, what is your thinking on doing things outside of C? So we have 25,000 lines of Rust in the kernel already. Oh. We do. Okay. Awesome. Yeah. So most of that is just bindings. There's no real functionality.

Starting point is 01:00:51 In the latest release, if the kernel crashes, it'll put up a QR code. You can take a picture of it to get the crash dump. That code was written in Rust. That's in Rust. So the Rust for Linux developers have been working for a long time. A couple years ago, they came to us and said, we think we're ready to do this. Do you want it? And we said, yeah, let's try this experiment. You're willing to do the work? Who am I to say no? I mean, it's classic Linux. Yeah. Now, the problem with Linux and Rust is it would be easier to write a core piece of Linux in Rust than it would be to write a driver. A driver consumes from everywhere in the kernel. So you want to talk locking. You want to talk input and output. You want to talk

Starting point is 01:01:32 to the driver model, talk to the USB port, all this stuff. Drivers can be really tiny because they take resources from the rest of the kernel. In Rust, you need to have a binding between the C code and the Rust code. There's an intermediate layer. The kernel in C has these very opinionated ideas of how it handles objects and how it does memory and how it has its memory model. Rust has its very opinionated model of how it does this type of thing. Same idea.

Starting point is 01:01:57 This meshing is tough. This meshing is also the most crazy complex Rust code you've ever seen. So as a new Rust developer, like me, I can barely read the bindings, but I trust other people are doing it. So yes, so the trick is we now need to write a binding for every different part of the kernel in order to write a Rust driver. If you want to do the QR generator, that's simple. That was this one function. So over the past couple years, people have been writing bindings to try and do things. We've had a bunch of example drivers, like a new disk driver, comparing the driver in C versus Rust.
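A binding in the sense described here is a thin layer that owns all the raw-pointer handling so that the code above it can stay in safe Rust. The following is a minimal sketch under stated assumptions: `CCounter`, `counter_new`, `counter_add`, and `counter_free` are hypothetical stand-ins for a C API (written in Rust here so the example is self-contained), not real kernel interfaces.

```rust
// ---- Stand-in for the C side (hypothetical API, not a kernel interface) ----

#[repr(C)]
pub struct CCounter {
    value: i32,
}

/// C-style constructor: hands out a raw pointer the caller must free.
extern "C" fn counter_new() -> *mut CCounter {
    Box::into_raw(Box::new(CCounter { value: 0 }))
}

/// C-style method: trusts the caller to pass a valid pointer.
unsafe extern "C" fn counter_add(c: *mut CCounter, n: i32) -> i32 {
    unsafe {
        (*c).value += n;
        (*c).value
    }
}

/// C-style destructor: must be called exactly once per object.
unsafe extern "C" fn counter_free(c: *mut CCounter) {
    unsafe { drop(Box::from_raw(c)) }
}

// ---- The binding: the only place that touches raw pointers ----

pub struct Counter {
    raw: *mut CCounter,
}

impl Counter {
    pub fn new() -> Self {
        Counter { raw: counter_new() }
    }

    pub fn add(&mut self, n: i32) -> i32 {
        // SAFETY: `raw` came from `counter_new` and is freed only in `drop`.
        unsafe { counter_add(self.raw, n) }
    }
}

impl Drop for Counter {
    fn drop(&mut self) {
        // SAFETY: `drop` runs once, so the object is freed exactly once.
        unsafe { counter_free(self.raw) }
    }
}

fn main() {
    let mut c = Counter::new();
    c.add(2);
    println!("counter = {}", c.add(3));
} // `c` goes out of scope here and the C-side object is freed
```

A driver written against `Counter` never sees a pointer. The cost is that one such wrapper has to exist for every C interface a driver wants to use, which is why the bindings had to come before the drivers.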

Starting point is 01:02:32 It turns out there are still some performance issues with Rust code versus C code because we can do some tricks in C code that they can't do yet in Rust. That's on the tooling, and the Rust developers are working on it. The core Rust developers who work on the language, some of them are Linux kernel developers. They've always wanted Rust to be working for Linux. The Rust model is good. Memory safety at our level does not mean that you can't crash the kernel. You can still overwrite things.

Starting point is 01:02:55 Memory safety in Rust just means that for the memory that you pass around, you know whether you have ownership of it or not. And when things go out of scope, they'll get cleaned up properly. So I've seen every single kernel bug for the past 18 years. Half of them will be fixed with Rust. It's just going to be fixed with Rust. It's the stupid one-off bugs. It's the, oops, I overwrote an array by one, and I didn't realize it.
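Two of the points made here, cleanup when a value goes out of scope and an array index that is off by one, can be shown in a few lines of safe Rust. This is a sketch, not kernel code; `Table` and `store` are hypothetical names, and `std::sync::Mutex` stands in for a kernel lock.

```rust
use std::sync::Mutex;

/// Shared table guarded by a lock, as a driver might keep per-device state.
struct Table {
    slots: Mutex<[u32; 4]>,
}

impl Table {
    /// Stores `value` at `index`, or reports an error for a bad index.
    fn store(&self, index: usize, value: u32) -> Result<(), String> {
        // The lock is taken here; `slots` is a guard that unlocks when dropped.
        let mut slots = self.slots.lock().unwrap();

        // `get_mut` returns None instead of writing past the end, so an
        // off-by-one index becomes an error rather than a memory overwrite.
        let slot = slots
            .get_mut(index)
            .ok_or_else(|| format!("index {index} out of range"))?;
        // The early return above still unlocks: the guard is dropped on
        // every path out of the function, including the error path.

        *slot = value;
        Ok(())
    }
}

fn main() {
    let t = Table { slots: Mutex::new([0; 4]) };
    println!("{:?}", t.store(3, 7)); // last valid slot
    println!("{:?}", t.store(4, 7)); // one past the end: rejected
    println!("{:?}", t.store(0, 1)); // works, so the lock was released
}
```

There is no explicit unlock call anywhere in `store`, so there is no unlock to forget on an error path.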

Starting point is 01:03:19 Oops, I forgot to clean up this error path. I forgot to unlock this lock. It's stupid little things like that. There's logic bugs, of course. You can write logic bugs in Rust. You always have those. Right. So, but famously, there's the Rust code that made the QR code.

Starting point is 01:03:35 C passed into the Rust code a pointer to a buffer, and the buffer size. The Rust code forgot to look at how big the buffer was. And it scribbled right over memory. So you can write memory-unsafe code in Rust just fine. And you can crash things in Rust. So memory safety here means it's the safety of object life cycles and things like that. It doesn't mean it's going to remove all bugs. It's not a golden bullet or anything like that.
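The class of bug described here lives at the boundary where C hands Rust a raw pointer and a length. A hedged sketch of the check that was missing might look like the following; `fill_pattern` and its parameters are hypothetical and are not the actual kernel QR code.

```rust
/// Writes `needed` pattern bytes into a caller-provided buffer.
/// Returns the number of bytes written, or -1 if the request does not fit.
///
/// # Safety
/// `buf` must point to at least `len` writable bytes.
pub unsafe extern "C" fn fill_pattern(buf: *mut u8, len: usize, needed: usize) -> isize {
    // The check the story is about: compare what we want to write
    // against the size the caller told us, before touching memory.
    if buf.is_null() || needed > len {
        return -1;
    }

    // SAFETY: the caller promises `len` valid bytes. From here on the
    // slice carries its length, so indexing is bounds-checked.
    let out = unsafe { std::slice::from_raw_parts_mut(buf, len) };
    for (i, byte) in out[..needed].iter_mut().enumerate() {
        *byte = (i % 2) as u8;
    }
    needed as isize
}

fn main() {
    let mut small = [0xFFu8; 4];

    // Asking for 8 bytes in a 4-byte buffer is refused, nothing is written.
    let r = unsafe { fill_pattern(small.as_mut_ptr(), small.len(), 8) };
    println!("too large: {r}");

    // A request that fits succeeds.
    let r = unsafe { fill_pattern(small.as_mut_ptr(), small.len(), 4) };
    println!("fits: {r}, contents {:?}", small);
}
```

The compiler cannot verify the `unsafe` parts: whether `len` is honest is a promise made by the C caller, and whether it is checked is up to whoever writes the boundary code.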

Starting point is 01:03:58 Silver bullet. But I think, yes, I think Rust needs to come in because it should be easier to write drivers in it. We have a lot of issues with lifetime rules of when you yank out a device; devices are dynamic. And dealing with the reference counting of things like that is very tricky to get right. There are parts in the kernel where we still do not have it right. And we know we don't have it right. Rust is forcing us to actually document our C code better. And it's cleaning up. So even if Rust disappeared tomorrow, I've had to clean up code in the driver core that's like, oh, yeah, I guess we can do things better and safer in the C code in order to make Rust easier. And we have. And so it's making us rethink how we do a lot of our existing code in the kernel.
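The "yank out a device" problem is about who is allowed to keep an object alive and what happens to users that still hold a reference. In plain Rust, outside the kernel, the standard library's `Arc` and `Weak` express that split. The sketch below assumes hypothetical `Device` and `Handle` types and is not how the kernel's driver core is implemented.

```rust
use std::sync::{Arc, Weak};

/// A device that may be unplugged at any time.
struct Device {
    name: String,
}

/// A user of the device. It holds a weak reference, so it cannot keep a
/// removed device alive and cannot touch it after removal.
struct Handle {
    dev: Weak<Device>,
}

impl Handle {
    fn describe(&self) -> String {
        // `upgrade` succeeds only while a strong reference still exists.
        match self.dev.upgrade() {
            Some(dev) => format!("{} is present", dev.name),
            None => String::from("device is gone"),
        }
    }
}

fn main() {
    // The "bus" owns the only strong reference.
    let bus_ref = Arc::new(Device { name: String::from("usb0") });
    let handle = Handle { dev: Arc::downgrade(&bus_ref) };
    println!("{}", handle.describe());

    // "Yank out" the device: the last strong reference goes away.
    drop(bus_ref);
    println!("{}", handle.describe());
}
```

Every access has to go through `upgrade`, so the "device already removed" case shows up in the type and has to be handled, rather than being a rule the author has to remember.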

Starting point is 01:04:40 To be fair, a lot of core kernel people are pretty resistant to that. They don't like change. Don't like different languages. One core kernel developer said, I don't like working with a project that has multiple languages just because it's tricky. And they are free to do that. They're not stepping on anybody's toes. A lot of it's miscommunication, and a lot of it comes down to people again. Honestly, in this binding, I wrote the driver core many, many years ago of how drivers work in the kernel.

Starting point is 01:05:08 There had to be a binding for that in Rust. This code I saw, I said, this is horrible. This isn't going to work at all. It's miserable. I went and actually met with the developers. And there's a Rust Linux conference. We sat down. I think they gave a whole presentation just for me.

Starting point is 01:05:23 Turns out I was wrong, and they were wrong. We both were wrong. And they were doing crazy things. They had a thousand lines of Rust code for what I do in two lines of C code. I'm like, well, why? They're like, well, we didn't want to change the C code. I'm like, we can change the C code. Because I just did that because it was easy in C.

Starting point is 01:05:38 But if I changed that, you get rid of a thousand lines of Rust. Let's do that. And then again, it comes down to, okay, understanding what your problems are, understanding what my problems are. Let's work together. And now we have bindings in the kernel that you can actually write some drivers with. And the Red Hat developers are starting to write the new NVIDIA GPU drivers in Rust, and they're starting to put the proposals out there.

Starting point is 01:05:57 The Apple GPU drivers for the Apple MacBooks are written in Rust. Those patches are not merged, but they're written in Rust and proven on a fork. That works great. There's a whole bunch of crazy object life cycle issues with graphics drivers, and Rust makes it a lot easier for them to do. I think you'll see a lot more of the simple, stupid drivers for hardware devices being written in Rust, because all they want to do is read and write to some random memory bits. And it's really easy to do that in Rust, and you can do it in

Starting point is 01:06:23 actually less code than you can do it in C code. And I think that's good. We now have the infrastructure in there. So I think we've hit the tipping point. We'll start seeing new stuff in there. And we need to do that. I mean, there's mandates from governments that you can't use memory unsafe languages

Starting point is 01:06:36 like C in products. And if I want to see Linux succeed, which I do, we're going to have to change. And I can say, going forward, if you want to write in Rust, you can write in Rust. Now, that being said, we still have 40 million lines of C code. So we have some very, very good developers

Starting point is 01:06:49 out there working on mitigating the problems we have in C. We now have bounds checking for our stuff. We now have other, we call them seatbelts and airbags, that protect your C code from doing stupid things. And we're working with the compiler authors to add new extensions to C and make things safer for the C code because we want to protect the code that we have today because you're not going to rewrite code in Rust. Don't worry about that. Google famously published something recently saying, over the past couple of years, we've written our new code in Rust,

Starting point is 01:07:15 and we got overwhelmingly more secure because we didn't touch the old code and bugs degrade over time. There's still going to be bugs in the older stuff, but most bugs happen in your new code, not in your old code. That's awesome. I'm sensing you're excited about Rust, and it's also just nice to see the evolution. Yeah, it's evolution and see what happens. And if it fails tomorrow, we can rip it out and whatever. But we have developers willing to do this work for us. It's not intruding on other people's stuff.

Starting point is 01:07:40 Well, I think it does go back to what you said earlier: I understand that a big part of Linux is, like, show the work. Like, if it works, and same thing, you know, it sounds like that's how Rust started. It's also how it's progressing. People are showing that it works. They're proving that it works. It solves their problem. It maybe even works better for them. And then you know, step by step.

Starting point is 01:08:01 Yeah, like, people are like, well, why not Zig or Hare? Those are other good languages. I'm like, that's great, but nobody's proposed it. Yeah. So, yeah, if they want to do that. And fair, I think those developers who work on those languages don't care about Linux, which is fine. They don't have to.

Starting point is 01:08:12 So looking ahead, outside of Rust, what are other things that you're kind of excited about that's coming in Linux? either projects, changes. I don't know if... We haven't said LLMs except for once here. I don't know if that, for example, will LLMs have any impact on how development is done? No.

Starting point is 01:08:37 There's not... I mean, they're all trained on Linux kernel code, so you can write out another driver, but LLMs are great for writing boilerplate code and things like that. In Linux drivers, you don't have much boilerplate code because we've slimmed that down into the core and made that work better.

Starting point is 01:08:51 LLMs are used to find bugs and find the bug fixes that we should be taking. But again, we've published papers on that for eight years. There's been lots of research on that. So we've been using that for a while. I mean, LLMs just apply statistics, right? So it's just patterns. Pretty much. So for code at this level, for us, it doesn't matter that much.

Starting point is 01:09:14 So no. And then as far as I don't know what's coming tomorrow because I just see what people send to me. So we don't have a plan. I mean, we always joke, you know, Linux is evolution, not intelligent design. It's just whatever shows up, right? Because you're solving your problem, we'll figure out how to fit it in there with everybody else's stuff. And make sure it doesn't work on it. People are working on new features.

Starting point is 01:09:34 I mean, with Linux, people are like, oh, it's an old model. It's the old Unix model. It's like, yeah, we can run code from 20 and 30 and 40 years ago. But we can also run new stuff. We have new features. We have new I/O paths that are even better. We have new types of functionality. We have new security models.

Starting point is 01:09:48 We have new capabilities. We have new types of stuff for the new stuff. But we didn't break the old stuff. So we can do both. You can rewrite your code. I know the databases are rewriting theirs to use io_uring, which is a new way to do I/O, which gets the user-space-to-kernel boundary out of the way and has faster paths. So they're speeding up the databases by porting to new Linux features.
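Why getting the user-space-to-kernel boundary out of the way helps can be shown with a toy model: requests are queued in memory and handed over in one batch instead of one crossing per request. The sketch below is only a model with hypothetical names (`Ring`, `prepare`, `submit`, `reap`); it is not the io_uring API and does no real I/O.

```rust
use std::collections::VecDeque;

/// One read request: (request id, number of bytes wanted).
type Request = (u32, usize);
/// One completion: (request id, number of bytes "read").
type Completion = (u32, usize);

/// Toy model of a submission queue and a completion queue.
struct Ring {
    submissions: VecDeque<Request>,
    completions: VecDeque<Completion>,
    /// How many times this model "entered the kernel".
    crossings: u32,
}

impl Ring {
    fn new() -> Self {
        Ring {
            submissions: VecDeque::new(),
            completions: VecDeque::new(),
            crossings: 0,
        }
    }

    /// Queue a request. This is a plain memory write, no crossing.
    fn prepare(&mut self, req: Request) {
        self.submissions.push_back(req);
    }

    /// One crossing hands over every queued request at once.
    /// Returns how many requests were submitted.
    fn submit(&mut self) -> usize {
        self.crossings += 1;
        let n = self.submissions.len();
        while let Some((id, len)) = self.submissions.pop_front() {
            // The model "completes" each request immediately.
            self.completions.push_back((id, len));
        }
        n
    }

    /// Pick up a finished request from the completion queue.
    fn reap(&mut self) -> Option<Completion> {
        self.completions.pop_front()
    }
}

fn main() {
    let mut ring = Ring::new();
    for id in 0..3 {
        ring.prepare((id, 4096));
    }
    let submitted = ring.submit();
    println!("{submitted} requests, {} crossing(s)", ring.crossings);
    while let Some((id, len)) = ring.reap() {
        println!("request {id} finished with {len} bytes");
    }
}
```

With one blocking call per read, the same three requests would cost three crossings. In this model they cost one, and the gap grows with the batch size.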

Starting point is 01:10:08 But their old databases still run just fine. And so it's like people look at it like, oh, nothing's changed because the old stuff still works. That was the goal. The old stuff still works. So I don't know. Just see what new, I mean, new hardware features. I see the new hardware coming all the time. We get told by the CPU vendors, like, look at this new chip.

Starting point is 01:10:27 It's like, great. So that's always fun. And then in terms of contributing to Linux, we just went through this example. And it seems pretty easy to contribute, honestly. Like, you know, I wanted to ask for advice on contributing, but my sense is just do it. Like, it's not that difficult. But from a professional point of view, like, what? What do you think a developer who, you know, is building other stuff at a company,

Starting point is 01:10:53 what would they get professionally out of contributing even one change or a few changes to Linux? Like how could their outlook change or what could they learn? Well, the best thing is it's your resume. So I talk to college students. I talk to college students of Vue, other universities all the time. Say, hey, contribute to the kernel. You have time. And then when you go to get hired, somebody can look at you.

Starting point is 01:11:16 Say, oh, yeah, look, you do play well with others. And you did contribute upstream, because when you come into a company, you're not writing code from scratch. You're working with other people. You're working with existing codebases. If you contribute to Linux, or any open source project, you show that you can work with others. You can work with an existing codebase. So it shows a great skill set. When I hired people, when I was at IBM, if you contributed versus not, it's like, oh, that's an easy one.

Starting point is 01:11:37 So I'd rather take that. So from a personal point of view, contributing makes it easier to get a job. Get a next job. Another point of view is that as an engineer, you get to learn new things. I wrote my first driver and sent it out. So, oh, here it is. It's all perfect, whatnot. Everybody's like, no, this is wrong, this is wrong, this is wrong.

Starting point is 01:11:51 And have you ever heard of multiprocessors? I'm like, what? What is all this? And that's great. From an engineering point of view, I want to know better. I mean, the Linux kernel developers, you can never have all the best developers in the world at the same company. But in open source, we can all work,

Starting point is 01:12:04 the best operating system people can all work on the operating system together. So the depth and talent of the people that are working on Linux is just amazing. Take advantage of that. I'll say the Rust developers that are working on Rust for Linux are core Rust developers. These people are really, really, really good. They maintain core parts of the Rust infrastructure. Take advantage of them.

Starting point is 01:12:23 I mean, I'm learning so much from them. So from an engineering point of view, there's these people that are really out there and willing to help you grow as an engineer and learn different processes and learn different skills much better. I mean, I learned so much more working in the community than I ever did working at companies

Starting point is 01:12:40 because you have better review process, you have more exposure to crazy corner cases that you hadn't thought of. That, oh, yeah, in the real world, yes, that would have been one in a million, but we do have to take that because we have a million boxes, four billion machines out there. Plus, I guess, the more curious you are, everything is open in Linux. So I remember when I joined Uber, I was just amazed by the RFC process.

Starting point is 01:13:01 And internally, I could read all the RFCs, and I spent like a week or two just kind of reading and, you know, trying to take it all in. In Linux, it's all here. Like, obviously, it's overwhelming if you just start with all of it at once. But you can, like, target something. And so even if you contribute little, or even before you contribute, you could just learn. You can see how the changes are made. You can try to understand these things.

Starting point is 01:13:22 Yeah, I will say it's not the best learning operating system. There's really good learning operating systems out there. We're not those. That being said, I mean, people still write academic papers about it and all this stuff. They want to rewrite the scheduler or do all this fun stuff with it because it is a real-world tool. I mean, I learned from Minix and Linus learned from Minix, which was a learning operating system. And then we took those ideas and that and we made Linux with it. I mean, Linus did it way before me.

Starting point is 01:13:47 But learning operating systems are great, but working on a real-world system is a little bit different. That being said, there's parts of the kernel that are very easy to get into for newbies. We have a whole section of code with really bad, crummy drivers that are the wrong coding style. They have the wrong formatting. They have just a lot of dead code. That's there for beginners to take up and make your first patch: fix the spelling mistakes, fix the coding style, learn how to do this stuff.

Starting point is 01:14:14 And there's a whole website, kernelnewbies.org, a wiki that has a whole bunch of stuff on how to write your first kernel patch. How to get involved. I've given old YouTube talks if you search how to write a kernel patch. I need to do a newer one.

Starting point is 01:14:25 It's fun. I've gone to universities and said, I gave everybody a file that you're going to write a kernel patch for this file. It's like what? Okay. And they do it and by the end of the class,

Starting point is 01:14:33 end of two hours, they send a patch off and it gets accepted. You know, it's very simple. It's not a difficult thing to do. And we want new people to get involved because we don't know who's out there or what they can contribute. If you just want to do something for fun or do something for real, it's great. Awesome. Well, this has been really interesting.

Starting point is 01:14:53 And I'd just like to close off with some rapid questions where I just ask and then, you know, you say what comes to mind. What's the most memorable patch that you've contributed to in Linux? So this is going to be about people again. Early 2000s, we were starting to get Microsoft saying Linux is a cancer. We're all worried about the next. Oh, I remember. Yes, we remember that stuff. We started getting some really, really good patches for some hardware that we really didn't

Starting point is 01:15:18 know that well. It was showing some really good stuff. And it was like, this is really good. And we're like, where did you get this information? How did you know this stuff? Is this like somebody trying to sneak this in? And the person wrote back and said, here's how I found this. Here's how I tested it, how I did this.

Starting point is 01:15:33 I'm like, okay, all right, we took this. And over time, we took all these patches. And then we have this conference once a year for all the maintainers, and you get invited to it, and we're like, oh, let's give them an invite because it was really good. And it was in Canada every year for a number of years for some reason.

Starting point is 01:15:48 And they came and they showed up and he showed up and he's like, oh, sorry, I had to bring my mom because he was in high school. He was 17 years old. None of us knew. And he contributed. And it was like, okay, great.

Starting point is 01:16:00 And it turns out he later went on to MIT and now he's a professor at Stanford. Wow. Yeah, all you see is an email address in Linux. Yeah, so Adam, though. It's like, okay. Yeah, it's like that. I mean, it's things like that.

Starting point is 01:16:12 Wow. That's good. Another one I'm really happy about is there's lots of drivers that have been sitting outside the kernel tree for many, many years. Just because people never got them upstream or moved on. One of them is the subsystem to handle Braille keyboards. So Braille displays. Yeah. Just feel that those are outside the tree. I and a couple other developers worked with those people and got them in the tree and got them working.

Starting point is 01:16:33 And now they're shipping with all devices. So we made sure that these people who were always having to patch this auditory stuff, because these devices are only needed by a very tiny subset, but now it's available for everybody. I'm very happy to see that. Wow. And I guess this goes back to what you said about Linux, why companies will contribute a few developers per year, because now when you take Linux, you get an OS, for example, that also has Braille support. Like, you know, that itself,

Starting point is 01:16:59 like adding it to an existing product or if you built an OS, that itself would be a massive undertaking. Yeah. So now it supports all the devices out there. Awesome. What's your favorite programming language? Still C. I mean, I've been doing C for, what, 30 years every day? So, yeah, C. Yeah. I've been doing a lot of Rust lately. Rust, I feel like I'm able to write really sloppy code and it works. I don't feel like I have to be as precise as in C, which is, I don't know if that's good or bad. And what's a book or two that you would recommend reading? The old Code Complete book was a really good one. That was a

Starting point is 01:17:37 really good one. It taught me that coding style matters. It doesn't matter what the coding style is. It's just that having a consistent coding style matters because our brains work on patterns. As programmers, we're reading patterns. And when the pattern's the same, the metadata goes away and we can see the logic easier. Code Complete has aged a little bit weirdly. If you look at the first book, it has a lot more C examples and whatnot. But it talks about the basics behind programming and a lot of stuff. And that was a really, really good book. On the flip side, another really fun one that's really tiny is Programming Pearls. And it's like bit fiddling and cute little algorithms and neat stuff like that,

Starting point is 01:18:14 which surprisingly we still do today. We're talking about adding parity functions in a common way. And everybody's like, no, if you do it this way, you do it this way, it'll be faster in this. So we're still messing with these things that people have messed with for 40, 50, 60 years. And these things still matter. And they matter to people because cycles matter and power matters and things like that. So between those two, those are my favorite ones. Well, this is awesome.
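The parity discussion is a good example of the kind of bit fiddling meant here: the same answer can be computed in several ways, and which one is fastest depends on the processor. Below is a sketch of two common approaches. The function names are hypothetical and this is not the kernel's implementation.

```rust
/// Parity by XOR folding: each step folds the upper half of the
/// remaining bits onto the lower half, until one bit is left.
fn parity_fold(mut x: u32) -> u32 {
    x ^= x >> 16;
    x ^= x >> 8;
    x ^= x >> 4;
    x ^= x >> 2;
    x ^= x >> 1;
    x & 1
}

/// Parity via population count. On CPUs with a popcount instruction
/// the compiler can lower `count_ones` to that single instruction.
fn parity_popcount(x: u32) -> u32 {
    x.count_ones() & 1
}

fn main() {
    for x in [0u32, 1, 0b1011, 0xFFFF_FFFF, 0x8000_0001] {
        println!(
            "{x:#034b}: fold={} popcount={}",
            parity_fold(x),
            parity_popcount(x)
        );
    }
}
```

Both functions return 1 when the input has an odd number of set bits and 0 otherwise. The argument described above is about which variant costs fewer cycles on a given machine, not about the result.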

Starting point is 01:18:36 This has been such an interesting and, like, for me, just really educational and eye-opening chat. So I'm glad we did it. Well, thanks for having me. I found this episode to be a really interesting one about Linux. I'm still amazed that an open-source project managed to become the most widespread operating system in the world, despite not being a commercial business. It's such an interesting and inspiring project.

Starting point is 01:18:56 You can find Greg on social media as linked in the show notes below. And if you'd like to try your hand at contributing to Linux, visit kernelnewbies.org. For more deep dives related to backend engineering, check out The Pragmatic Engineer articles, linked in the show notes below. If you enjoyed this podcast, please do subscribe on your favorite podcast platform and on YouTube. This helps more people discover the podcast and a special thank you if you leave a rating. Thanks, and see you in the next one.

