The Pragmatic Engineer - How Linux is built with Greg Kroah-Hartman
Episode Date: March 19, 2025

Supported by Our Partners
• WorkOS — The modern identity platform for B2B SaaS.
• Vanta — Automate compliance and simplify security with Vanta.

—

Linux is the most widespread operating system, globally – but how is it built? Few people are better to answer this than Greg Kroah-Hartman: a Linux kernel maintainer for 25 years, and one of the 3 Linux Foundation Fellows (the other two are Linus Torvalds and Shuah Khan). Greg manages the Linux kernel’s stable releases, and is a maintainer of multiple kernel subsystems.

We cover the inner workings of Linux kernel development, exploring everything from how changes get implemented to why its community-driven approach produces such reliable software. Greg shares insights about the kernel's unique trust model and makes a case for why engineers should contribute to open-source projects. We go into:
• How widespread is Linux?
• What is the Linux kernel responsible for – and why is it a monolith?
• How does a kernel change get merged? A walkthrough
• The 9-week development cycle for the Linux kernel
• Testing the Linux kernel
• Why is Linux so widespread?
• The career benefits of open-source contribution
• And much more!

—

Timestamps
(00:00) Intro
(02:23) How widespread is Linux?
(06:00) The difference in complexity in different devices powered by Linux
(09:20) What is the Linux kernel?
(14:00) Why trust is so important with the Linux kernel development
(16:02) A walk-through of a kernel change
(23:20) How Linux kernel development cycles work
(29:55) The testing process at Kernel and Kernel CI
(31:55) A case for the open source development process
(35:44) Linux kernel branches: Stable vs. development
(38:32) Challenges of maintaining older Linux code
(40:30) How Linux handles bug fixes
(44:40) The range of work Linux kernel engineers do
(48:33) Greg’s review process and its parallels with Uber’s RFC process
(51:48) Linux kernel within companies like IBM
(53:52) Why Linux is so widespread
(56:50) How Linux Kernel Institute runs without product managers
(1:02:01) The pros and cons of using Rust in Linux kernel
(1:09:55) How LLMs are utilized in bug fixes and coding in Linux
(1:12:13) The value of contributing to the Linux kernel or any open-source project
(1:16:40) Rapid fire round

—

The Pragmatic Engineer deepdives relevant for this episode:
• What TPMs do and what software engineers can learn from them
• The past and future of modern backend practices
• Backstage: an open-source developer portal

—

See the transcript and other references from the episode at https://newsletter.pragmaticengineer.com/podcast

—

Production and marketing by https://penname.co/. For inquiries about sponsoring the podcast, email podcast@pragmaticengineer.com. Get full access to The Pragmatic Engineer at newsletter.pragmaticengineer.com/subscribe
Transcript
There's a nine week release.
Every nine weeks, there's a new release going out, right?
So, Linus does a release, just point in time.
And then the merge window is considered open.
And then for two weeks, all the maintainers send Linus, all the stuff they've had pending
from the last release.
We have two weeks to add all new features.
And then he does release candidate one.
From there on, it's bug fixes only for the next seven weeks.
So it's bug fixes only, bug fixes only, it's regression fixes, we'll revert things.
No new features.
Do I understand correctly that in the case of Linux?
Is this a thing where every nine weeks there will be a release?
It's time-based.
So we have that two-week window of merging all the new features to Linus
that have been in our tree and accepted already and proven to work.
And the window is short, nine weeks.
We used to have three-year-long development cycles.
And the problem there is, even if you have six-month development cycles,
there's that fear of you have a feature.
I want to take your feature, but it's not quite ready.
Do I want to wait and things like that?
But if you know that you can get your feature in in nine weeks from now,
and it's just not ready, it's not ready.
The pressure is off me as a maintainer to take your new feature until it's ready.
Linux is the world's most widely used operating system thanks to powering most Android devices, servers, smart TVs and embedded systems.
But how is it actually built?
Today we sat down with Greg Kroah-Hartman, a Linux kernel maintainer for 13 years, who is one of the three Linux Foundation Fellows.
In today's conversation we cover details on how widespread Linux is and why mobile versions of Linux have three times the lines of code as server versions.
What exactly it takes to get a change accepted to the Linux kernel and merged by Linus
Torvalds himself.
How Linux manages to have 4,000 contributors per year yet have no product managers or project
managers.
And many more details.
If you're a software engineer, you will use Linux directly or indirectly, and this episode
will help you understand why it's so widespread and how it's a lot easier to contribute
than most people would assume.
If you enjoy this show, please subscribe to the podcast on any podcast platform and on YouTube.
Thank you.
So, Greg, it's really just nice to have you here because you're one of the most well-known Linux contributors, one of the longest standing ones as well.
So just welcome to the podcast.
Thanks for having me.
I think as software engineers, we know Linux is important in the sense of it's running on most web servers that we use and run.
It's a desktop OS that some people use.
And it's, of course, you know, powering, a fork of it is powering Android.
But what is there to know about Linux?
How big is this thing?
How complex is this thing?
Well, it's, yeah, it's an operating system.
So it's a kernel.
We took over the world without anybody noticing.
I joke that there are 4 billion Android Linux users out there and they don't realize it.
Everything else is a rounding error, which doesn't make the server people happy with me.
But it's true.
It's in everything.
It's in all the embedded devices.
It's in the air conditioning units, car electric charging ports, satellites.
It runs the International Space Station.
Really?
Yeah.
Air traffic control for Europe and probably the U.S.
All the financial markets.
I don't think.
It's in the cameras that we're using.
So it's, yeah, I don't know of any place it hasn't taken over.
Among the top five selling laptops for the past 10, 15 years are Chromebooks.
Those are all Linux-based.
Not Apple, but Chromebooks are.
Oh, iPhones.
So every 5G modem out there is running a copy of Linux.
Really?
Yeah.
Wow.
So now with Apple doing their new chip, I don't know if it's the new one, but Qualcomm, all the 5G modems, probably the 4G, I'm not sure, but I know all the 5G modems have Linux inside it.
This episode was brought to you by WorkOS.
If you're building a SaaS app, at some point your customers will start asking for enterprise features like SAML authentication, SCIM provisioning, and fine-grained authorization.
That's where WorkOS comes in, making it fast and painless to add enterprise features to your app.
Their APIs are easy to understand, and you can ship quickly and get back to building other features.
WorkOS also provides a free user management solution called AuthKit for up to 1 million monthly active users.
It's a drop-in replacement for Auth0 and comes standard with useful features like domain verification, role-based access control, bot protection, and MFA.
It's powered by Radix components, which means zero compromises in design.
You get limitless customizations as well as modular templates designed for quick integrations.
Today, hundreds of fast-growing startups are powered by WorkOS, including ones you probably know,
like Cursor, Vercel, and Perplexity.
Check it out at WorkOS.com to learn more.
That is WorkOS.com.
First of all, I'm just reflecting on why I never kind of, you know, like thought about it like this.
Because in my mind, it was always like, you know, Debian, Red Hat; it's on the server side.
And maybe that's because that's where I actually see it.
Of course, you know, right now I'm a Mac user and there's the Unix
influence, which is an influence, you know, it gets pretty close. I think it's a good time to reflect
on how many things it actually runs. In terms of the kernel itself, like how large is it?
I know, you know, for different devices, it'll be spread differently for server-side Linux, for
an Android device, it'll use different parts of the kernel. How big is this in terms of contributors,
lines of code.
I know lines of code is not a great measure,
but it is a measure.
So we have just under 40 million lines of code right now.
That's a lot.
That's all the kernel.
That's the kernel.
The core part is like 5% of that that everybody runs.
And then everybody, the rest of it is hardware support.
Different drivers, different devices,
different architectures, different chips.
So your laptop runs about two,
two and a half million lines of code.
Your server runs about one and a half million.
Servers are really easy.
Those are very simple.
Wow.
your phone runs about 4 million.
Your phone SoCs are the most complex pieces of CPU and interaction out there.
They're just crazy complex chips.
Why is it?
Can we just pause for a second?
So, again, lines of code we know is not a perfect measure of complexity, but in this sense,
comparing it between the two of them with the same code base is somewhat meaningful.
So you said roughly, give or take, a server is one and a half million.
A phone is four million.
Like three times the lines of code for a phone.
Why the difference?
Even though I would think that the server, you know, does all this mission critical stuff.
A server is really simple.
CPU and a network card and storage.
And storage.
That's it.
So, an SoC on a phone: you have power control, you have clocks, you have five different buses on there talking to different types of devices.
You have battery control.
You have talking to your modem.
You have another version of Linux in the modem.
You got USB out the back.
You got USB bypass to talk to the audio side.
You have audio drivers.
You have a zillion different clocks and PHYs and all sorts of stuff in there.
And it's an eight core machine.
There's eight processors in that thing.
Those are not trivial things.
And sometimes those processors are different sizes.
So you have big and little sizes, which add the complexity just for some control, for some power management.
But they all run the same core of the Linux, but it's the drivers and the devices and things like that.
So your Pixel phone.
I looked at a Pixel phone.
Google ships a core kernel that all Android
devices pick, not hardware specific, just says ARM64. Pixel has 300 other drivers they add to get
the Pixel phone working. I mean, some of these are tiny. This is for this tiny chip, this is
for that tiny chip. But your phone is really one of the most complex beasts out there for software.
Is it safe to say that, you know, the complexity and, you know, the lines of code will to some extent
scale with the hardware and its capabilities, and, you know, not with, you know,
how mission critical it is? Because, of course, the phone
needs to be stable. The server needs to be stable. My TV needs to be stable. So, you know,
that's just kind of a given, right? Yeah. Oh, and all TVs for the past 15 years are all running Linux.
Oh, so my Samsung TV is running Linux. Oh, yeah. Same with my Samsung washer and dryer,
running Linux. So your Samsung watch is running Linux. So Samsung has their own, yeah, distro.
It all works really nicely. It all comes down to the complexity of the hardware. So the kernel controls
the hardware. The job of Linux is to make all the hardware look
agnostic to programs.
So you can write the same user space program and run it on different
hardware, and it just works.
A kernel's job is to manage memory and devices in a common way and provide that to
user space.
We joke in the kernel that user space is just a test load.
But, I mean, it's a tool there for you to actually solve your problems.
So when you're running servers, you want to push messages through the network and storage
and stuff.
That's your load.
And that's what they're there for.
For a phone, you want to control a display, you want to talk out the modem, you want to talk on the thing, you want to listen to audio.
Yeah.
Lots of different things there.
And I just want to touch back on the kernel because, like, I'm not a Linux developer.
Like, I know, you know, I've heard of the kernel.
In my assumption, it is the critical part, as you said, the, you know, the thing that runs immediately, and then, you know, the user space will run on top of it.
But what is the difference,
or what makes a kernel?
And you said it's about 5% of all of these things.
How do you split this?
Or is there a definition of, again, you're a kernel developer.
So I'm trying to get a sense.
How can someone who, let's say, I'd like to contribute to Linux and understand how it is.
Eventually I'm going to figure out what this kernel is.
But what is it?
What makes kernel and non-kernel?
So kernel versus user space.
So there's an idea.
Chips have a protected mode and a non-protected mode.
In a very simplified way.
There's different levels of protection.
So the protected mode is where the operating system runs, the kernel.
And that is where we share all the resources.
It's one flat address space.
Got it.
And we are not isolating processes.
So a user space process then runs on top of that and we isolate them.
And they all individually think they have the whole machine, but they don't.
So it's multitasking.
You can run multiple programs at the same time.
And the kernel is there to give you memory, to give access to storage, to give access in a common way,
to give access to the network in a common way, to give, or provide the pipes to go around the network stack in the user space.
Some people don't like using Linux's network stack.
They have their own.
To provide a way for all your different mice to show up to user space in the common way.
We know all the different mice.
USB to storage devices, your graphics controller.
We provide a way to make it so that user space can talk to the kernel in an agnostic way.
And their stuff will just work because all the graphics work the same interface.
We talk to keyboards all the same way, things like that.
So it's a commonality of providing a shim layer above the hardware.
And then, for example, drivers, do they always live in the user space?
No, all the drivers live in the kernel.
So the kernel and drivers are all.
Linux is not a microkernel architecture.
It's monolithic.
So the code is all in the same address space.
So a bug in any one of them has a chance to take any part of the kernel down.
So Linux ships all the drivers for all the architecture in one big tarball.
That's 40 million lines of code.
Other operating systems have gone out there and split things off.
So the core of Windows is their kernel, and then you can put additional drivers on top.
We tie everything together in one big giant blob.
Theirs is still monolithic.
Any driver in theirs can crash the kernel, within reason.
In that way, we can refactor the way the interfaces between drivers and the kernel are.
Linux drivers are on average one-third smaller than other operating system drivers
because we can see the commonalities.
If you send three different drivers for three kind of same hardware,
well, let's combine them all, make it smaller,
and refactor things and make it easier.
And, oh, let's change this API.
And this has to do with the open source approach, right?
That you see it like that.
So we see all this common code,
and we can refactor it, and we can make it better and cleaner.
And we're not tied to any fixed interface.
Our fixed interface is between the user space and the kernel.
We will not break that.
That's our guarantee.
We've guaranteed it for a long, long time.
And so we always want you to be able to upgrade your kernel and not feel worried that your old programs are going to crash.
So you should always be able to upgrade.
That's our guarantee to you.
If it does break, then it's our fault.
We'll progress.
There are some exceptions.
There's some gray areas.
There's some really low-level parts between the user space and kernel that we kind of work around.
And we argue about these all the time.
But we never try and break user space on purpose.
A lot of times we do accidentally, we'll fix it up.
That's really our only rule of kernel development.
Don't break user space on purpose.
And so when we're talking about the roughly 1.5 million lines for a server, we're talking about the kernel.
Yes, kernel plus drivers.
Kernel plus driver, because it is part of the kernel.
And then you have this 40 million line tarball, and then every platform will kind of take their parts of it.
They'll take what is relevant for their use case: capabilities, drivers, you know, other parts.
And then this is why, I guess, you know, Raspberry Pi, you're going to say it's going to run,
it's running on Linux, right?
Of course.
Oh, yeah, yeah, Raspberry Pi, yeah.
Those things are everywhere.
That's what's in all the electric charging stations.
Those are Raspberry Pis.
Really?
Yeah, where you plug your car into?
Yeah, it's Raspberry Pi.
Yeah, those are all Raspberry Pis.
Wow.
Because it's a really cheap industrial thing.
Lots of signage now.
Those are all running on Raspberry Pis.
I guess it's safe to assume they're not running Windows, to be fair.
No, yeah.
So the Dutch signage for the trains, those are all running Linux.
Sometimes you'll see a crashed Linux machine up there.
Can we look at a specific example of how development actually flows
through with a specific patch?
Before I show a specific example: we had 4,000 developers last year.
So they make a change.
So those 4,000 developers will send an email to a maintainer,
and a maintainer maintains a subset of the kernel.
Every part of the kernel is owned by somebody.
And then you are one of these maintainers as well.
Yeah, I'll maintain some drivers and things like that,
but then those maintainers send things off up the tree to a subsystem maintainer.
So like USB serial, that will get sent up to USB, and then USB will go to Linus.
So it's kind of a pyramid scheme that way.
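The flow described here, from a driver tree up through a subsystem tree to Linus, can be sketched as a walk up a parent map. This is only an illustration: the tree names are labels taken from the example in the conversation, not actual repository names, and the real hierarchy has many more trees.

```python
# Illustrative parent map: each tree points at the tree it sends changes to.
# The names are hypothetical labels, not real repository names.
UPSTREAM = {
    "usb-serial": "usb",
    "usb": "linus",
}


def merge_path(tree: str) -> list[str]:
    """Return the chain of trees a change passes through on its way up."""
    path = [tree]
    while path[-1] in UPSTREAM:
        path.append(UPSTREAM[path[-1]])
    return path


print(" -> ".join(merge_path("usb-serial")))
```

A change accepted into the lowest tree travels one hop at a time, and each hop is a maintainer who puts their name on it.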
We have that.
So we have like 800 maintainers, and we have the middle section.
We maybe have about 200 different trees there.
And then in our testing environment, all those trees are tested every day.
They're all merged together and things that happen, whatnot.
So we have this kind of hierarchy of developers and maintainers that way.
And part of the hierarchy is the human aspect.
So if I take code from you as a maintainer, I'm now responsible for it because my name's on it.
So if it's a simple one-off or it's a simple driver that nobody cares about, except you, great.
I know you're the only one that's going to be affected by it.
It's fine.
But if it's the core part of the kernel and I take changes from you, now I'm responsible for it if you disappear.
So I have to trust that either you're going to be here or that I understand it well enough that I can maintain it.
So part of the Linux development model is trust, and it's trust in human interaction.
Like, I will take stuff from people,
whatever they send me, because I trust, not that they got it
right, but that they'll be there to fix it when they get it wrong, because we all get it wrong.
And that's the part. So that's the trust model we have. And that we've been burned in the
past by some major features where it landed in the networking core subsystem a long time ago.
And then once they landed and were merged and taken, the email address behind it disappeared.
And the network developers took six months to unwind the mess. So it's hard to change the core part
of the kernel for a good reason, because it affects everybody. And also a good reason in that we want to
make sure that you are going to be there to fix it if it breaks.
Yeah.
But for drivers and things like that, we'll take anything.
Drive-by.
It's really simple.
Yeah.
It's very simple thing that way.
But that's the hierarchy.
So a change flows up the tree that way.
Yeah.
So I can show that.
All right.
So what are we going to see?
So here is a change.
So this was written by somebody named Chester.
He made a change to the USB serial driver.
It's an option.
The chip is called option.
These are USB-to-serial devices.
They're in modems, they're in lots of different things.
There's a ton of different ones.
And there's no standard for these types of devices.
So you have to add a custom device ID for every single one that you want to use.
It's just the way they work.
So here's the patch.
And here's the description of it.
This is just an email.
The description here is the subject line.
Yeah, USB serial option, adding whatever.
Adding that device.
And then here's, so the hardest part,
the hardest part about writing a kernel change
is the description of what's going on.
Really?
Yeah, I mean, the code is easy.
It's the description explaining it is hard.
You don't explain usually what it's doing.
You have to explain why you're doing it.
For something as simple as this, it's like, it's really easy.
So this person says this device is part of a Cat 6 modem.
The product ID is shared across multiple modems.
It gives them a little dump of what it looks like in the device.
And then there's some more information.
There's a Signed-off-by line.
And Signed-off-by is what we created a long time ago
that shows that I have the proper authorship of this and ownership and I give it to this project under the license by which the project is run.
So it's saying I'm licensing this thing under the GPL.
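As a rough illustration of the sign-off line just described, the sketch below pulls `Signed-off-by:` trailers out of a commit message. The sample message, name and address are invented for the example, and the pattern is a simplification rather than the check any kernel tool actually performs.

```python
import re

# Simplified pattern for a trailer of the form "Signed-off-by: Name <email>".
SIGNED_OFF = re.compile(r"^Signed-off-by:\s*(?P<name>.+?)\s*<(?P<email>[^>]+)>\s*$")


def signoffs(message: str) -> list[tuple[str, str]]:
    """Return (name, email) pairs for every Signed-off-by line in a message."""
    found = []
    for line in message.splitlines():
        match = SIGNED_OFF.match(line.strip())
        if match:
            found.append((match.group("name"), match.group("email")))
    return found


# Hypothetical commit message, invented for this example.
EXAMPLE_MESSAGE = """USB: serial: option: add a new device ID

Explain why the change is needed here.

Signed-off-by: Chester Example <chester@example.org>
"""

print(signoffs(EXAMPLE_MESSAGE))
```

A patch with no sign-off returns an empty list, which is the case a maintainer would send back.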
And then way down below is probably the one line patch.
This is all just.
So this is all the description.
This person is giving context on, like, here's what you need to know about the modem, the different specifications or whatnot.
And here's the change.
So somebody changed this, removed that line.
The red is removed,
the green is added.
Yeah.
So somebody added and had to reformat the lines based on some new ones they added.
And that's it.
So they added a few new device IDs.
And then there's a device ID and then we see a few hex numbers.
Those are like some IDs here.
So USB devices have a vendor ID and a product ID.
That's how the vendor and the product are identified.
And then there's some sub-device and sub-product IDs.
Got it.
So that's what this is for.
Okay.
So they're saying, we're just adding support for some, the driver already works for these chips,
this chip, but we just have a new ID because a new vendor came along and they wanted to put their own vendor ID on it.
Very common.
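The idea of adding a device ID so that an existing driver picks up new hardware can be sketched as a lookup table. This is a simplified model, not the kernel's actual data structure, and every vendor and product number below is made up for illustration.

```python
from typing import NamedTuple


class UsbId(NamedTuple):
    vendor: int
    product: int


# Hypothetical IDs the driver already recognises.
SUPPORTED_IDS = frozenset({
    UsbId(0x1234, 0x0001),
    UsbId(0x1234, 0x0002),
})


def driver_matches(vendor: int, product: int, table=SUPPORTED_IDS) -> bool:
    """True if the driver claims a device with this vendor/product pair."""
    return UsbId(vendor, product) in table


# A new vendor ships the same chip under its own (made-up) IDs.
new_device = UsbId(0xABCD, 0x0001)
print(driver_matches(*new_device))                 # not claimed yet

# The "patch": one more entry in the table, no change to driver logic.
patched_table = SUPPORTED_IDS | {new_device}
print(driver_matches(*new_device, patched_table))  # now claimed
```

The driver code itself is untouched; only the table grows, which is why such patches are a few lines long.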
So it sounds like this change is as simple as they get in terms of the code change, but still the description was very extensive, right?
So very extensive.
Part of the description was also just here is a dump of the description of the hardware, just so that we can verify, yeah, that is going to match with this.
Got it.
So just it's a, we have tools that create those things.
But yeah, it's a lot of work for a four-line change.
But this is like if we talk GitHub language, although I'm not sure.
This will be a PR, right?
This is, this should be in the patch itself, not the PR.
So a PR would be, so say you have 10 changes you want to make.
So a PR would be the patch 0 out of 10.
Got it, got it, got it.
Yeah, this is the commit.
This is the commit itself, which is a big problem of why I don't like the GitHub model,
because people don't put the changes in the GitHub,
in the Git commit.
No.
Because the Git repo.
We don't.
So there's a problem.
When you commit the, when you're looking at the repo later and you look at the change,
you don't see the pull, you can't see the pull request information.
Yeah.
And it's gone.
And that's a big problem I feel with the GitHub.
Well, I feel this goes back to, you know, like you built the tool or, you know,
the Linux group built a tool for your use case and you're using it the way you intended to use it.
Whereas GitHub built the pull request workflow on top of this.
And it is not part of
Git, for whatever reason.
Maybe GitHub could have made it part of Git whatnot, but it's not, right?
Well, no, so we have pull requests.
We created pull requests in Linux.
We email; there's git request-pull.
Oh, okay.
That's a Git command.
It makes an email.
Is that part of the Git?
Yeah, it makes an email that says pull from this repo, and here's everything that's in
this repo.
And when we do a merge of that, that merge commit has all that information in it.
Okay.
And then so you'll, if you look at the Linux kernel and you see when you merge,
when Linus merges in the USB tree, he sees my little message at the top saying,
here's everything that's going to be in this pull request.
You got it.
And because Git is the source and that's where all the data is, right?
And so you can see that we don't have pull requests.
It's not external.
GitHub could change that model and put that in the merge request, but it doesn't.
I was about to say that, like, because you did it, they could.
But it's a matter of, yeah, I guess preferences and, yeah.
That's fine.
Anyway.
But the good thing about this is you can track every single line of code back to who made it and what they did and what was the change.
What was the changelog?
What was the reasoning behind this?
Yeah.
Which is great.
Okay.
So this comes into, this person sends it to the module.
So he sent it to the, yeah.
So the owner of this is Johan.
There's a script we have that says take any patch and give me the people who are responsible for this and the mailing list.
So Johan and me.
He picked us and sent it to the USB list
and copied a bunch of other people
that I guess they worked with
and that have changed this driver in the past.
And then the copy is also done with the tool.
As you said, it kind of looks to who touched this code
or who might.
Exactly.
All nice.
This is all automated.
Yeah, all automated.
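The kind of lookup described here, from a changed file to the people and mailing lists responsible for it, can be sketched with path patterns. The kernel tree ships its own script for this (scripts/get_maintainer.pl); the code below is not that script, and the patterns and addresses are invented placeholders.

```python
from fnmatch import fnmatchcase

# (path pattern, maintainers, mailing lists); all values are hypothetical.
OWNERS = [
    ("drivers/usb/serial/*", ["johan@example.org"], ["usb-list@example.org"]),
    ("drivers/usb/*", ["greg@example.org"], ["usb-list@example.org"]),
]


def recipients(path: str) -> tuple[list[str], list[str]]:
    """Return (maintainers, lists) for a file, most specific pattern first."""
    to, cc = [], []
    for pattern, maintainers, lists in OWNERS:
        if fnmatchcase(path, pattern):
            to.extend(m for m in maintainers if m not in to)
            cc.extend(l for l in lists if l not in cc)
    return to, cc


print(recipients("drivers/usb/serial/option.c"))
```

Because a file under the serial directory also matches the broader USB pattern, both the driver maintainer and the subsystem maintainer end up on the email.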
We do that in the soon as well.
So that was great.
He sent it and just the mailing list
has two copies of it.
That's just because they wanted two different mail lists.
But then they said,
oops, I messed up.
There's an email from the person instantly.
You have to re-send it off.
Oh.
It said, I messed up.
There's an interface.
Maybe it'd be a good idea to change this comment.
So they go and change the comment.
And then they resend it.
And you just send a new version.
And then in this case, they send a new patch?
Or do they just add one more?
No, you want to have a clean commit, right?
Yeah, we don't do.
So here they sent a version 2 patch.
If you can see that.
It says version 2, right?
Oh, there.
Version two.
Got it.
And then here's the same information.
And then there should be some comments about what changed between the two versions.
Hopefully.
Yes, changes in version two from the previous one.
And there's a link back to the first one.
Nice.
Very nice.
So we want to see the changes because, I mean, I get a thousand emails a day.
Yeah.
And when I review patches and stuff, I'll review them.
And then they're gone because I'm reviewing the next one.
Yeah.
But when you send a next version, I want to
see what changed from the previous one because I don't want to go back and dig through all the
old stuff.
Okay.
So they added some information to it.
Wonderful.
And then what happened?
Johan, who is the maintainer of the subsystem.
Yep.
Wrote and said, hey, thanks for the patch and for documenting it.
Oh, he did something else.
I forget the order.
First, Chester wrote, hey, please apply this, after two weeks, after a week.
Because after a week or after two weeks, it's nice to ask, hey, what happened to this?
What's going on?
Johan said, you submitted this patch during the merge window.
I'll talk about how we do our development model, but there's a two-week merge window when we do releases
that Linus takes all the changes from all the maintainers that have been in their development trees.
We can't add new changes at that point in time.
So there's a two-week kind of blackout for new development.
But this is where all the stuff is flowing into Linus for the next release.
So during that time, if you send me a patch, I can't really do anything.
with it, but it'll stick in my mailbox until then.
So this happened to hit that little window of time.
Just to understand, there's a nine-week release,
every nine weeks there's a new release going out, right?
And then there's a window where patches are gathered.
So yeah, here, let's talk about that.
So there's, Linus does a release.
Yeah.
This point in time.
And then the merge window is considered open.
And then for two weeks, all the maintainers,
send Linus all the stuff they've had pending from the last
release. Yeah. We have two weeks to add all new features. Yeah. And then he does release
candidate one and then from there on it's bug fixes only for the next seven weeks.
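The cycle described here, a two-week merge window followed by seven weeks of fixes, is simple date arithmetic. The sketch below works through it under the assumption that every cycle is exactly nine weeks, as stated in the conversation; the starting date is arbitrary and the function names are made up for the example.

```python
from datetime import date, timedelta

MERGE_WINDOW_WEEKS = 2    # maintainers send new features to Linus
STABILIZATION_WEEKS = 7   # bug fixes and regression fixes only


def release_cycle(previous_release: date) -> dict[str, date]:
    """Key dates of one cycle, assuming the fixed 2 + 7 week split."""
    rc1 = previous_release + timedelta(weeks=MERGE_WINDOW_WEEKS)
    return {
        "merge_window_opens": previous_release,
        "rc1": rc1,
        "next_release": rc1 + timedelta(weeks=STABILIZATION_WEEKS),
    }


def merge_window_open(day: date, previous_release: date) -> bool:
    """True while new features can still flow to Linus for this cycle."""
    cycle = release_cycle(previous_release)
    return cycle["merge_window_opens"] <= day < cycle["rc1"]


# Arbitrary example date, not an actual kernel release date.
start = date(2025, 1, 5)
cycle = release_cycle(start)
print(cycle["rc1"], cycle["next_release"])
```

Anything that misses the window simply waits for the next one, which is at most nine weeks away.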
Mm-hmm. So it's bug fixes only, bug fixes only, it's regression fixes, we'll
revert things and so it's no new features. Yeah. But during that seven weeks, people are
sending me new features. Yeah. So I have a separate tree, which is my next. You're not
batching it for when the window will open? Yeah. So we call it next,
Linux Next. So we have a next tree where all these are merged together on a daily basis to see to
make sure they work. Yeah. Be prepared for Linus's next one. And then when he does a release,
after everything's good, we all throw things at him again in another two weeks. Now, Linus doesn't
pull automatically from all those merged trees. We have to explicitly ask him. Yeah. Because sometimes
our trees aren't good. Yeah. So, like, I maintain TTY and serial, and one time, famously,
it was a mess. Our tree, it just wasn't working. There were new features added.
So I'm like, I'm skipping this release cycle.
I'm going to pull out some of these bug fixes and send it to you off the side and then go.
But if it was like automatically being merged in, we'd have to deal with that mess.
It's just interesting because most companies just, you know, reflecting on, you know,
the companies that use Git, large tech companies, they often have, let's say, let's talk about
native mobile development where there is a concept of releasing every week or two weeks because
of the app store review process, or same with, like, desktop apps, where you can't really just
continuously release.
There's usually an aim for something, but it's not as strict.
So every now and then it would also happen that it's just not stable enough, we'll push it back.
But there is not this rigid, like, clockwork.
Like I think, you know, most companies that I've seen, they just treat it a bit more flexible
because, again, you know, they come up with the thing that they're in charge with it.
Your feature you want to have at it.
Yeah.
And then as we know, when you have a milestone, you know, like features might be cut.
Deadline might be moved.
you know, like companies.
But do I understand correctly that in the case of Linux,
like is this a thing where every nine weeks there will be a release?
It's time-based.
So we have that two-week window of merging all the new features to Linus
that have been in our tree and accepted already and proven to work.
And the window is short, nine weeks.
Yeah.
And that's good because we used to have two-year-long development cycles,
three-year-long development cycles.
And the problem there is if you have,
even if you have six-month development cycles,
there's that fear of you have a feature.
I want to take your feature, but it's not quite ready.
Do I want to wait?
I know what you're doing.
But if you know that you can get your feature in in nine weeks from now,
and it's just not ready, it's not ready.
And it's much more like, okay, the pressure is off me as a maintainer
to take your new feature until it's ready.
You can say like, look, if it'll make it into the next one
or let's make sure it's going to work properly if it's a more complex one.
Yeah, we have lots of features.
I mean, famously there's a USB feature that's on patch version number 35.
This 25 patch series, it's on the 35th version, and it's just not ready.
And I just got email today saying, well, maybe we need to change this to this other way.
I mean, so I feel so bad for that developer.
But he's been working hard and it's a complicated feature, and it's taken him a year and a half to get there.
I have other patches that are in version three, but that's version three, and it's been two years.
Because the developer just took a lot of time in between.
Okay.
So in this case, this is a good example that, you know, the person, the contributor,
said like, hey, a reminder, I'd like this patch applied.
And then Johan replied, reminding of the timeline on how it works, right?
Yeah, exactly.
And Chester wrote back.
And then really friendly, it'll be in the next one, don't worry.
Which is nice, very positive.
Yeah, we're not mean people.
And reminding, don't ever feel bad about reminding me that I haven't reviewed your patch in two weeks.
Now, if I haven't reviewed it in two days, yeah, I'll be a little testy.
But two weeks is a good idea.
And then he just wrote back: thanks a lot for keeping an eye on it.
Keep up the good work.
And that's it.
So then Johan has it.
Yeah.
Johan applied it to his tree because he then wrote saying, hey, and Johan is very nice here.
He said, you kind of didn't do the comments in the proper format.
I fixed it up for you.
Oh, nice.
So for drive-by changes like that, we want to make it really easy.
And we're not mean people.
I mean, clearly, this feels like it's a person who is unlikely to become a regular contributor.
They're getting their work done, right?
They're adding.
They have a device that they have to share.
Yeah, pretty much.
But we want to be friendly and open and easy to everybody
because everybody submits their first patch at one time, right?
Famously, when I did my very first patch,
I wrote an email saying, how do I make a patch?
Because we didn't have good documentation.
Somebody wrote back and said, hey, here's how you do it.
He became my boss eight years later.
It was just funny. It's just like a small world and whatnot.
But yeah, and we want to make it easy.
So Johan takes us, and he's got the patch, and it's in his tree now, which is great.
But that's in his local little tree.
Then he has to get it off somewhere else.
Johan then makes a pull request to me.
So this is an output of the Git request...
make pull request, I don't know what the actual command is.
And this is what a pull request from Git will look like.
And this is because Johan is a subsystem maintainer.
Yes, Johan maintains the USB to serial drivers.
There's a bunch of drivers for these types of things.
And then he sends it off to me, the USB maintainer.
Got it.
And he says, take this patch or pull from this tree of this tag.
And it's a signed tag.
So it's signed with his GPG key, so I can verify that
it's really him when I pull from it. And he says, take these patches and here's the information.
It's going to be some USB device IDs, and they have all been in Linux Next with no reported issues.
So they've been tested.
In our integrated testing, we test all this stuff every day.
And what does testing mean?
Is it automated testing?
Is it pushing it out on devices and device labs?
Is it a mix?
Yes, it's all of that.
So Linux Next gets merged every day by a developer in Australia.
He merges all the trees together and builds them and boots them
in virtual machines.
That's a non-trivial thing for a kernel to do.
If it can boot, it's usually a sign things are going well.
It isn't testing on real devices.
Now, there are other labs out there with KernelCI, which is our CI infrastructure,
that can run on all individual labs.
And we do push things out there, and there are people testing Linux Next on their real hardware,
sending us reports back in an automatic fashion.
Those are less rare.
Linus's tree gets tested more on that.
Stable trees, I can talk about stable trees in a little bit,
I mean, get tested on the real hardware more.
Linux Next gets build and boot tested pretty well.
I don't run Linux Next.
I run my development trees online, so I don't run all the mix of them all.
Sometimes they interact because we don't have any fiefdoms.
If I have a USB change that needs to actually go through the networking stuff, I can change a networking code and whatnot like that.
And they can say, hey, maybe you shouldn't do that.
And we try and get approval.
You review my patches.
but it's now we can touch any part,
but he can touch any part of the kernel in a way.
But he sent me a pull request.
And a pull request is that I don't actually review the changes in it.
I'm not reviewing each individual patch through email.
I'm trusting that he sent me four patches here and that they're good.
Yeah.
And I have known Johan and I know that he will be there if something goes wrong.
Yeah.
And like you,
will you read the kind of the description and then every now and then you might decide,
for example, to like deep dive into a change?
Totally.
I mean, for USB device IDs, it's like, okay, yeah, they're all touched in the same driver.
Yeah, these are common.
They're nothing simple.
Sometimes they're a little more complex.
I don't pull from a lot of different trees, but I pull from some that I trust.
Some subsystems that I don't necessarily trust as well, I will make you send them an email.
And I'll actually review them, and then I'll review them, and then when I review them, I add my signed-off by to it.
And I guess part of trust will be here.
I'm just going to assume that since you and Johan know each other well and you work for a while,
he will probably also every now and then give a comment saying, hey, Greg, there's this change.
Can you take an extra look at this thing, et cetera?
Yeah.
So sometimes Johan makes changes to the code himself.
Or I make changes to the code myself.
I put it out for review.
And I have other people review my changes.
This is just fascinating for me to hear you explain how trust between people, maintainers,
is so important for efficient development.
Yeah, it's all, it's, yeah, it's, and then also the trust is somebody once told me that Linux development was the scariest thing they ever did because not because it was like difficult or whatnot, it's because my name is on this change and it's public.
That makes you as an engineer do really, really good work.
I mean, so much so that this person who said this patch went back and looked at it instantly and said, oh wait, the comment could be made a little bit better.
And they're like, oh, yeah.
So, I mean, that's not a normal development process in a company that I commit to go.
It makes me wonder about a few things that I kind of took for granted.
For example, could this mean that with closed source software, where the outside world does not know how it's done,
maybe there's just a bit less incentive to do such great work?
And actually, it's just a reflection.
Like I do remember when I worked at a company and when we actually my team, we open sourced a component that we built.
And I just remember how I put in way more work into that to make
it look good, to have the documentation, not just look good, but make it clean.
We cleaned up actual tech debt before we published it.
And we didn't do that with our stuff.
It was...
So open source development, by virtue of just human pressure, makes a better engineering product.
It's a better engineering.
And we've kind of shown that through the years that this development model creates a better
software.
I'm kind of revisiting some of my, like, not assumptions, but I never...
thought of it like this, but it's just, it's awesome to see this. So, so then what, what,
what happens next after, after Johan sends it to you? So Johan sends it to me. And then I take it,
and I put it in my tree. I think I send, I sent him saying I took it. And then if you take it,
you're responsible, right? I pulled it and pushed it out. Yes. And there's my email that says that.
So now I'm responsible. It's in my tree. So now the, um, since this is a device ID,
these can go to Linus at any time. We can add bug fixes or new device IDs. These are true. Yeah. So
then a few days later, I send this change off to Linus. So I send Linus. I said, hey,
Linus, take all these following changes, these changes, and here's a whole bunch of USB fixes.
So here's some small driver fixes, some new device IDs. And so I summarize it all. I say these are
all the things in here. Yeah. And these are going to be like a few dozen of patches, something like that.
Yeah. But here's the list of the patches down below. And here's the diff stat of them. Here's the
diff of them to make sure that this diff matches what he pulls from. This is signed with my key.
I do say almost all of these have been in Linux Next. I guess some of them slipped in. But we also
have another kind of testing. When you send patches to the mailing list, what we call the zero-day bot will go
through and start applying them and build testing them. And that's run. And then our own trees that
we create also does verification that they did build and boot. And it will run some benchmarks.
For drivers, it doesn't really run benchmarks. And so then Linus takes this and he puts it in
his tree. So then it got picked up. So it got picked up another day later. And then let's talk about
how we do our model. So Linus does a release every nine weeks. Yep. Bug fixes come in during those
nine weeks for the last release. You're running the last release, right? You want those bug fixes.
You had a device that's running those bug fixes. A long time ago, we realized that people don't want to wait
eight weeks. So let's create a model of we have a development tree and we have a stable tree.
So when Linus does a release, I fork off Linus's branch. And I say,
this is a stable branch. So 6.4, I do 6.4.1, 6.4.2, 3, 4, 5. And our release numbers are just
numbers. They mean nothing. They're not semantic versioning. We were around way before that happened.
They're just meaning this number is later than that number. That's all. When we switch from 4.x to 5.x,
it's just because the X got too big. Yeah. And in your brain, when you see a number between like 14 and 18,
it looks smaller than 4 to 8. Yeah. So we just bump it up.
every couple years. So then we, so we take stable. We have stable releases. I do a release every week.
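The remark that release numbers only express "this number is later than that number" can be illustrated with a small ordering helper. This is a minimal sketch, not kernel tooling: the function names are hypothetical, the sample version strings are made up apart from 6.4, and only plain dotted numbers are handled (a release-candidate suffix would need extra parsing).

```python
def version_key(version: str) -> tuple:
    """Turn '6.4.12' into (6, 4, 12) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def is_later(a: str, b: str) -> bool:
    """True if version a is numbered later than version b."""
    return version_key(a) > version_key(b)


# Hypothetical sample; a plain string sort would misplace 6.4.10.
releases = ["6.4.10", "6.4.2", "5.19", "6.4", "6.4.1"]
print(sorted(releases, key=version_key))
```

The comparison says nothing about compatibility or the size of a change, which matches the point that these are not semantic versions.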
And what I do is the patches have to be in Linus's tree first. We can't diverge. So if it's in
Linus's tree, it's a bug fix, and it meets the criteria, I put it in the stable tree and I do a release.
And so we do new releases every week for that. So during those nine weeks, I'll take new device IDs.
I'll take bug fixes and whatnot. And then you can tag the fixes that are going into the tree
with a special way that I'll automatically take them. I know to look at them. The other stable tree maintainer
with me, Sasha, he runs through them and runs a whole bunch of fuzzing.
He's been doing AI before it was ever called AI.
It's just pattern matching.
And we have a whole body of here's a whole bunch of bug fixes.
Here's a whole bunch of changes.
Did anybody do these kind of match?
Oh yeah.
Some people don't realize that, oh, this was a bug fix.
It should go into the stable tree.
They've written academic papers on it for years.
It's fun stuff.
So just pattern matching, right?
So they'll pick up a whole bunch of stuff that, hey, maybe you forgot about that.
And it'll give you a chance to respond
before it goes into the stable tree.
And we do those releases.
When Linus does a new release,
then I throw that stable tree away
and I make a new stable tree.
That's great for things
that I can update more often.
People want to make a device.
You want to make it something
that's going to last a long time.
So what we've come up with,
the idea is long-term stable trees.
And there I pick one kernel a year
and I maintain it for,
to start with two years,
sometimes six years.
So your Android phone is running off
a kernel that's five years old,
but it's still getting bug fixes back to it.
So I maintain like four,
or long-term stable trees at the same time.
And we backported all these fixes to all the different branches,
and then we pick one a year, and we maintain these.
So there's six of them going at a time.
And in this case, like, is it you, like,
there's one maintainer for each of these long-term?
No, it's just me.
Oh, you?
Wow.
Okay.
Yeah, it's the two of us.
The longer, the interesting thing is the older the code is,
the harder it is to maintain.
And companies are like, oh, I'll put a junior developer to maintain old code.
That's harder because it's more diverse.
from what the latest developers are using.
Can you tell me a little bit more about this?
Because the older the code, the harder it is to maintain.
I think it feels true.
But why is this the case?
Is it just lost context?
So development moves on, goes forward, right?
So say a change I make today to the code base.
It fixes the bug that affects the code that I look back.
It's affected the code for the past 10 years.
Yeah.
All right.
If I try to start backporting this change,
to code that's 10 years old, code has evolved in that time.
Yeah.
And making that change to older code is harder.
And the more I have to change it, the more it diverges from the original fix.
So the more context and skill you have to have to make the change to the older code base
than even the developer who made the first change.
It's not intuitive.
Companies make this mistake all the time thinking, oh, I'll just maintain this old code base
for a long time.
We have major security bugs like Spectre and Meltdown with chips.
Some of those Spectre fixes have not been backported to some of the long-term kernels that are still being supported because it was just too hairy of a fix.
Anybody who cared moved to a new kernel.
Yeah, yeah, yeah.
So I look at a lot of these older kernels is, again, if you're using it, you will provide the resources to maintain it.
I'll call out Google, and Linaro's another group; they do a lot of work in testing these old kernels because Google cares a lot about these kernels.
So they provide testing infrastructure and merges and reproducibility and running on real devices to make sure that these kernels still work on them.
and they work well.
And that way I know that if I make a change back there, it'll still work.
If I didn't have that resources for them doing that work,
I wouldn't be able to maintain these old kernels.
Yeah.
And then going back to the bug fix,
so like every week there's a new stable branch release.
And then when does the big release come?
The nine-week release come?
That's after this has been kind of baking, right,
for the stable branch has been...
So the stable's independent of Linus's tree.
Oh, stable's independent.
So the only tie is it has to be in Linus's tree first.
We do not want...
Got it.
We don't want you to make a fix to a stable tree only and not Linus's tree.
Got it.
So sometimes I will have bugs in the stable tree due to other changes I've taken.
I mean, fixes need fixes.
And I'm like, I can't take the fix for this until you get the fix in Linus's tree.
And it's kind of a forcing function on a developer to get a fix to Linus before I'll take
it for the stable tree.
Sometimes I'll revert the change in the stable tree.
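The rule described here, that a fix must already be in Linus's tree before it can go into a stable tree, can be sketched as a simple check. This is a hedged illustration rather than the actual stable tooling: the `Patch` class, its fields, the commit identifiers, and `eligible_for_stable` are all hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Patch:
    commit_id: str  # hypothetical identifier for the change
    is_fix: bool    # True for a bug fix or a new device ID


def eligible_for_stable(patch: Patch, mainline_commits: set) -> bool:
    """A patch qualifies only if it is a fix AND already in Linus's tree."""
    return patch.is_fix and patch.commit_id in mainline_commits


# Hypothetical commit identifiers, for illustration only.
mainline = {"abc123", "def456"}
candidates = [
    Patch("abc123", is_fix=True),   # fix, already upstream: accepted
    Patch("def456", is_fix=False),  # new feature: not stable material
    Patch("zzz999", is_fix=True),   # fix, but not upstream yet: must wait
]
for patch in candidates:
    print(patch.commit_id, eligible_for_stable(patch, mainline))
```

The third case is the forcing function from the conversation: the fix is valid, but it is held back until it lands in Linus's tree.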
And do I understand the way to get a fix into Linux is, well, of course, you need to
get a fix into Linus's tree, which means the change needs to go
through one of the maintainers who, you know, maintains one of the subsystems.
Yeah.
So say it's.
And you just need to go up the tree, up the pyramid.
Right.
So famously, Bluetooth always breaks every other release.
Bluetooth is crazy complex.
The hardware is horrible.
And if you need to get a fix in there, it has to go to the Bluetooth tree, and then that gets
sucked into the networking tree, and then that networking tree goes to Linus.
So it's like a two-stage process.
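The path a fix takes up the pyramid can be modelled as a lookup from each tree to the tree that pulls from it. This is a minimal sketch: the tree names are labels taken from the conversation, not real repository names, and `UPSTREAM` and `route_to_mainline` are hypothetical.

```python
# Each tree maps to the tree that pulls from it (labels from the conversation).
UPSTREAM = {
    "bluetooth": "networking",
    "networking": "linus",
    "usb-serial": "usb",
    "usb": "linus",
}


def route_to_mainline(tree: str) -> list:
    """Return the chain of trees a change passes through to reach Linus."""
    path = [tree]
    while path[-1] != "linus":
        path.append(UPSTREAM[path[-1]])  # KeyError for an unknown tree
    return path


print(route_to_mainline("bluetooth"))
print(route_to_mainline("usb-serial"))
```

Each hop in the returned list corresponds to one pull request between maintainers, which is where the trust relationships discussed earlier come into play.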
Sometimes and then we have somebody tracking regressions.
Regressions are really important.
We don't want anything to regress.
Sometimes Linus will say, I'll just take these bug fixes or regressions.
I'll just take them now.
Boom.
I'll just take them.
So it depends on what they are.
If they affect hardware that's really common, we prioritize that over hardware that isn't
as common, just by virtue of, hey, this broke my laptop.
Right.
I want to keep working.
So yeah, it's a little thing that way.
So we have two branches going at once, development, and then stable releases happening.
So then this went into the Linus's tree.
I picked this out as part of the stable trees, and then they ended up in the stable
tree somewhere as well.
And then I can give you dates for all this stuff.
This whole process took about a week and a half.
And that was it.
Okay.
And then here is, it ended up in the 6.13.4 kernel as well.
Yeah.
And then in other ones as well.
Backported to all.
Trust isn't just earned.
It's demanded.
Whether you're a startup founder navigating your first audit or a seasoned security professional
scaling your governance, risk, and compliance program, proving your commitment to security has never
been more critical or more complex.
That's where Vanta comes in.
Vanta can help you start or scale your security program by connecting with auditors and
experts to conduct your audit and set up your security program quickly.
Plus, with automation and AI throughout the platform, Vanta gives you time back so you can
focus on building your company.
Businesses use Vanta to establish trust by automating compliance needs across over 35
frameworks like SOC 2 and ISO 27001.
With Vanta, they centralize security workflows, complete questionnaires up to five times faster,
and proactively manage vendor risk.
Join over 9,000 global companies to manage risk and prove security in real time.
For a limited time, my listeners get $1,000 off Vanta at vanta.com slash pragmatic.
That is V-A-N-T-A.com slash pragmatic for $1,000 off.
So we saw what it takes to get a fix into Linux, and it actually wasn't that complicated.
No, it really isn't.
I mean, it's just you email a change off.
You email, you use the Git workflow.
If you're familiar, it's pretty simple.
Obviously, obviously you need to be able to build Linux, test it on, test it yourself, validate it locally that it works, the basic things.
And then it's straightforward.
The fun thing is so I can take a change like that without really testing it because it built.
It obviously works for your hardware.
I didn't test it, but it works, and I assume that it goes.
Yeah.
And yeah, it's a very fast workflow as far as getting a project.
So it was like a two-week window from sending the first change as the merge window
to getting it out into stable kernels to the world.
That was pretty fast for overall, for a worldwide project that is everywhere.
So I think I understand what it's like to, you know, be someone who contributes to Linux every now and then.
But over time, some people start to contribute more.
They become more regular contributors.
And eventually, you're one of the few people or one of the few or many people who works on Linux full-time.
Are there many people working on it full-time?
So Linux has almost always been paid to be worked on.
So I started keeping the numbers back in, what, 2006 or something?
And at that point in time, 80% of the people that contributed were being paid to do it full-time for their employer.
And their employers want people who know how to do Linux because they're...
they want to solve their problems.
They want Linux to, it's much cheaper to pay a few engineers,
to add a few new features than it is to write your own operating system.
That's the beauty of Linux.
That's why IBM put a bunch of money into it.
That's why everybody uses it.
It's a tool for people to get their work done.
You want to run your battery.
You want to run your car charger.
You add a little driver for the one device you have.
You add an engineer to do that.
And it's good to go.
And it'll be maintained for forever because we maintain it in the community.
It's all good.
So it's cheaper.
So we've been doing it.
And the joke used to be you get three changes into the kernel.
You get a job.
It's not really a joke.
As long as they aren't spelling fixes.
But some people do spelling fixes, which is great.
We have people that do janitorial work through the kernel.
They sweep the tree for common problems, and they just clean stuff up and keep code alive and
make sure it's fresh, proper coding style.
We have coding style issues.
We have people just fixing spelling mistakes and comments, which is great because you've got to
start somewhere.
In fact, spelling mistakes and comments is a great place to start because it makes you get the
workflow down.
You figure out how to make a patch.
You have to send an email.
fix your email client and not send HTML and things like that.
And you can't use a web client.
It doesn't, web email client to send an email.
It just doesn't work.
Good email.
There's lots of really good email tools out there.
Use them.
But you're now a full-time.
I'm a full-time maintainer.
What does your kind of day-to-day or week-to-week look like?
Because I'm going to assume it's going to be a little bit different than most developers who, you know,
like write code, review code, do those kind of things.
So, yeah, I mean, I've been working
for the Linux Foundation for what, 13 years now. Before that, I used to work at Novell and SUSE,
before that IBM, and then a little startup, all doing Linux stuff all this time. And then before
that I did embedded work. When I worked for a company, you end up working on features
that your company wants, or reviewing code from the other developers of your company and then
sending off changes. Or if you're a maintainer, a maintainer is, and the networking maintainer
said this the best, we're like editors. We used to be a writer. All we do is critique other people's
stuff now. But because we were a writer, we have a little side project. So we do have
little things that we do dabble and stuff. So like I looked, I did 80 changes, only 80 changes
last year because I have a few little things I want to do. That was low. But working for Linux
Foundation as a full-time maintainer, that's rare. I think there's only maybe five people,
maybe, maybe a handful of people that just work on whatever they want to do. So Linux Foundation
rule is they can't tell me what to do and I can't tell them what to do. Works so great. Me and Linus and
Shuah Khan, we're all fellows there and we work on improving Linux however we feel like it.
Me and Linus do a lot of review, a lot of other stuff.
Linus still contributes.
He famously rewrote the core locking primitives in Linux a couple years ago.
I had a Microsoft developer say, there's no way any of us would be even allowed to do that on Windows.
You know, you don't just change core bits and pieces.
For one of the security features in one of these stable releases,
Linus had to rewrite the call path from how user space calls into the kernel, the core syscall path.
Nobody really noticed that it got rewritten, but it did, and he did it.
in a stable way and then it works like that.
So we're also part of the kernel security team.
We get security bug fixes all the time.
And if they're easy, we'll just fix them ourselves
and send out the fixes.
So we do security fixes a lot as far as that goes.
So my day-to-day is I read other people's stuff.
Like I said, I get 1,000 emails a day to do something with.
You're not, like, exaggerating?
No, yeah.
You get a thousand emails a day.
Yes.
Wow.
So I don't have to just file off.
And it's like, oh, this, like I subscribe to a number of kernel
subsystem mailing lists to see what's going on.
Yeah, yeah, yeah.
And I don't have to do something with all of those.
Yeah, but some of them, you need to do something.
Yeah, some of these I do need to do something with.
And some like, so I'll, so say for USB is one of the subsystems I maintain.
I shove them all off to a mailbox, and then once a week I'll go through them all and say,
okay, let's review all these.
And so I'll look at my inbox.
I'll have 200 USB emails to patches to go through.
And other people review them and other stuff like that.
And okay, this maintainer said this was good, not good, whatnot.
And I apply them to my trees, see if they build, if they fail.
I'll report those.
You know what you're doing reminds me a little bit of when I used to work at Uber.
We had this concept of RFCs, which I think got inspired by the RFC process.
So people would just send off a document of here's what I'm planning to do.
And there were mailing lists for like backend, mobile, different parts.
And I noticed after a while that the more tenured engineers and the more experienced engineers
were spending increasingly more of their time reading through these things, critiquing,
giving feedback, giving pointers, connecting the dots.
Like, it just hit me when one of the first mobile engineers at Uber was telling me
that he has one day blocked out per week just to go through all of these things,
which again, it wasn't kind of part of his role, but he felt responsible.
He had all the context.
He actually helped so many people avoid certain things just by pointing it out.
Is the same thing, or something similar, happening here?
It's the same.
But we also, the different part of this, and I'll call this out, we don't have grand
proposals sent to the kernel list. We don't say, hey, wouldn't it be great if you did this?
I don't want to see that. I want to see code that works.
I love it. The reason code that works matters is because you've taken the time,
you've proved that this can be done. Now, not necessarily that it's done right or done the best
way, but it could be done. And that's now you have the skin in the game, and now I'm willing to
work with you, and let's go on that. People do send off RFCs of patches. If it's an area I care
about, I'll look at it. Sometimes you can get away with this. This is a fun trick with
maintainers. If you send me a patch set that solves your problem in such a way that it's
horrible, and I hate it so much, then I'll rewrite it myself. Because it'll be like, I can't say
no because it's solved the problem, and I want to solve your problem. But if I don't say no,
then I have to take that. So you can do that once a year to a maintainer.
It saves you a little bit of busy work, because I've seen at different companies when you
have the proposal process. Again, at a lot of companies, you know,
it sounds logical. Instead of starting the work, instead of investing time, maybe we would all
save time by doing a little planning up front, right? But then every now and then what happens is you get
into this never-ending planning, nothing happens until either the project is abandoned or someone just
sits down, writes some code, and kind of, you know, just cuts all the discussions, because
now it works. Yeah, well, you have to prove that it can work. And so inside companies, I'll say, we do have,
we did like when I work for companies, IBM, it's like we had planning, okay, we need to implement this
feature to match this parity with this old version of Unix. How are we going to do that? Let's figure out
to do this. Is this going to work? Yada, yada, yada. And we have planning, things like that.
One of the fun things is, when you're dealing with open source and this happened at IBM,
engineer over here was tasked with fix this problem. Great. He came up with a solution,
submitted all the changes upstream. Lots and lots of discussion. Turns out his solution was not
very good. Somebody else saw that it was a problem, rewrote it, submitted it, and got it accepted.
but it wasn't the original engineer's work.
And so the end of the year came,
it was like, how was this person going to be reviewed?
And we're like, he caused the feature to get done.
It wasn't that his code made it in,
but he influenced the community and made,
the goal was you wanted to see Linux support this, right?
Linux supports this now.
And it had a change in mentality of how management had to treat engineers.
And also the same thing with who owns the code.
We had people come in and end up becoming maintainers
of certain subsystems.
And that's great. And they were maintaining this part of the kernel. And then they were reassigned to do something else within the company. It's like, oh, that's great. But you're going to still have to give him time to do that other thing. It's like, no, no, no, no, no. The community gave it to him. It follows him. If he goes to a different company, it follows him.
And that's actually why Linux is so good, I think. When you work in a big company, you're forced to work on new things every couple years, right? And that's part of moving up in a company. You get different tasks and whatnot. Famously, Windows has had like eight, no,
five different teams work on their USB stack.
Linux has had one team working on their USB stack for 20 years.
And then we know this stuff and we have this development in depth there.
We just keep coming back to like I was kind of expecting a little bit of a discussion about
I came in here just using Linux or indirectly using Linux a lot, but not knowing of the depths.
And I kind of thought that we would talk a lot about the tech, the processes.
And every time we come back to the people, the trust. I wanted to ask
why you think Linux has won so big that it's everywhere, but I'm starting to get the answer
to this. Because I was thinking, why Linux? Why not a commercial? If I naively ask myself before this
conversation, like we have two teams. One is commercially funded. They're selling their software. They're
paying the developers really well. And then the other one is giving it away for free. They figured
out a model where people are still paid, but, you know, it's open source.
Anyone can use it.
Anyone can contribute. Which one will win in the long term?
I naively would have said maybe the commercial one because they're incentivized.
They're going to, you know, create all these professional things where here is more
interesting value.
But but.
But so Linux has been contributed to by companies in their own interest.
So it turns out everybody contributes in a selfish way.
I want to solve my selfish problem.
But it turns out everybody has the same problems.
So your problem being solved is the same problem as their problem.
Famously, we had this when it came down to Embedded.
So Embedded happened, they came up saying,
we need to change Linux to make it work better on batteries.
You know, power is really important.
Power is very, very important.
A lot more efficient.
So we wanted this.
This was when Linux was first getting into Embedded
and we need to make this very efficient.
We're like, great, that's a wonderful little solution.
Make it work for everybody.
Like, no, no, no, we just care about embedded over here.
That's the only person that's going to care about power.
It's like, no, you really, just make it generic.
It'll all be good.
It turns out data centers save billions of dollars in money because of power management.
And it turns out everybody.
So the mainframe is more efficient on a mobile phone, suddenly it's a good candidate for it to be a mobile OS.
Yes.
It works for everything.
Same thing for multiprocessors.
Multi-processor came out.
We have two processors in the big data center.
Who's going to care about that?
In your pocket now you have 16 processors.
It just works for everybody now, right?
Data centers shrink, go different places.
But because we solved it in a generic way,
we forced you to solve it in a generic way,
but you contributed it in a selfish manner.
And that's a good way.
IBM knew they could put money into it,
hire developers, and get the money back.
So it was cheaper for them in the long run to do that.
And they make money selling support and selling hardware.
Red Hat makes money selling support.
And that's like that.
Intel makes money selling chips.
And that's who contributes to Linux: the people
who want to sell a different product.
Now, one other thing that's interesting about efficiency, we have 4,000 developers contributing,
some of them only contribute one change, some of them contribute a bunch, 300 to 500 companies
per year, we're talking about per year. If you told me, like, this is inside a kind of commercial
company, a tech company, you know, I would assume that in order to make this work, oh, for 4,000
developers, you know, we probably need to hire 400 PMs. We'll have about one TPM for every 50
developers, so we'll have about 80 TPMs. This is how it would run. Like, you're laughing. No, I know
I've been there. You've been there, but I only come from here. Now, one thing you told me,
because I was asking how many project managers, or even technical project managers, you have, and
you said zero. How? Well, so in a way, the project management already happened on the back end
before the patches got to us. So at a company, say, IBM, I want to solve this problem.
They've said, how do we solve this problem? Let's put this task, let's figure it out. And then the
patches come out to us. So we don't see that. So we just see the feature when it lands on us.
That's fair. So they're there, working for the individual companies to get their thing.
Sometimes, sometimes they're not. Sometimes developers are just spitting things out and
like this person who needed to get a new device ID. It saves company time and money if they
contribute their changes upstream than to keep it as a fork because they have to keep
maintaining that fork. So wise companies have realized let our developers work upstream, do what they
need to do there with limited project management. And it just works out better. And again,
we're only taking things when they're ready.
We're not having to track.
We do have tools.
We send everything through email.
We have tools, like the networking subsystem has a webpage.
You can go to see the status of your patches:
if it's passed all the CI, if it's been reviewed by the maintainer, and things like that.
So we have a bunch of automatic tools based on top of email that will help you out.
And those project managers can go look at those if they wonder what the status of their employees' patches were at, things like that.
But yeah, it's just a different model.
But it's not like they're not there.
They're hidden behind the solution for that company.
It's fair, and I think it's also good to remind that that's the case.
But I feel Linux still figured out a way to focus on just ruthless efficiency with automation,
with focusing on the work when it's done.
So as you said, all these things do happen, but they happen before.
And then you can, you know, like this part of the process will just be more efficient by design.
Yeah, and we also, but once a year we get together, the core maintainers,
and we talk about not technical things
because we can't have enough technical people
in the room for a topic.
We talk about process.
Is our process working?
Is it not working?
And we refine it and say,
oh, maybe we need to do this a little better.
Oh, wouldn't it be nicer to do this?
Hey, we need more testing over here.
Hey, can we do this type of stuff?
So we do, we talk about our process all the time.
Famously, the lead up to that meeting
is another public mailing list
where we all talk about process.
And that process, that once-a-year bikeshedding of our process
in public, it helps shake out a lot of things
and work them out.
And there are problems.
I'm not saying this development model is perfect.
It works really well.
One thing that's odd about Linux is that we keep going as fast as we are.
We're running at 9 to 10 changes an hour.
In the stable kernels, we're running 30 changes a day, 30 to 40 changes a day.
10 CVEs a day.
A bug at our level is a CVE almost.
Yeah.
So the CVEs are the critical...
It's a security bug.
It's a vulnerability.
That can be as stupid as a memory leak somewhere, or I rebooted the machine,
I took over and got permission.
I don't know. When I create a CVE, I don't know how you use Linux, so I can't tell the severity of it.
But I can say, here's a bug.
You should look at this.
So we're responsible for that.
So we're running at a huge rate of change.
Most large software projects have a huge ramp up, and then they plateau with developers and rate of change and whatnot because they've solved the problem.
Linux has never solved the problem.
And I used to have, I had a manager at IBM every year he'd come to me and said, hey, is Linux done yet?
I was like, no.
It took me 10 years to finally come up with the answer of: it'll be done
when you stop making new hardware.
And when they stop making new hardware
or having different work classes, then we'll stop.
But we're one of the few projects that keep having
to add new features because of new hardware.
We're not doing it just because, I mean,
Linux has been working for all of us for 20 years.
We're doing it to support new hardware,
to support new use models to support things.
We don't add things for fun, generally.
We add it to solve a problem that somebody had.
Most of Linux is written using C or C++, right?
No C++.
Just just C.
Just C.
And I guess for some hardware drivers, is there assembly ever involved or no?
No.
We'll drop down into assembly for the early boot of a processor, and then for some core functionality like locking that drivers or other people will call.
Basically, we'll go down like string functions and whatnot.
We'll go down to good assembly language that's tuned for the different processors.
Also, when you boot, Linux looks at the processor you're running on, patches itself to use the best assembly functions for it, and then it continues on moving.
Which is crazy. It patches itself at boot time.
So hold on. But some of that is assembly?
Some of that's assembly in the very beginning. And some of those low-level functions, but drivers
never touch that. Okay. So basically, like, from a Linux computer, but it's all C.
Now, you know, one thing that actually, the way we started talking is there is a proposal to
introduce Rust because it's just more memory safe. It's also a language growing in popularity.
And some people would like to do more Rust development. What is your take on this? Do you
think Linux at some point might support Rust or, you know, what are your, what is your
thinking on doing things outside of C? So we have 25,000 lines of Rust in the kernel already.
Oh. We do. Okay. Awesome. Yeah. So most of that is just bindings. There's no real functionality.
In the latest release, if the kernel crashes, it'll put up a QR code. You can take a picture of it
to get the crash dump. That code was written in Rust. That's in Rust. So the Rust, so the Rust for Linux
developers have been working for a long time. A couple years ago, they came to us and said,
we think we're ready to do this. Do you want it? And we said, yeah, let's try this experiment.
You're willing to do the work? Who am I to say no? I mean, it's classic Linux. Yeah.
Now, the problem with Linux and Rust is it would be easier to write a core
piece of Linux in Rust than it would be to write a driver. A driver consumes from everywhere in
the kernel. So you want to talk locking. You want to talk input and output. You want to
talk to the driver model, talk to the USB port, all this stuff.
Drivers can be really tiny because they take resources from the rest of the kernel.
In Rust, you need to have a binding between the C code and the Rust code.
There's an intermediate layer.
The kernel in C has these very opinionated model ideas of how it handles objects
and how it does memory and how it has its memory model.
Rust has its very opinionated model of how it does this type.
Same idea.
This meshing is tough.
This meshing is also the most crazy complex Rust code
you've ever seen. So for a new Rust developer, like me, I can barely read the bindings,
but I trust other people are doing it. So yes, so the trick is we now need to write a binding
for every different part of the kernel in order to write Rust code, a Rust driver. If you want to do
the QR generator, that's simple. That was this one function. So over the year, the past couple
years, people have been writing bindings to try and do things. We've had a bunch of example drivers,
like a new disk driver, this rider driver in C versus Rust.
It turns out there are still some performance issues with Rust code versus C code
because we can do some tricks in C code that they can't do yet in Rust.
That's on the tooling, and the Rust developers are working on it.
The core Rust developers of the language, some of them are Linux kernel developers.
They've always wanted Rust to be working for Linux.
The Rust model is good.
Memory safety at our level does not mean that you can't crash the kernel.
You can still overwrite things.
Memory safety in Rust just means that for the memory you pass around,
you know whether you have ownership of it or not.
And when things go out of scope, they'll get cleaned up properly.
So I've seen every single kernel bug for the past 18 years.
Half of them will be fixed with Rust.
It's just going to be fixed with Rust.
It's the stupid one-off bugs.
It's the, oops, I overwrote an array by one, and I didn't realize it.
Oops, I forgot to clean up this error path.
I forgot to unlock this lock.
It's stupid little things like that.
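To make the "forgot to unlock this lock" and "forgot to clean up this error path" cases concrete, here is a minimal sketch in plain userspace Rust. It is not kernel code: it uses the standard library's Mutex rather than the kernel's locking bindings, and the Device type and record_write function are hypothetical names invented for the illustration. The point is only that the lock is released on every exit path, including the early error return, without anyone writing an unlock call.

```rust
use std::sync::Mutex;

/// Hypothetical device state, invented for this sketch.
pub struct Device {
    pub bytes_written: usize,
}

/// Adds `len` to the device's counter, rejecting zero-length requests.
///
/// There is no explicit unlock anywhere: the guard returned by `lock()`
/// releases the mutex when it goes out of scope, on the error path and
/// on the success path alike.
pub fn record_write(dev: &Mutex<Device>, len: usize) -> Result<usize, &'static str> {
    let mut guard = dev.lock().map_err(|_| "lock poisoned")?;
    if len == 0 {
        // Early return: the guard is dropped here, so the lock is released.
        return Err("empty write");
    }
    guard.bytes_written += len;
    Ok(guard.bytes_written)
}
```

Calling record_write with a length of zero returns the error, and a later call on the same Mutex still succeeds, because the failed call did not leave the lock held.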
There's logic bugs, of course.
You can write logic bugs in Rust.
You always have those.
Right.
But famously, the QR code generator that was written in Rust:
C passed into the Rust code a pointer to a buffer and the buffer size.
The Rust code forgot to look at how big the buffer was.
And it scribbled right over memory.
So you can still write memory-unsafe code in Rust.
And you can crash things in Rust.
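A sketch of the pointer-plus-size handoff described here, in plain userspace Rust. This is not the kernel's QR code and does not claim to show the actual bug or its fix; fill_from_c and copy_clamped are hypothetical names. It shows one common way to avoid the mistake: turn the raw pointer and length into a slice once, at the boundary, so that every later write is checked against the length the caller supplied.

```rust
/// Hypothetical entry point, standing in for a C caller handing Rust a raw buffer.
///
/// # Safety
/// `buf` must be valid for writes of `len` bytes.
pub unsafe fn fill_from_c(buf: *mut u8, len: usize, data: &[u8]) -> usize {
    // The only unsafe step: trust the caller's pointer/length pair and
    // wrap it in a slice that remembers its own length.
    let out = unsafe { std::slice::from_raw_parts_mut(buf, len) };
    copy_clamped(out, data)
}

/// Copies as much of `data` as fits into `out` and returns the number of
/// bytes copied. Slice indexing is bounds-checked, so this cannot write
/// past the end of `out`.
pub fn copy_clamped(out: &mut [u8], data: &[u8]) -> usize {
    let n = out.len().min(data.len());
    out[..n].copy_from_slice(&data[..n]);
    n
}
```

If the code instead wrote through the raw pointer directly and ignored the length, the compiler would not object, which is the failure mode described above.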
So memory safety here means it's the safety of object life cycles and things like that.
It doesn't mean it's going to remove all bugs.
It's not a golden bullet or anything like that.
Silver bullet. But I think, yes, I think Rust needs to come in because it should be easier to write drivers in this stuff.
We have a lot of issues with lifetime rules of when you yank out a device, devices are dynamic.
And dealing with reference counting and things like that is very tricky to get right.
There's parts in the kernel. We still do not have it right. And we know we don't have it right.
Rust is forcing us to actually document our C code better. And it's cleaning up.
So if Rust disappeared tomorrow, I've had to clean up code in the driver core that's like, oh, yeah, I guess we can do things better and safer in the C code in order to make Rust easier.
And we have.
And so it's making us rethink how we do a lot of our existing code in the kernel.
To be fair, a lot of core kernel people are pretty resistant to that.
They don't like change.
Don't like different languages.
One core kernel developer said, I don't like working with a project that has multiple languages just because it's tricky.
And they are free to do that.
They're not stepping on anybody's toes.
A lot of it's miscommunication, and a lot of it comes down to people again.
Honestly, in this binding, I wrote the driver core many, many years ago of how drivers work in the kernel.
There had to be a binding for that in Rust.
This code I saw, I said, this is horrible.
This isn't going to work at all.
It's miserable.
I went and actually met with the developers.
And there's a Rust Linux conference.
We sat down.
I think they gave a whole presentation just for me.
Turns out I was wrong, and they were wrong.
We both were wrong.
And they were doing crazy things.
They had a thousand lines of Rust code for what I do in two lines of C code.
I'm like, well, why?
They're like, well, we didn't want to change the C code.
I'm like, we can change the C code.
Because I just did that because it was easy in C.
But if I changed that, you get rid of a thousand lines of Rust.
Let's do that.
And then again, it comes down to, okay, understanding what your problems are,
understanding what my problems are.
Let's work together.
And now we have bindings in a kernel that you can actually write some drivers with.
And the Red Hat developers are starting to write the new NVIDIA GPU drivers in Rust,
and they're starting to put the proposals out there.
The Apple GPU drivers for the Apple MacBooks are written in Rust.
Those patches are not merged, but they're written in Rust and proven on a fork.
That works great.
There's a whole bunch of crazy object life cycle issues with graphics drivers,
and Rust makes it a lot easier for them to do.
I think you'll see a lot more of the simple, stupid drivers for hardware devices being written in Rust,
because all they want to do is read and write to some random memory bits.
And it's really easy to do that in Rust, and you can do it in
actually less code than you can do it in C code.
And I think that's good.
We now have the infrastructure in there.
So I think we've hit the tipping point.
We'll start seeing new stuff in there.
And we need to do that.
I mean, there's mandates from governments
that you can't use memory unsafe languages
like C in products.
And if I want to see Linux succeed,
which I do, we're going to have to change.
And I can say, going forward,
if you want to write in Rust,
you can write in Rust.
Now, that being said, we still have 40 million lines of C code.
So we have some very, very good developers
out there working on mitigating
the problems we have in C.
We now have bounds checking
for our stuff. We now have other, we call them seatbelts and airbags, that protect your C code
from doing stupid things. And we're working with the compiler authors to add new extensions
to C and make things safer for the C code because we want to protect the code that we have
today because you're not going to rewrite code in Rust. Don't worry about that. Google famously
published something recently saying over the past couple of years, we've written our new code in Rust,
and we got overwhelmingly more secure because we didn't touch the old code and bugs degrade over time.
There's still going to be bugs in the older stuff, but most bugs happen in your new code, not in your old code.
That's awesome.
I'm sensing you're excited about Rust, and it's also just nice to see the evolution.
Yeah, it's evolution and see what happens.
And if it fails tomorrow, we can rip it out and whatever.
But we have developers willing to do this work for us.
It's not intruding on other people's stuff.
Well, I think it does go back to what you said earlier is I understand that a big part of Linux is, like, show the work.
Like, if it works, and same thing, you know, it sounds like that's how Rust started.
It's also how it's progressing.
People are showing that it works.
They're proving that it works.
It solves their problem.
It maybe even works better for them.
And then you know, step by step.
Yeah, like, people are like, well, why not Zig or Hare?
Those are other good languages.
I'm like, that's great, but nobody's proposed it.
Yeah.
So, yeah, if they want to do that.
And fair, I think those developers who work on those languages don't care about Linux,
which is fine.
They don't have to.
So looking ahead, outside of Rust, what are other things that you're kind of excited about
that's coming in Linux?
either projects, changes.
I don't know if...
We haven't said LLMs except for once here.
I don't know if that, for example,
will LLMs have any impact on how development is done?
No.
There's not...
I mean, they're all trained on Linux kernel code,
so you write out another driver,
but LLMs are great for writing boilerplate code
and things like that.
In Linux drivers, you don't have much boilerplate code
because we've stemmed that down into the core
and made that work better.
LLMs are used to find bugs and to find the bug fixes that we should be taking.
But again, we've published papers on that for eight years.
There's been lots of research on that.
So we've been using that for a while.
I mean, LLMs are just applied statistics, right?
So it's just pattern.
Pretty much.
So code for us at this level, it doesn't matter that much.
So no.
And then as far as I don't know what's coming tomorrow because I just see what people send to me.
So we don't have a plan.
I mean, we always joke, you know, Linux is evolution, not intelligent design.
It's just whatever shows up, right?
Because you're solving your problem, we'll figure out how to fit it in there with everybody else's stuff.
And make sure it doesn't work on it.
People are working on new features.
I mean, Linux is people are like, oh, it's an old model.
It's the old Unix model.
It's like, yeah, we can run code from 20 and 30 and 40 years ago.
But we can also run new stuff.
We have new features.
We have new I/O paths that are even better.
We have new types of functionality.
We have new security models.
We have new capabilities.
We have new types of stuff for the new stuff.
But we didn't break the old stuff.
So we can do both stuff.
You can rewrite your code.
I know the databases are rewriting to use io_uring, which is a new way to do I/O,
which gets the user space to kernel boundary out of the way and gives faster paths.
So they're speeding up the databases by porting to new Linux features.
But their old databases still run just fine.
And so it's like people look at it like, oh, nothing's changed because the old stuff still works.
That was the goal.
The old stuff still works.
So I don't know.
Just see what new, I mean, new hardware features.
I see the new hardware coming all the time.
We get told by the CPU vendors, like, look at this new chip.
It's like, great.
So that's always fun.
And then in terms of contributing to Linux, we just went through this example.
And it seems pretty easy to contribute, honestly.
Like, you know, I wanted to ask for advice on contributing, but my sense is just do it.
Like, it's not that difficult.
But from a professional point of view, like, what?
What do you think a developer who, you know, is building other stuff at a company,
what would they get professionally out of contributing even one change or a few changes to Linux?
Like how could their outlook change or what could they learn?
Well, the best thing is it's your resume.
So I talk to college students.
I talk to college students of Vue, other universities all the time.
Say, hey, contribute to the kernel.
You have time.
And then when you go to get hired, somebody can look at you.
Say, oh, yeah, look, you do play well with others.
And you did contribute upstream. Because when you come into a company, you're not writing code from scratch.
You're working with other people.
You're working with existing codebases.
If you contribute to Linux, or any open source project, you show that you can work with others.
You can work with an existing codebase.
So it shows a great skill set.
When I hired people, when I was at IBM, if you contributed versus not, it's like, oh, that's an easy one.
I'd rather take that.
So from a personal point of view, contributing makes it easier to get a job.
Get a next job.
Another point of view is, as an engineer, you get to learn new things.
I wrote my first driver and sent it out.
Said, oh, here it is.
It's all perfect, whatnot.
Everybody's like, no, this is wrong, this is wrong, this is wrong.
And have you ever heard of multiprocessors?
I'm like, what?
What is all this?
And that's great.
From an engineering point of view, I want to know better.
I mean, the Linux kernel developers,
you can never have all the best developers in the world at the same company.
But in open source, we can all work together:
the best operating system people can all work on the operating system together.
So the depth and talent of the people that are working on Linux is just amazing.
Take advantage of that.
I'll say the Rust developers that are working on Rust for Linux,
are core Rust developers.
These people are really, really, really good.
They maintain core parts of the Rust infrastructure.
Take advantage of them.
I mean, I'm learning so much from them.
So from an engineering point of view,
there's these people that are really out there
and willing to help you and grow as an engineer
and learn different processes
and learn different skills much better.
I mean, I learned so much more working in the community
than I ever did working at companies
because you have better review process,
you have more exposure to crazy corner cases
that you hadn't thought of.
That, oh, yeah, in the real world, yes, that would have been one in a million,
but we do have to take that because we have a million boxes,
four billion machines out there.
Plus, I guess, the more curious you are, everything is open in Linux.
So I remember when I joined Uber, I was just amazed by the RFC process.
And internally, I could read all the RFCs, and I spent like a week or two just kind of
reading and, you know, trying to take it all in.
In Linux is here.
Like, like, any, obviously, it's overwhelming if you just started at once.
But you can, like, target something.
And so you can just even if you contribute little or even before you contribute, you could just learn.
You can see how the changes are made.
You can try to understand these things.
Yeah, I will say it's not the best learning operating system.
There's really good learning operating systems out there.
We're not those.
That being said, I mean, people still write academic papers about it and all this stuff.
They want to rewrite the scheduler or do all this fun stuff with it because it is a real-world tool.
I mean, I learned from Minix and Linus learned from Minix, which was a learning operating system.
And then we took those ideas and that and we made Linux with it.
I mean, Linus did it way before me.
But learning operating systems are great, but working on a real-world system is a little bit different.
That being said, there's parts of the kernel that are very easy to get into for newbies.
We have a whole section of code with really bad, crummy drivers that are the wrong coding style.
They have the wrong formatting.
They have just a lot of dead code.
That's there for beginners to take up and take your first patch.
fix the spelling mistakes, fix the coding style,
learn how to do this stuff.
And there's a whole website,
kernelnewbies.org is a wiki
that has a whole bunch of stuff
on how to write your first kernel patch.
How to get involved.
I've given old YouTube talks
if you search how to write a kernel patch.
I need to do a newer one.
It's fun.
I've gone to universities and said,
I gave everybody a file
that you're going to write a kernel patch
for this file.
It's like what?
Okay.
And they do it and by the end of the class,
end of two hours,
they send a patch off and it gets accepted.
You know, it's very simple.
It's not a difficult thing to do.
And we want new people to get involved because we don't know who's out there or what they can contribute.
If you just want to do something for fun or do something for real, it's great.
Awesome.
Well, this has been really interesting.
And I'd just like to close off with some rapid questions where I just ask and then, you know, you say what comes to mind.
What's the most memorable patch that you've contributed to in Linux?
So this is going to be about people again.
Early 2000s, we were starting to get Microsoft saying Linux is a cancer.
We're all worried about the next.
Oh, I remember.
Yes, we remember that stuff.
We started getting some really, really good patches for some hardware that we really didn't
know that well.
It was showing some really good stuff.
And it was like, this is really good.
And we're like, where did you get this information?
How did you know this stuff?
Is this like somebody trying to sneak this in?
And the person wrote back and said, here's how I found this.
Here's how I tested it, how I did this.
I'm like, okay, all right, we took this.
And over time, we took all these patches.
And then we have this conference once a year for all the maintainers.
and you get invited to it
and we're like, oh, let's give them an invite
because it was really good.
And it was in Canada every year
for a number of years for some reason.
And they came and they showed up
and he showed up and he's like,
oh, sorry, I had to bring my mom
because he was in high school.
He was 17 years old.
None of us knew.
And he contributed.
And it was like, okay, great.
And it turns out he later went on to MIT
and now he's a professor at Stanford.
Wow.
Yeah, all you see is an email address in Linux.
Yeah, so Adam, though.
It's like, okay.
Yeah, it's like that.
I mean, it's things like that.
Wow. That's good.
Another one I'm really happy about is there's lots of drivers that have been sitting outside the kernel tree for many, many years.
Just because people never got them upstream or went on.
One of them is the subsystem to handle Braille keyboards.
So Braille displays.
Yeah.
Just feel that those are outside the tree.
I and a couple other developers worked with those people and got them in the tree and got them working.
And now they're shipping with all devices.
So we took care of these people who were always having to patch
this auditory stuff, because these devices are only needed by a very tiny subset,
but now it's available for everybody. I'm very happy to see that.
Wow. And I guess this goes back to Linux of what you said,
why companies will contribute a few developers per year,
because now when you take Linux, you get an OS, for example,
that also has Braille support. Like, you know, that itself,
like adding it to an existing product or if you built an OS,
that itself would be a massive undertaking.
Yeah. So now it supports all the devices out there.
Awesome. What's your favorite programming
language? Still C. I mean, I've been doing C for, what, 30 years every day? So, yeah, C. Yeah. I've been doing a lot of
Rust lately. Rust, I feel like I'm able to write really sloppy code and it's, and it works. I don't feel
like I have to be as precise as in C, which is, I don't know if that's good or bad. And what are,
what's a book or two that you would recommend reading? The old Code Complete book was a really good one. That was a
really good one. It taught me that coding style matters. It doesn't matter what the coding style is.
It's just that having a set coding style matters because our brains work on patterns. As programmers,
we're reading patterns. And when the pattern's the same, the metadata goes away and we can see the logic
easier. Code Complete has aged a little bit weirdly. If you look at the first book, it has a lot more C
examples and whatnot. But it talks about the basics behind programming and a lot of stuff. And that was a
really, really good book. On the flip side, another really fun one.
that's really tiny: Programming Pearls.
And it's like bit fiddling and cute little algorithms and neat stuff like that,
which surprisingly we still do today.
We're talking about adding parity functions in a common way.
And everybody's like, no, if you do it this way, you do it this way, it'll be faster in this.
So we're still messing with these things that people have messed with for 40, 50, 60 years.
And these things still matter.
And they matter to people because cycles matter and power matters and things like that.
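As an illustration of the kind of bit fiddling under discussion, here is a minimal Rust sketch of two ways to compute the parity of a 32-bit word. It is not the kernel's implementation or the proposal from the mailing list; parity_fold and parity_popcount are names made up for this example. Which variant is faster depends on the processor, which is exactly what those arguments tend to be about.

```rust
/// Returns true if `x` has an odd number of set bits.
///
/// XOR-folding: each step XORs the upper half of the remaining bits into
/// the lower half, so after five steps bit 0 holds the XOR of all 32 bits.
pub fn parity_fold(mut x: u32) -> bool {
    x ^= x >> 16;
    x ^= x >> 8;
    x ^= x >> 4;
    x ^= x >> 2;
    x ^= x >> 1;
    (x & 1) == 1
}

/// Same result via a population count, which maps to a single instruction
/// on processors that provide one.
pub fn parity_popcount(x: u32) -> bool {
    x.count_ones() % 2 == 1
}
```

Both functions agree on every input; for example, 0b1011 has three set bits, so both report odd parity.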
So between those two, those are my favorite ones.
Well, this is awesome.
This has been such an interesting and, like, for me,
just really educational and eye-opening chat.
So I'm glad we did it.
Well, thanks for having me.
I found this episode to be a really interesting one about Linux.
I'm still amazed that an open-source project managed to become the most widespread operating system in the world,
despite not being a commercial business.
It's such an interesting and inspiring project.
You can find Greg on social media as linked in the show notes below.
And if you'd like to try your hand at contributing to Linux, visit kernelnewbies.org.
For more deep dives related to backend engineering, check out The Pragmatic Engineer articles,
linked in the show notes below.
If you enjoyed this podcast, please do subscribe on your favorite podcast platform and on YouTube.
This helps more people discover the podcast and a special thank you if you leave a rating.
Thanks, and see you in the next one.
