Tech Over Tea - Linux Kernel Scheduler Developer | David Vernet
Episode Date: March 8, 2024. The Linux kernel is something we all use, but have you ever thought about what goes into it? Well, today we've got David Vernet on the show who has spent quite a bit of time focusing on one aspect, ...that being the scheduler. =========Guest Links========== Twitch: https://www.twitch.tv/Byte_Lab Kernel Recipes 2023: https://www.youtube.com/watch?v=8kAcnNVSAdI LSF 2023: https://www.youtube.com/watch?v=MXejs4KGAro BPF: https://datatracker.ietf.org/wg/bpf/about/ Sched_ext Repo: https://github.com/sched-ext/sched_ext SCX Repo: https://github.com/sched-ext/scx Other Changes 1: https://lore.kernel.org/all/20230809221218.163894-1-void@manifault.com/ Other Changes 2: https://lore.kernel.org/all/20231212003141.216236-1-void@manifault.com/ ==========Support The Show========== ► Patreon: https://www.patreon.com/brodierobertson ► Paypal: https://www.paypal.me/BrodieRobertsonVideo ► Amazon USA: https://amzn.to/3d5gykF ► Other Methods: https://cointr.ee/brodierobertson =========Video Platforms========== 🎥 YouTube: https://www.youtube.com/channel/UCBq5p-xOla8xhnrbhu8AIAg =========Audio Release========= 🎵 RSS: https://anchor.fm/s/149fd51c/podcast/rss 🎵 Apple Podcast:https://podcasts.apple.com/us/podcast/tech-over-tea/id1501727953 🎵 Spotify: https://open.spotify.com/show/3IfFpfzlLo7OPsEnl4gbdM 🎵 Google Podcast: https://www.google.com/podcasts?feed=aHR0cHM6Ly9hbmNob3IuZm0vcy8xNDlmZDUxYy9wb2RjYXN0L3Jzcw== 🎵 Anchor: https://anchor.fm/tech-over-tea ==========Social Media========== 🎤 Discord:https://discord.gg/PkMRVn9 🐦 Twitter: https://twitter.com/TechOverTeaShow 📷 Instagram: https://www.instagram.com/techovertea/ 🌐 Mastodon:https://mastodon.social/web/accounts/1093345 ==========Credits========== 🎨 Channel Art: All my art was created by Supercozman https://twitter.com/Supercozman https://www.instagram.com/supercozman_draws/ DISCLOSURE: Wherever possible I use referral links, which means if you click one of the links in this video or description and make a purchase we may receive a small commission or other compensation.
Transcript
Good morning, good day, and good evening.
I'm, as usual, your host, Brodie Robertson, and today we have...
I think you might be the first kernel contributor I've had on the show before.
Probably.
Oh, cool.
Definitely the first meta employee, that's for sure.
Welcome to the show, David Vernet. How's it going?
It's going well. Thank you so much for having me.
Yeah, glad to be the possible first kernel contributor.
In terms of the order of titles, I usually put that one before Meta employee, but they're both true.
Yeah, glad to be here.
Yeah, I was just confused why you reached out to me, because, you know, most of the time I will reach out to people about things, hey, oh, do you want to come talk about this? Usually people don't come to me with a giant essay about the things they want to talk about.
Well, yeah, I mean you have a sizable audience
I actually found your channel because you did that video on the 6.8 release cycle, and how Linus's power went out
and there was that regression in the scheduler because
there was some firmware issue with AMD chips or the frequency governor.
And so, yeah, like I mentioned to you in the email, I have a Twitch channel, twitch.tv/Byte_Lab, that I'm trying to grow an audience for. So I thought, hey, this guy's obviously following kind of the low-level stuff, so it might be nice to come on the podcast and chat.
Yeah, I'm excited to do this. So we'll see where it goes.
Whilst I certainly have an interest in that side of it as well,
my knowledge is fairly surface level.
I'm not a kernel contributor myself.
I will dig through the main lists and see what's going on.
When there are certain terms that come up,
like referring to, I don't know, sched_ext, for example,
I'll go search documentation, find out what that's all about.
But for the most part, like, you are much more in the weeds than I will ever be with this situation.
So I guess probably the best place to start is what is it that you mainly focus on?
Because it seems like there is definitely a trend with what you are doing.
So the first thing I focused on when I first started working on the Linux kernel, which
was not the first kernel that I worked on in my career, I started working on live patch
because at Meta we had an issue where when we were rolling out live patches, and for
those of you who are watching, a live patch, if you're not aware, is a way that you can
patch kernel text at runtime to fix bugs.
For example, if you forget to drop a lock
or you have a memory leak or something like that.
And we were noticing that when we did that, TCP retransmit events
were going way up for like the few seconds it took to do the patch.
So anyways, I noticed that and I worked on that for a little bit.
I fixed that bug and that was kind of my test to get into the team.
And since then, I've been focusing almost exclusively on the scheduler and BPF. My day is usually a mix of adding features to sched_ext, running benchmarks, and trying to tweak things to make it better.
I spent some time also in BPF, like I mentioned, on the standardization for BPF. I'm one of the two co-chairs, so reviewing documentation, which is fun.
But yeah, as far as engineering gigs go, it's very engineering focused. I don't really have too many meetings, which is nice, and it's usually just hacking on the scheduler and BPF.
Well, I assume a lot of people watching this do have some sort of technical background, but I don't know how many people really know about these internals of how a kernel works. So what is the scheduler, and what does that actually do?
Sure, yeah. So when you think of a system, what are you doing on your computer?
You're on a web browser, you're on a word processor, you're doing a bunch of things at the same time. But the resources of your system are finite. You only have a certain number of cores or logical CPUs. And so the job of the scheduler is to decide who gets to run where and when.
For example, if you have two cores and you have your web browser and your word processor,
maybe the scheduler will say, oh, these guys both get to run in parallel on these two cores.
But in reality, obviously, there's way more threads than that in the system.
So the scheduler decides who gets to run where.
It's a complicated problem because you have to deal with hardware issues.
Like if you have a thread that's been running on a core, you probably want to keep it there if you can, because it might have better cache locality, so its accesses will be faster, because on the chip there are these small, really hot caches that it wants to read from. But if you keep it on the core for too long, then you might have another core that could have run that thread that's just sitting there idle, not doing anything.
So, you know, that's sort of the problem space. And then also with the default scheduler, especially EEVDF in the kernel, fairness
is a big problem.
So you want to give everybody kind of their fair slice, their fair share of CPU.
So how do you kind of balance all of these heuristics while also making it generally
fair?
That's kind of the goal.
So the way the, from the documents that you sent me, previously up until what, 6.4, 6.6 something in that area,
the scheduler was the completely fair scheduler,
the CFS scheduler.
And now it's using this EEVDF,
the earliest eligible virtual deadline first.
So, which is a really long name.
I see why you said EEVDF.
It's a lot easier to say.
Right.
But what is like,
I know obviously explaining the intricate details of how they are different
will probably take you the entire episode,
but at like a surface level,
what is fundamentally different
about these two approaches?
So yeah, that's a great question.
I think if you get into the weeds, it definitely takes a while to explain, but I think you can, you can think about
the scheduler like this. So if you have one CPU and you have all these threads that want to run
on it, the basic idea is you want to count how much time each thread has run on that CPU. And
you want to give the thread that has run the least amount of time the next slice of time to run.
And that's called vruntime, virtual runtime. And that value, so if anybody's ever heard of thread weights or thread niceness, the way that you apply that is you scale how much runtime you accumulate as the thread runs depending on its weight. It's inversely scaled, so if you have a really high weight, you divide how much time you're accumulating by that.
And essentially, that's kind of the idea.
It's fair because you're giving whoever has run the least amount of time scaled by their weight the CPU next.
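To make the weighting idea concrete, here is a minimal sketch in C of weighted vruntime accounting. It's an illustration of the concept David describes, not the kernel's actual code; the names and the scale factor are assumptions for the example.

```c
#include <stdint.h>

/* Illustrative only: a toy weighted-vruntime accumulator. The real
 * kernel uses per-nice-level weight tables and fixed-point math
 * that differ in detail. */
#define WEIGHT_SCALE 1024  /* assumed weight for a default (nice 0) thread */

struct toy_task {
    uint64_t vruntime;  /* weight-scaled runtime accumulated so far */
    uint32_t weight;    /* higher weight => vruntime grows more slowly */
};

/* Charge delta_ns of actual CPU time to the task, scaled inversely
 * by its weight, so heavier tasks get picked more often. */
static void account_runtime(struct toy_task *t, uint64_t delta_ns)
{
    t->vruntime += delta_ns * WEIGHT_SCALE / t->weight;
}

/* Pick the runnable task with the smallest vruntime. */
static struct toy_task *pick_next(struct toy_task *tasks, int n)
{
    struct toy_task *best = &tasks[0];

    for (int i = 1; i < n; i++)
        if (tasks[i].vruntime < best->vruntime)
            best = &tasks[i];
    return best;
}
```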
So that's really about bandwidth allocation, like who gets to run, how long do they get to run, etc.
But there's another problem in scheduling called interactivity, where you want to be able to give applications that have latency sensitive
requirements. If you're going to play a game and you're rendering some frame, you probably need to
render the frame quickly or else it's going to look jittery. Same with calls and everything like
that. And so there was stuff built into CFS to enable kind of more interactive workloads to be given the CPU more easily. But the core difference between EEVDF and CFS is this deadline, this eligible deadline that you mentioned. That's kind of where the interactivity comes along, and I'll try to give a really brief overview. I'm still confused by this myself, so if you're watching and you're confused, don't feel bad.
But the idea is you have the same vruntime that you had with CFS, which is used again to
count how much time you've run for bandwidth to see who gets the CPU next. But in addition to that,
you have what's called a deadline. And if you want to run for, let's say, 20 milliseconds,
your deadline would be however long you've run, your vruntime, plus 20 milliseconds.
And the idea is the scheduler schedules whoever has the earliest deadline first.
So if you have really short windows where you run, you only run for like 100 microseconds,
your deadline is way sooner.
So the scheduler is more likely to pick you first because it's not just how long you've
run, but it's also like, when is your deadline, so to speak.
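A rough sketch of that selection rule, again as an illustration rather than the kernel's actual EEVDF code; the structure and field names are assumptions, and the real algorithm also checks eligibility, which this sketch skips.

```c
#include <stdint.h>
#include <stddef.h>

/* Illustrative only: pick the task with the earliest virtual
 * deadline, where deadline = vruntime + requested slice. */
struct toy_task {
    uint64_t vruntime;   /* how long it has run, weight-scaled */
    uint64_t slice_ns;   /* how long it wants to run next */
    int      runnable;
};

static uint64_t deadline(const struct toy_task *t)
{
    return t->vruntime + t->slice_ns;
}

static struct toy_task *pick_next(struct toy_task *tasks, int n)
{
    struct toy_task *best = NULL;

    for (int i = 0; i < n; i++) {
        if (!tasks[i].runnable)
            continue;
        /* Short-slice (latency-sensitive) tasks end up with earlier
         * deadlines, so they get picked sooner. */
        if (!best || deadline(&tasks[i]) < deadline(best))
            best = &tasks[i];
    }
    return best;
}
```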
And that's kind of more the interactivity part of it. There's a really good LWN article that explains it, maybe more intuitively than I am, because the LWN editor has been doing this for quite a while. But yeah, that's kind of what I would say as the highest-level explanation.
So I guess with CFS, that came from a time when, was it like 2006, 2007, somewhere in that range it was added to the kernel?
You had cores that were much more homogeneous.
So you had, you know, quad core systems that each had the same cache topology.
You didn't have as many NUMA machines.
And migrations were usually more expensive because the cores were spaced further apart.
And so you mentioned this project sched_ext, which allows you, I'm sure you're going to go into this a little bit.
It allows you, yeah.
The TLDR is, yeah, CFS was from a very different time, in my opinion. Hardware is way more eclectic, let's say, than it used to be.
So scheduling is more important than it used to be as well.
Right.
And I guess the kinds of workloads that we do on Linux nowadays
are also fairly different as well.
Just a common example, gaming, for example,
that wasn't really a use case for Linux back in 2006.
There was a couple of open source games,
but not like we have today.
Right, no, it was a meme back in 2006.
And now it's like we have the Linux Steam Deck.
So Steam is like dead serious about gaming on Linux.
And yeah, I mean, it's a big scheduling problem.
There are people that are working on sched_ext
that are looking closely at that problem
specifically as well.
And to give you maybe a concrete example,
so I don't know if you've ever played Factorio
or if any of your viewers have ever played Factorio,
but it's a game where you have to build this huge factory
that does all this stuff in parallel.
So it's a very like parallel heavy game
where you have to have a lot of computing power. I think it was AnandTech that did a benchmark where he ran it
on the 7950X3D, I think, the AMD CPU that has, it's kind of wild, it has 3D V-Cache sitting on top of one of its two L3 caches. So if you imagine there's two different
groups of cores on the CPU,
there's a cache sitting on top of one of them, which means that that set of cores has better
memory access. There's more cache around it, but it has to throttle itself more often because
heat is actually trapped by that cache. So it's a really crazy scheduling problem, where on the
one hand you have better locality, and on the other you have a better CPU. And on Windows it ran like way,
way better than on Linux because, you know, the Windows scheduler is kind of, I guess it was more
amenable to this type of workload. And so that's the kind of thing, that's an example of the kind
of thing that we can do better in the modern age. So besides the interactivity problem, why would someone even
care about changing the schedule? If it's this generic scheduler that just works well enough,
like what sort of improvements would someone want to make to that to better suit their workflow?
So, well, yeah. So if you're talking about like the average person
that's just using Linux,
you probably...
Definitely not, yeah.
Yeah, well, they're probably going to be just fine
with EEVDF as the default,
but there's a few different types of users, I would say.
So I'll give you an example from Meta.
So we, with HHVM,
and if I'm going too deep into like the crazy weeds,
just let me know.
Before you move on, HHVM, what is that one?
HipHop VM. So that's the PHP JIT engine that we use at Meta to run our web workloads. If you laugh when I said PHP, that's totally fair, but this is a new type of PHP that's statically typed and has tons of optimizations for JITing.
So it's actually really fast now.
But one of the interesting things about JIT engines and compilers as well, actually, is that they have really, really bad instruction cache locality.
Which means that they're not really doing the same code a lot in a row.
They're going to this branch and this branch, and then they're compiling this code over here, especially with JIT engines. And so they have really, really poor
front end CPU locality is the term for that. And that means that they also have really poor IPC,
which stands for instructions per cycle. And so a lot of the time when you're writing system
software code, you want to try to use the CPU as efficiently as possible so that it can pipeline
things and do a bunch of things at the same time. But with something like a JIT engine, it's really hard
to do that because it's just not really possible if you're basically having to decode instructions
every time you're doing something. So in such a scenario, something like CFS, which is quite
sticky, because again, it was built in a time when you had cores that were further apart and it was more expensive to migrate, doesn't really lend itself very well to this, with that philosophy of stickiness. You actually just want to throw that thing onto a CPU and just let it go. You want to be able to run this thing as fast as possible. Maybe keeping it on the same CPU for cache locality might be okay, but if you have a CPU that's just waiting around, then you should just throw it over there.
I sent a patch set for that upstream, actually, that hasn't been merged yet, but that's an example of where we want to make the scheduler more work conserving, is the term for that. It's erring on the side of doing more work as opposed to kind of improving locality or these kinds of things.
And there's so many things. We have a ton of sched_ext schedulers already that are all cute and eclectic in their own ways, and I can certainly give you really interesting examples if you're interested in more.
Yeah, if you have some, before we get deep into what sched_ext is specifically, if you want to give those examples, we can do that.
Sure. Yeah. So here's another one.
So VMs are interesting. If you're talking about a VM,
the way that the scheduler views a VM is by what are called vCPUs, so virtual CPUs.
And so in the guest operating system, you have obviously whatever threads have spawned in this
guest OS. But from the perspective of the host, the threads of the VM are just its actual CPUs
that are running, which kind of makes sense if you think about it, right? Because the guest OS has CPUs, it's scheduling stuff on those CPUs,
but it's the actual host OS that decides when those CPUs get to run, right? It's multiplexing
the physical CPUs to these virtual ones. And that's fine if you're working on like an
overcommitted environment, which is obviously not uncommon at all for VMs. But for a lot of
workloads, like on AWS or on a lot of cloud providers, you can imagine that it actually
might be better to give a vCPU an actual physical core and just turn off timer interrupts,
basically do everything you can so that you never interrupt the guest at all. It's
pretty expensive to exit the guest, it's called a VM exit, and the hardware is doing a lot of stuff.
It's not cheap to do, so you want to try to avoid that.
You could, for example, build a scheduler where all scheduling decisions are made from a single core where you're not running a guest vCPU,
and you just let the vCPUs burn on their cores,
no timer interrupts, nothing that would pull them out of the guest. And
if you need to actually do a resched, you're like, oh, something needs to run there, there's like a kthread, a kernel thread,
that needs to do some IO or something like that, then you can send what's called an IPI, an inter-processor interrupt. And there's a specific one called a resched IPI, it's designed for making a CPU
do a resched, and you send it from the one core that's actually doing the scheduling and kind of
organizing everybody. And that works, right? Because you don't really need to make very many scheduling decisions in real time, like you would with a normal scheduler.
And you also kind of take the scheduler and the host out of the way of the guest. And you can
actually have really big speedups in cloud environments by doing that.
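A very rough sketch of the shape of that design, purely to illustrate the idea. The helpers here are stubs and placeholders, not the API of any real scheduler; in Linux the "kick" would ultimately be a resched IPI as David describes.

```c
#include <stdbool.h>

/* Illustrative pseudo-kernel code for the "one core does all the
 * scheduling" idea for VM hosts. The two helpers below are stubs
 * standing in for whatever mechanism a real implementation uses. */
#define NR_VCPU_CPUS 15  /* cores dedicated to guest vCPUs */

static bool cpu_needs_kthread_work(int cpu) { (void)cpu; return false; } /* stub */
static void send_resched_ipi(int cpu)       { (void)cpu; /* stub: kick cpu */ }

/* Runs only on the one housekeeping core; the vCPU cores never take
 * timer interrupts and never run this loop themselves. */
static void scheduling_core_loop(void)
{
    for (;;) {
        for (int cpu = 0; cpu < NR_VCPU_CPUS; cpu++) {
            /* Leave the guest alone unless something genuinely
             * has to run on that core, e.g. a pending IO kthread. */
            if (cpu_needs_kthread_work(cpu))
                send_resched_ipi(cpu);
        }
        /* Sleep or poll here; these decisions are rare and not
         * latency critical, unlike a normal per-CPU scheduler path. */
    }
}
```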
So when you're at the point of dealing with scheduler problems, you're sort of at this point where you've optimized the code
and you've got this giant deployment.
It's like, okay, how do I further optimize it from here?
Like, how do I get the absolute most out of the hardware I have?
You probably wouldn't approach the problem
anywhere before that point.
Yeah, so, yeah. that's funny you should ask that
because before I was doing this kind of work,
I kind of was like, what is left to do?
Like, for example, something I hear a lot from people
is like, oh, kernel compile is like optimized completely.
There's nothing left to do for kernel compile,
which is untrue.
We could do better for kernel compile, believe it or not.
And so how do you go about fixing
or improving something
that's been like banged on repeatedly and relentlessly for so long?
Well, the good news is hardware is extremely, extremely complex,
which gives people like us a job.
And so to give you an example, earlier I was talking about the front-end CPU pipeline
and how that is sort of really slow when you're
doing like a JIT engine, JIT workload. Well, to go into a little bit more detail to kind of make
that make sense, one of the parts on Intel CPUs, at least, one of the parts of that is called the
DSB, the decode stream buffer, which takes instructions like x86 instructions and will
compile those into microcode, into RISC instructions, which are
actually what run on the back end of the CPU. And it'll cache those so that it doesn't have to do
that decode every time. And that's one of the things that thrashes a lot if you're doing a
JIT engine. But that thing, this decode stream buffer, if you look at the Intel CPU for the Skylake
microarchitecture, it's like, okay, it's actually well specified:
if you have like three unconditional branches within a 32-byte window,
then you always have to use a new entry in the DSB.
And there's all these like really particular hardware specific things that
have nothing to do really with the workload.
And there are times where like, you might pad out your code by like a little bit, and then you get this huge,
like 10x speedup on your program. So that kind of thing, you know, a lot of the time, there's huge gains to be made from that.
Now talking about the scheduler specifically. So we've talked about sort of like hardware,
hardware related stuff, but there's also a lot of software related things that we don't really
take into account now with CFS or EVDF, but in the future, and I'll talk about this more when we get
to SkedX, but you can imagine if you have like a service where you have one thread doing a lot of
IO that's reading an RPC message or remote procedure call message.
So I got a request, you know, like you want like a cat photo.
I'm going to take your message, demarshall it into something that I can run on the server
and then pass it off to like a worker thread to actually do the work.
Well, you probably want to put the worker thread on the same core as the IO thread because
the IO thread just pulled the whole message into cache, right?
So it's hardware related still, obviously, because it's still talking about caches. But it's really kind of, you have to understand the application. It's more of like a,
you're kind of getting a little bit higher level than the kernel even at that point.
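As a toy illustration of that colocation idea from user space: pinning a worker thread onto the CPU the IO thread is running on. This is just a sketch of the concept; a real service (or a sched_ext policy) would do this dynamically rather than with hard-coded affinities.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>

/* Keep the worker on the same CPU as the IO thread so the request
 * bytes the IO thread just read are still hot in cache. */
static void pin_to_cpu(pthread_t thread, int cpu)
{
    cpu_set_t set;

    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    pthread_setaffinity_np(thread, sizeof(set), &set);
}

static void *worker(void *arg)
{
    /* ... parse and handle the request that the IO thread read ... */
    (void)arg;
    return NULL;
}

int main(void)
{
    /* Pretend this thread is the IO thread: find the CPU it is on. */
    int io_cpu = sched_getcpu();
    pthread_t w;

    pthread_create(&w, NULL, worker, NULL);
    pin_to_cpu(w, io_cpu);   /* colocate worker with the IO thread */

    pthread_join(w, NULL);
    printf("worker pinned to CPU %d\n", io_cpu);
    return 0;
}
```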
And so I'll stop monologuing in a second. But in general, when I'm just going in to attack a problem, I'll usually first look at
IPC, instructions per cycle, which is kind of a good metric of how well are you using the core.
And then I'll start to look at what else is kind of going on. If it's lower than I expect,
all right, where are we stalling? Are we stalling in memory loads? Is IO the bottleneck? Are we just
compute bound? We just need more cores. And it's sort of a process of elimination from there
So one thing you did touch on earlier in there that I want to get back to is,
it sounds like when you're dealing with these scheduler problems, it's going to be very
platform specific.
Like if you are dealing with a Skylake CPU, if the architecture changes a bit with the next
generation, you might want to restructure how that scheduler is being used to better suit that
specific platform you move to.
Yep, that's right. So a lot of it is compiler stuff, like compiler level. There are a lot of
bugs you'll see like with LLVM or with GCC where it's like I was saying, oh, if we pad this out a
little bit more when we emit the code, it's much better performance on this microarchitecture or
that one. But it's absolutely also a scheduling problem. And a good example of that would be AMD,
I would say. On one of the earlier Zen microarchitectures, you had what's called AMD Rome.
And Rome had a very different latency distribution for accessing memory outside
of your L3 cache compared to the next generation, which is called Milan.
And now the modern generation is called Bergamo.
And things are just getting huge and huger and huger.
So, you know, with all of these things, like what I was saying about aligning text and that kind of thing, that's really more of a compiler problem. But the point is, right,
there are so many details like that in a CPU, and
every part of the system kind of has to play its part, but the scheduler is a really, really big part of it.
Yeah, especially for something like these big Bergamo machines
from AMD that have like hundreds of cores and stuff like that.
So I guess we probably should get into sched_ext then.
I'm sure there's going to be a lot of terms
we need to go over as we do this.
No problem.
At a high level, what is sched_ext and what problem
is it trying to solve? Okay. So sched_ext is a new pluggable scheduling framework.
And the idea is it lets you implement your own scheduling policies, your own host-wide
scheduling policies, and implement them as what are called BPF programs.
Now, first of all, when I say host-wide, what I mean is, these are the threads that would be running in CFS or EEVDF; you instead bring them
over to your scheduler. If you have what are called real-time threads or
deadline threads, threads that are running in these higher priority scheduling
classes, those don't run in your scheduler.
They stay in the kernel. But you migrate all of the other default threads to yours.
Now, so that's the host-wide part. The BPF part. So BPF stands for Berkeley Packet Filter.
And that originally, if you ever heard the term, you may have heard it in terms of packet filtering,
as you might imagine, which was added to the kernel quite a while ago. But since then, something called what many people call eBPF, extended BPF, has been added,
which is completely different. And that lets you run a JIT inside the kernel, which is insane when
you think about it. But there's actually an instruction set for BPF. There's BPF-specific
encodings for instructions. There's a backend for the LLVM compiler,
where if you have C code, it'll compile the C code into BPF instructions. And then
at runtime, throughout the runtime of the kernel, the kernel will read that BPF bytecode
and emit x86 instructions or whatever architecture you're running on to run the actual program
natively in the kernel. And there's a couple of things that are special about that. So first of all, kernel modules also
do something similar for anybody who's heard of those before, but they're very different than
BPF programs for a number of reasons. For example, BPF programs can't crash the kernel.
The kernel will statically analyze the BPF program in a component called the verifier.
And if the program could be unsafe,
like you're reading a pointer that you shouldn't read,
or you're not dropping a reference,
like you have a memory leak or something like that,
it won't let you load the program.
And it also has all sorts of ways of interacting
with user space from your BPF program.
Like there are these data structures called maps
where you can have in real time shared memory
that you can write and read
from both user space and kernel space.
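For a flavor of what he means by maps, here is a minimal BPF-side map definition and lookup in the common libbpf style. This is a generic illustration, not code from sched_ext; the map and program names are made up for the example.

```c
// SPDX-License-Identifier: GPL-2.0
/* Minimal illustration of a BPF map shared with user space.
 * Needs vmlinux.h / libbpf headers to actually build. */
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>

/* One u64 counter, readable and writable from both the BPF program
 * and user space (user space looks it up through the map's fd). */
struct {
	__uint(type, BPF_MAP_TYPE_ARRAY);
	__uint(max_entries, 1);
	__type(key, __u32);
	__type(value, __u64);
} counter_map SEC(".maps");

SEC("tracepoint/sched/sched_switch")
int count_switches(void *ctx)
{
	__u32 key = 0;
	__u64 *val = bpf_map_lookup_elem(&counter_map, &key);

	if (val)
		__sync_fetch_and_add(val, 1);  /* bump the shared counter */
	return 0;
}

char _license[] SEC("license") = "GPL";
```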
And so obviously there's a lot more to say,
way more to say about BPF.
But the basic idea is it's a safe environment
for running dynamically loading
and running programs in the kernel.
And sched_ext is a framework that uses BPF
to implement host-wide scheduling
policies that are also safe and can't crash or can't hang the machine either. Now, as far as
what problem is it trying to solve? Well, so EEVDF, CFS, these are general purpose schedulers.
They do really well for what they do, right? They're general purpose. They're fair. They've
been worked on for many, many years and they're very well optimized.
But there's a few drawbacks.
For one thing, I don't know for any of the viewers
who have ever done kernel work before,
you know how much fun it is to compile a kernel,
reinstall it, reboot it.
And then you have a bug where you crash something
or you corrupt your disk or your file system.
And you're like, great, I have to do all this again.
So the safety aspect is really nice.
You know, for a BPF program, you recompile it.
It takes like two seconds to compile.
And then you just rerun it.
And the kernel loads it, starts running it for you.
It loads it in, does everything under the hood to transition the whole system to using it.
And it just runs.
So, you know, for meta, if we're, if we're like running an experiment on, on thousands
of hosts, it's, it's just not even an option for us to do like this iteration where we're,
we're loading a kernel onto thousands of hosts, waiting for the caches to warm back up,
then doing measurements and oh, a crash or like what's different about this and whatever. So,
so it makes it as simple as like, you know, testing a regular user space application. Oh,
you just compile it, run it, it just goes. It's literally that easy, yes.
And it's, I guess, a little bit different
because it has host-wide implications,
but in terms of the iteration time,
yeah, it's exactly what you just said.
And so the other big problem it solves
is that you do leave a lot of performance on the table
for a general purpose scheduler in
certain scenarios. And so, you know, for us, Meta is an easy example. We have a lot of
large services that are kind of monoliths, like web, stuff like that. And so there's just too much
scale for us to leave the scheduling benefits we can get on the table. And, you know, this allows us to build scheduling policies that just
aren't appropriate even to be merged into a general purpose scheduler. So things that like would never
be able to get upstreamed, we can build them in sched_ext as well and use them internally. And then,
yeah, the crazy ideas, like the cloud computing thing that I was talking about earlier,
that stuff as well, it enables you to do that.
So you could make your own scheduler
even without something like this.
It would just be a lot more of a slow iteration process
that would just not be suitable,
especially in cases like this.
You could.
I wouldn't recommend it because the API for building these schedulers in the kernel is very complicated.
And it requires you to understand kind of the core logic of the scheduler.
Like callbacks will be called in different contexts.
And you have to understand what context you're being called in for something to make sense.
So you can do it,
like Google's written some schedulers, lots of companies have. But if this is something that
you're interested in, I mean, I'm biased, but I really wouldn't recommend doing that. I would
recommend looking at sched_ext. It's going to be a lot easier. We also tried to
make the callbacks and the API much more intuitive, reflecting the policy instead of kind of the system around it, right?
So, yeah.
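To give a feel for what those callbacks look like, here is a rough sketch of a trivial sched_ext policy, loosely modeled on the scx_simple example from the sched_ext repo. The exact helper names, flags, and struct fields have changed across sched_ext revisions, so treat the specifics here as assumptions rather than a copy-paste-ready scheduler.

```c
// SPDX-License-Identifier: GPL-2.0
/* Rough sketch of a minimal sched_ext policy (global FIFO-ish).
 * Helper names follow the early sched_ext examples and may differ
 * from the version you are building against. */
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>

/* Enqueue every runnable task on the global dispatch queue with the
 * default time slice; CPUs then pull tasks from it in order. */
void BPF_STRUCT_OPS(minimal_enqueue, struct task_struct *p, u64 enq_flags)
{
	scx_bpf_dispatch(p, SCX_DSQ_GLOBAL, SCX_SLICE_DFL, enq_flags);
}

/* Use sched_ext's default CPU-selection helper to find an idle CPU
 * if one is available, and dispatch straight to it. */
s32 BPF_STRUCT_OPS(minimal_select_cpu, struct task_struct *p,
		   s32 prev_cpu, u64 wake_flags)
{
	bool is_idle = false;
	s32 cpu = scx_bpf_select_cpu_dfl(p, prev_cpu, wake_flags, &is_idle);

	if (is_idle)
		scx_bpf_dispatch(p, SCX_DSQ_LOCAL, SCX_SLICE_DFL, 0);
	return cpu;
}

SEC(".struct_ops.link")
struct sched_ext_ops minimal_ops = {
	.enqueue	= (void *)minimal_enqueue,
	.select_cpu	= (void *)minimal_select_cpu,
	.name		= "minimal",
};

char _license[] SEC("license") = "GPL";
```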
Is there some sort of performance overhead to this approach? Obviously there's going to be some, but like, is it a...
Obviously if you wrote your own, you would be, you know, directly interacting with it. You wouldn't have this extra thing here.
Maybe I'm explaining this badly,
but like, what sort of overhead does come with sched_ext?
There's a much better way to say it.
Oh, that's a really, really good question.
And there is an overhead.
So when you go with something like with a BPF scheduler,
you have to take the overhead
of going through the BPF interface,
which means doing like indirect calls and stuff like that.
So there's certainly an overhead to doing it in sched_ext.
It's really minor from what we've seen.
It's like a couple tenths of a percentage overhead
relative to just using a native scheduler.
Now, sometimes it's pretty hard to get over that hump,
depending on what you're doing.
And certain things, you know, something like EVDF
is like super well suited for it.
So there's just no point in even trying
unless you wanted to build it in sched_ext itself.
But yeah, it's a couple of tenths of a percent.
So, pretty low.
And the reason it's so low,
something I should probably make clear is that
BPF is not a user space framework.
Like when you implement a scheduling policy in BPF,
the kernel is actually calling directly into your program and staying in kernel space. You can build user space components
on top of that. And we actually do have schedulers where we have like load balancing done in user
space, but the hot paths, everything stays in the kernel. There's no, there's no like handshake with
user space or anything like that. And so the overhead is really minimal. And the trade-off
obviously works out quite well in a lot of scenarios. For anyone unclear about it,
what does a load balancer do? Another great question. So I'll try to give a quick overview
of this as well. So if you imagine load is, in simple terms, load is just how much stuff is
running on a system.
So if you have two threads that are always runnable, then you might have load of 200
because the default weight for a thread is 100 and load is weight times basically how
long the thread can run for.
Now, if you imagine a really complex system, which obviously most of them are, even if
your machine is sitting idle, there's K threads running and all this stuff.
The goal of a load balancer is to balance load across the system. And the thing I said earlier
about EVDF where you have this V runtime per core, where you count how much time each thread is run,
that's from the perspective of a single core. So each core has its own run queue and has its own
counter V runtime. And so within a specific core, everything is fair. But when you go between cores,
and especially when you go between
what are called scheduling domains,
so like between cores that are grouped into L3 caches,
at that point, you have to use
this sort of higher level view of load
to try to balance the system.
And that's kind of what,
yeah, that's what the load balancer is doing.
Right.
So I don't know where I was going to go with that actually
well so I can go into a little more
yeah if you want to
I had something Garrett
dude it's so complicated
so yeah
like okay you imagine you have two cores
four threads
and three of the threads are running
you know, a third of the time.
Actually, let's keep it simple.
They all run 100% of the time, and they all have the same weight,
and they're all in one core.
And that means that one of the two cores has load of 400
because you're just adding it up, and the other one has load of zero.
And the goal is to distribute this load evenly, right?
Every thread should be getting its proportion of compute capacity
relative to its proportion of
load. And so what I mean by that is if the total load in the system is 400 amongst these four
threads, each of them have load of 100, which again is weight times how long it can run,
weight times duty cycle, then they each get a hundred over 400 equals a quarter of the compute
capacity in the system. And so there's two cores. So each of those cores should get 200, should be responsible
for 200 load each. And so the load balancer would say, oh, there's 400 on this core, zero on this
core. They should each have 200. So I'm going to move two of the threads over and now they each
have 200 and the system is fair. The system is balanced. That's essentially what it's doing.
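A small sketch of that arithmetic, assuming the simplified definitions he's using (load = weight times duty cycle, and each core should carry the total load divided by the number of cores). This illustrates the idea, not the kernel's actual load balancer.

```c
#include <stdio.h>

/* Toy load balancing arithmetic from the example in the conversation:
 * four always-runnable threads of weight 100 start on core 0. */
#define NR_CORES   2
#define NR_THREADS 4

int main(void)
{
    int thread_load[NR_THREADS] = {100, 100, 100, 100}; /* weight * duty cycle */
    int core_of[NR_THREADS]     = {0, 0, 0, 0};         /* all start on core 0 */
    int core_load[NR_CORES]     = {0};

    int total = 0;
    for (int t = 0; t < NR_THREADS; t++) {
        core_load[core_of[t]] += thread_load[t];
        total += thread_load[t];
    }

    int target = total / NR_CORES;  /* 400 / 2 = 200 load per core */

    /* Move threads off overloaded cores until each core is at target. */
    for (int t = 0; t < NR_THREADS; t++) {
        int from = core_of[t];
        if (core_load[from] <= target)
            continue;
        for (int to = 0; to < NR_CORES; to++) {
            if (to == from)
                continue;
            if (core_load[to] + thread_load[t] <= target) {
                core_of[t] = to;
                core_load[from] -= thread_load[t];
                core_load[to]   += thread_load[t];
                break;
            }
        }
    }

    for (int c = 0; c < NR_CORES; c++)
        printf("core %d load: %d\n", c, core_load[c]);  /* 200 and 200 */
    return 0;
}
```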
So, okay. So the load balancer is there to make sure the work is distributed across the different threads
and then the scheduler is there to make sure the work that is there gets a suitable amount of time
for those individual tasks. Yeah. So the scheduler is both parts of it. The scheduler's job is to
both distribute load amongst cores and also to ensure fairness on a specific CPU or interactivity. If you look at the actual
scheduling code, the load balancer is kind of in its own thing. You do it after some amount of time
or when a core is going to go idle, it might pull load onto the core. But both of those things are
certainly part of the scheduler. For example, the scheduler has to scale to like thousands of cores for some huge machines. So you'll accumulate load within a
single core. And then when you load balance, you'll sum the core, sorry, you'll
sum the load between them, you know, from one core, whichever one is load balancing. So yeah,
it's it's confusing, because it's sort of they're both kind of related to fairness, but they're very
different ways of looking at it. And there's very different problems with each of them.
But both of them are the scheduler.
You mentioned that someone could implement
the load balancer in user space.
Why would somebody want to do that
instead of doing it in kernel space?
So that's another great question.
So there are advantages and drawbacks of doing it.
So if you do everything in the kernel,
the advantage is, of course,
that you don't have to go to user space, right?
You do everything in the kernel.
It's all fast, it's all right there.
But the kernel has some enormous drawbacks.
For example, you can't do floating point math in the kernel.
You have to do only fixed point math.
The registers for doing floating point aren't used in the kernel.
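As a quick illustration of the fixed-point style this forces (a sketch with an arbitrary scale factor, not the kernel's actual representation): instead of computing a floating-point fraction, you scale by a power of two and divide integers.

```c
#include <stdio.h>
#include <stdint.h>

/* Fixed-point sketch: represent fractions as value * 1024 so we never
 * need floating point, the way kernel code has to. */
#define FP_SHIFT 10                 /* scale = 2^10 = 1024 */
#define FP_SCALE (1u << FP_SHIFT)

int main(void)
{
    uint64_t thread_load = 100;
    uint64_t total_load  = 400;

    /* Fraction of total load this thread represents, in 1/1024ths. */
    uint64_t frac_fp = (thread_load << FP_SHIFT) / total_load;  /* 256 = 0.25 */

    /* Apply that fraction to two cores' worth of capacity,
     * also expressed in fixed point. */
    uint64_t capacity_fp = 2 * FP_SCALE;                        /* 2.0 cores */
    uint64_t share_fp    = frac_fp * capacity_fp >> FP_SHIFT;   /* 512 = 0.5 */

    printf("share = %llu/%u of a core\n",
           (unsigned long long)share_fp, FP_SCALE);             /* 512/1024 */
    return 0;
}
```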
If you wanted to do the load balancing I was describing, you know,
we're doing division, right? We're talking about the proportion of load that one thread is using
across the system. And so this is like a floating point thing, everything is done in
percentages and fractions. So it's nice to do it in user space, because that's kind of the
component that's really complicated, and there's the complexity of doing it in the kernel. You probably
could do it in BPF, but, you know, you're only running the load balancer, in one of
the schedulers we have, once every two seconds. You don't really need it to be in the
kernel, right? And you can really go crazy. Balancing load is one thing,
but you could look at, like, if you have asymmetric
CPU capacity, like the one that I was talking about earlier, the 7950X3D,
where you have the V-Cache, and then you have the other thing, all of these things you can
model in whatever way you want. You can do machine learning from user space and make
predictions. You can classify what thread, what quality of a thread, would maybe suit one domain or the other better.
So to summarize, the core algorithm, you probably could do it in user space, but it's just very
limiting.
It's a very difficult environment to program in.
Oh, you mean you could do it in kernel space?
I said, yeah.
I meant, sorry, that you could do it in kernel space.
Yeah.
But user space is easier.
Yeah.
Thanks.
Yeah.
Okay.
I'm sure
most people have heard the terms user space and kernel space before, but we should probably just
quickly explain what that is as well, along with why there is this issue with like swapping back
between them and why that comes with some sort of performance degradation. Sure. So user space
is the part of the computer that you're using. Like when you're using a computer. Like when you're in a web browser, that's a user space program.
When you're using SSH, that's user space.
And the idea with user space is every process has its own virtual address space, right?
It has its own kind of virtual fake view of memory.
And that's uniform for every process in the system.
Your job is to do something, whatever the program is doing, and that's about it. Excuse me, kernel space is the component of
the system that manages all of that stuff. So, you know, in reality, memory is not virtual,
right? Memory is physical. You have some amount of RAM on the system. And the kernel has to map,
is the term, virtual memory to this physical memory, this RAM.
You have something like the scheduler, where you're
deciding which threads, which processes, get to run on which cores. You know, this
is something that has to kind of be in the core of the system. And so if you imagine that in user space
everything is its own process, its own application, the kernel
has its own monolithic address space. Every thread in the kernel is in the same little
sandbox, but that sandbox is like the management of the system. It's distributing resources, it's
multiplexing things onto finite resources, and it's the privileged component, right?
You wouldn't want one malicious thread to be able to give itself
all of the runtime in the scheduler.
So yeah, I mean, that's a very high level description,
but hopefully that makes sense.
And sorry, you also asked like about the transition
between the two.
Yes, yes.
So when you go between user space and kernel space,
that's an operation where you're changing address spaces,
you're changing privilege levels, all this stuff, right?
Like when you're in, so I don't know if anybody ever heard
of these horrible vulnerabilities called Spectre and Meltdown
that happened a few years ago.
Yeah, those were fun.
I hope they're proud of that.
Yeah, so taking Meltdown,
because that's a pretty easy one to talk about.
So that was a bug where,
so the kernel memory is protected, right? Like if you have a user space process,
it shouldn't be able to look at the memory of another one. Like that's a secret, right? Like
that would be a bug if you were able to read some remote process's memory, which makes sense.
And so when you go between user space and kernel space, a lot of things are happening in hardware
that change the execution context. Your registers, the user space registers are being saved on the stack.
You're changing the instruction pointer to point to somewhere in the kernel.
You're loading kernel registers.
You're probably going to change to a kernel thread stack as well.
You have to copy memory from user space.
Again, you're changing privilege level.
So all this stuff, that's called trapping into the kernel.
That's the term for it.
All of
this stuff happens every time you go back and forth between the two. And Linux is a monolithic
kernel, right? So like when you trap into the kernel, there's a lot of layers to go down before
you get to the scheduler, for example. It's a pretty core part of the operating system. So if you were
to say, okay, well, who's going to run on this CPU next?
I don't even know if it'd be possible to do this.
But if you imagine before you make that decision, you schedule your user, actually, it would
be possible.
You schedule your user space process that's running on this CPU.
You schedule it and it goes, okay, who's going to run next?
These guys, this guy's, okay, we'll run this person, this thread next.
And then it traps back into the kernel. It goes back into the scheduler and it says, okay, this is the one to run next, and
that's the one that you put on the core. So that's way more overhead than if you
just look into a kernel space map, right? You're talking orders of magnitude more
overhead to do it that way. There actually is a sched_ext scheduler that somebody at Canonical is working on
that's doing really, really well, because he's been able to really push it pretty far. But
in practical terms, there is a lot of cost to doing that.
But then the issue you have with doing
things in kernel space is you can cause serious issues like, because it is a monolithic kernel,
things might, you know,
really bad code can take down the entire kernel.
Yeah, yeah, really bad code can.
It turns out the code that we thought wasn't bad can also do it, unfortunately.
Now, within a sched_ext program,
within a BPF program,
theoretically, you're not supposed to be able
to take the host down.
And if that would happen,
then
it should fail to even be loaded in the first place. But for sure,
to your point more broadly, absolutely. Kernel space is
a tricky place to be doing programming. And it's good,
you know, within reason,
it's good to try to push complexity out of the kernel when you can. And,
you know, if you look at a lot of the kernel,
the kernel algorithms for how it implements stuff,
the heuristics are often probably more simple than you would imagine. For prefetching IO,
I haven't looked at it in a while, but I think it was a static amount that you prefetch.
There's no tracking. I might be wrong about this, but the last time I checked, I think that was
what it was. I don't think there's any tracking of how much are we reading? Oh, we've been reading a
lot. Well, we should prefetch more
because we're expecting it to be reading this whole file.
So all this stuff that you could do with like math
and kind of more complicated reasoning
and models about how things work,
you really don't see that very often in the kernel.
And the scheduler is actually probably the most complex part
of the entire kernel in that regard.
Like how much it has in the kernel directly.
But yeah, in general, it's not a great place to be doing that.
You said theoretically the BPF program shouldn't be crashing the kernel. I think it's fair to assume that
there were some issues along the way where they were crashing the kernel?
Yeah, sure, I mean, it happens, you know, especially if there's a new release and there's some big feature where we haven't seen a corner case. It happens. So far, I'm going to knock on wood, but we haven't had any big issues rolling it out at Meta. But yeah, you know, we're working on it, people are finding stuff. The community around the project has actually grown a lot in the last few months, which has been really cool. And with more eyes on it, you know, somebody, for example, opened a bug today because on the stable release of the
kernel, if you try to use what's called control flow integrity, this feature called CFI, which
basically makes sure that you're always calling a safe function in the kernel, that it would crash
sched_ext. And it was because of some patch set that was never merged into the actual stable kernel,
which it probably should have been, we didn't know that it wasn't merged, and so we just told
people, like, just don't use the stable release, because nobody's really using it that often
anyways. But yeah, absolutely. I think anybody who works in any part of software
that tells you that there's never problems is being a little bit disingenuous.
I don't know how I didn't ask this before,
but how long has sched_ext been a work in progress for?
It's been a work in progress for about two years at this point.
Another engineer at Meta,
it was kind of his brainchild
and he worked on it for about six months
and then I came onto the project
shortly after I joined the team
and did the live patch stuff.
And it's been what I've been working on ever since.
Yeah, so it's been a good amount of time.
Relative to a lot of other huge open source projects
that are contentious to get merged upstream,
it's not the longest by far. The PREEMPT_RT patches are what everybody talks about;
they've taken like 25 years to get merged.
We're hoping that's not gonna be the case.
But it's also been quite a while
that we've been working on it, iterating on it,
building the community around it until,
and then obviously after it gets merged
to the main Linux repository.
One thing I noticed in one of your talks that I don't think you touched on in the talk
was the GPLv2 requirement.
So yeah, so GPLv2,
that's the license that the kernel is licensed with.
And another BPF feature is
it'll look at the binary of the BPF program.
And if that program is not licensed with GPLv2, which is emitted in the metadata for the program in the binary, then the verifier will just fail to load it if it's a sched_ext program.
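The license metadata he's referring to is the standard license declaration a BPF program carries; in C it's typically written as below (exact enforcement details are the kernel's, so treat this as a sketch of the mechanism).

```c
#include <bpf/bpf_helpers.h>   /* for the SEC() macro */

/* A BPF program declares its license in a dedicated "license" ELF
 * section; the kernel reads this at load time and, for sched_ext
 * programs, refuses anything that is not GPL-compatible. */
char _license[] SEC("license") = "GPL";
```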
That was the reason.
I mean, we added that because we wanted everything to be GPLv2.
We wanted it all to be open sourced. But more than that, there were some concerns in the community that this would maybe stifle
upstream contributions to the scheduler and stuff like that.
And we certainly don't want that.
We obviously still use EEVDF internally at Meta, and it's a great scheduler.
So this is just one of our ways of saying, hey, people still have to open source them,
and we can take whatever crazy ideas
that work really well in sched_ext
and we can add them to the fair scheduler as well
if we want to.
Well, yeah, that makes sense.
Because I heard that I was confused why,
because I don't think you mentioned
why there was like that,
at least in the talk that I heard.
Yeah, well, it's a tricky topic because people, people have really strong opinions.
Like, I don't know how much you,
you looked at like kind of the conversations on the upstream list,
but some of them got a little heated, some private conversations too.
And that's, that's fine. I mean, like I get it, you know, there's,
it's not, nothing is ever black and white, but so, you know, if I,
if I didn't mention that it was,
it was because I didn't
think it was necessary. Like the fact that it's GPLv2 is its own fact, right? Like, okay,
now we know that it's GPLv2. And if you have concerns about upstreaming, then okay, you
know, that's fair. But at least theoretically, you know, this should be something that we're
protected against by GPLv2, at least.
So we should probably move on to the second thing you mentioned, co-chairing the BPF
standardization. Oh, yes. So before that, you're doing that with the IETF. I'm sure
the very tech nerds know about the IETF, but there are probably a lot of people who don't know
who that is. Oh, you have to be like a special kind of tech nerd to be up in the IETF. So the IETF,
the Internet Engineering Task Force, is a standards body that has created standards for
a lot of different parts of the internet, like BGP, Border Gateway Protocol, QUIC,
like a lot of old TCP/IP packet formats and stuff like that.
So they're, they're, you know, a very robust, excuse me, well-respected standards body.
And yeah, you know, we shopped around for which standards bodies we
wanted to go with.
And we had no experience; I've never standardized anything.
I don't think anybody else in the BPF community had.
So we decided on the IETF
because they had a lot of experts there
to help us do it right.
So what was the, I did see in the document
that BPF is also used outside of the kernel as well.
So I guess that's why it's a concern
for it to be standardized.
So that's one of the concerns, yeah. There's something called uBPF, which is user space BPF. Like I was
mentioning earlier, BPF has its own instruction set, so if you compile into BPF bytecode, theoretically
it could run like any other JIT, you know, it could be cross-platform, just like a JVM program could
run across any platform, theoretically.
So we wanted to standardize for software reasons.
The big reason, though, is because there are hardware vendors that are building support for offloading BPF programs to devices as well.
And that's a big, big investment for companies that do that.
And there have been companies already, like a company called Netronome that's built BPF
offload even without the standard.
But that's a little bit of a special case.
And so we've been hearing from vendors that this is something that they want.
This is something that they kind of need before they'll actually be willing to invest the money in it.
And, you know, like a lot of the trend of the tech industry right now is going towards offloading, for networking, offloading TLS like transport layer security.
So you're doing encryption and decryption on the actual NIC itself,
instead of having to go all the way to the CPU to do it. Yeah.
You know, with,
now that we're not doubling our compute capacity every 18 months,
these kinds of things are sort of where we're going. And so BPF,
you know,
I think it's a really nice middle ground
between having nothing and having an ASIC
that costs like a billion dollars to build.
So, hardware is hard.
It's hard for us to predict
what we should be standardizing to accommodate hardware
without kind of making it too hard for them
and also giving them kind of the guardrails
to build something that's gonna be worth it. But yeah, that's the idea. I think,
more than software, it's definitely for the hardware vendors, but everybody does benefit from
it. Well, that definitely makes sense. How long has this been in progress for?
Oh, let's see, this has been in progress officially since, I think, March 2023.
Oh, it's fairly new.
It's pretty new. Yeah, yeah. And so you can actually Google, like, BPF IETF working group. And we have this charter where we have all of these documents that we're intending to write. Some of them are standards documents. Others are what are called informational documents. So they're basically our suggestions for like ABI and whatnot, but they're not actually
standards that you need to follow. But it's pretty new. Yeah. And we're, we're pretty close to
going to last call for our, the instruction set standard, the ISA standard. And that's the first,
that'll be the first document. So we're really excited for that. But, you know, if that continues to go well and we find that we need to keep going,
then this is going to be a long process. I don't know if I'm going to stay a
chair the whole time, I don't know if I have the energy for that, but it'll be,
yeah, it'll be a long, long road for sure.
So has there been any, obviously you can't say anything that maybe is, you know, off the books,
not allowed to say, but has there been any sort of pushback with the approach to
standardization, or the idea of standardization, or anything like that?
Actually, not really.
Well, so there have been some people at the IETF that didn't think that BPF was the
right fit,
because they're more used to standardizing protocols and packet formats and stuff.
They're not really used to standardizing JIT runtimes.
It's a different approach for them.
So there were some people at the IETF that didn't think it was a good idea.
We're still there.
In terms of people in the industry, we haven't really gotten
much pushback from anybody on that specific point. And I think it's not super surprising.
I mean, even for people that don't really like BPF, it's just too
big, and we need to standardize it at this point. I mean, the de facto standard right
now is just whatever we do in the Linux kernel,
and everybody kind of follows along,
which is great for us,
but it's not really like conducive
to growing a global community around the technology.
So I think everybody's okay with it, yeah.
So,
give me a second, I lost it again, what was I gonna say? Right, so considering the history the IETF has with the things that they've typically been involved
in standardizing, why specifically go through working with them?
So yeah, we thought about working with a few
different people. We thought about working with OASIS, which is what virtio went through.
We thought about just publishing the standard through the eBPF Foundation,
which is a subsidiary of the Linux Foundation. They haven't standardized anything. And I think
amongst the three, it's probably not controversial for me
to say that IETF is definitely the most rigorous
and has the most oversight and processes. And so for us, we wanted it to be a good quality standard. And you can certainly do that with OASIS, nothing against OASIS, but we just needed the help. None of us had experience with standardizing anything. We were talking about doing ISO standards and all this stuff, and we just didn't know what was involved at all. So they were able to come in and give us the standards side of it, and we're giving the technical side. And there are people who are thankfully experts in both that are helping. But more than anything, it was the prestige of the organization, the oversight they were willing to give us, the hand-holding they were willing to do, and ultimately we just thought it was going to be best for the community.
So you would say it's very much been a learning experience for you then, just trying to understand how these standards work and how you would even go about structuring them?
Oh yeah, absolutely. And a lot of it is just doing what other people do, right? One of the things we were trying to come up with recently was how we should group BPF instructions into what are called conformance groups. What that means is, if you want to conform to a part of the standard, you have to implement all of the instructions in that group or else you're not conformant. So do we group atomic instructions? Do we group division and multiplication? What do we do? We were bikeshedding on this for a really long time, and then finally I just went to the RISC-V standard and said, we're just going to do what RISC-V does. Okay, that's it.
And I think that's actually not an incredibly dumb idea
because, you know, a lot of hardware vendors
are using RISC-V.
And so it makes sense.
RISC-V has conformance groups as well.
And so that's sort of what we've been doing.
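To make the conformance-group idea concrete, here is a minimal C sketch of how a tool might check group coverage. The group names and instruction lists are made up for illustration; they are not the groupings the working group actually chose.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical sketch of "conformance groups": an implementation only
 * conforms to a group if it implements every instruction in that group.
 * Group names and instructions here are illustrative placeholders.
 */
enum conf_group { GROUP_BASE, GROUP_ATOMIC, GROUP_DIVMUL, NUM_GROUPS };

struct insn_spec {
	const char *name;
	enum conf_group group;
	bool implemented;	/* filled in for the implementation under test */
};

static struct insn_spec insns[] = {
	{ "add",         GROUP_BASE,   true  },
	{ "jmp",         GROUP_BASE,   true  },
	{ "atomic_add",  GROUP_ATOMIC, true  },
	{ "atomic_xchg", GROUP_ATOMIC, false },
	{ "div",         GROUP_DIVMUL, true  },
	{ "mul",         GROUP_DIVMUL, true  },
};

/* Conformant to a group only if every instruction in it is implemented. */
static bool conforms_to(enum conf_group g)
{
	for (size_t i = 0; i < sizeof(insns) / sizeof(insns[0]); i++)
		if (insns[i].group == g && !insns[i].implemented)
			return false;
	return true;
}

int main(void)
{
	const char *names[NUM_GROUPS] = { "base", "atomic", "divmul" };

	for (int g = 0; g < NUM_GROUPS; g++)
		printf("%s: %s\n", names[g],
		       conforms_to(g) ? "conformant" : "not conformant");
	return 0;
}
```

The all-or-nothing check is the whole point of the grouping: you either claim the group and implement everything in it, or you don't claim it.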
Yeah, there's no point just dreaming everything up yourself when nobody here has any experience with doing so.
Like you can see what is already out there.
You can take inspiration from that
and just work from there.
Exactly. Yeah, exactly.
And hope that they made a good decision and you're not just, you know, blindly following in their footsteps.
Well, hopefully in the case of RISC-V, at least, it's extremely well thought out.
Yeah, absolutely.
So another thing that you mentioned in your email
was a thing called shared run queue.
Oh yeah, I briefly alluded to that when I was talking about this thing
that I sent upstream that does work conservation in the scheduler.
And so this is interesting.
So I was saying earlier that you have this vRuntime notion per CPU, right?
And so the problem with that...
The good thing about that is you can scale well
because everything is happening at the granularity of one CPU, so you don't really contend very much. But when you're doing load balancing, the problem is you have to iterate over all of these CPUs to gather load, and load balancing is really expensive. So
there are heuristics in the kernel where we don't even load balance at all if we think it's going to take too long.
And load balancing can decide not to do anything if it doesn't think it's worth it and all
these things.
So shared run queue is a feature where per LLC, essentially per L3 cache domain, we have
just a FIFO queue where when a thread wakes up or when it's enqueued, we put it into this shared run queue.
And then any time a core is going to go idle,
it could just pull a task from that run queue, the shared run
queue, instead of going through the whole load balancing path
and doing the slow thing of iterating over every CPU.
And so that works well for workloads
where you need really high work conservation,
like HHVM,
the JIT web engine that Meta uses.
It might not work quite as well for something where you're doing really short bursts of
work where you need to keep your L1 cache locality high and migrating is just not worth
the overhead and whatnot.
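As a rough illustration of what's being described, here is a toy userspace C sketch of a per-LLC shared FIFO: tasks get pushed when they're enqueued, and a CPU that would otherwise go idle pops one in O(1) instead of walking every CPU the way the full load-balancing path does. This is not the actual kernel patch, just the shape of the idea.

```c
/* Build with: gcc shared_rq.c -pthread */
#include <pthread.h>
#include <stdio.h>

/* One FIFO per LLC (L3) domain; protected by a lock since many CPUs touch it. */
struct task {
	int pid;
	struct task *next;
};

struct shared_rq {
	pthread_mutex_t lock;
	struct task *head, *tail;	/* FIFO order */
};

/* Called when a task wakes up / is enqueued. */
static void shared_rq_enqueue(struct shared_rq *rq, struct task *t)
{
	t->next = NULL;
	pthread_mutex_lock(&rq->lock);
	if (rq->tail)
		rq->tail->next = t;
	else
		rq->head = t;
	rq->tail = t;
	pthread_mutex_unlock(&rq->lock);
}

/* Called by a CPU that is about to go idle: O(1) instead of a full scan. */
static struct task *shared_rq_pull(struct shared_rq *rq)
{
	pthread_mutex_lock(&rq->lock);
	struct task *t = rq->head;
	if (t) {
		rq->head = t->next;
		if (!rq->head)
			rq->tail = NULL;
	}
	pthread_mutex_unlock(&rq->lock);
	return t;
}

int main(void)
{
	struct shared_rq rq = { .lock = PTHREAD_MUTEX_INITIALIZER };
	struct task a = { .pid = 1 }, b = { .pid = 2 };

	shared_rq_enqueue(&rq, &a);
	shared_rq_enqueue(&rq, &b);

	struct task *t;
	while ((t = shared_rq_pull(&rq)))
		printf("idle CPU pulled pid %d\n", t->pid);
	return 0;
}
```

The trade-off he goes on to describe falls out of this shape: pulling from a shared queue keeps cores busy, but it also migrates tasks away from whatever cache state they had built up.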
It's kind of stalled.
There are four versions that have been sent upstream. And I'm happy to send them to you.
I don't know if I sent you v4.
I hope I did.
I don't know.
But it's stalled a little bit because of timing; EEVDF has sort of changed the performance profile a little bit, and I just don't really have time to go work on it. Some people at Google said that they were interested, so I put the latest version, I think it was v4, up, and I'll get back to it someday
if I have time or somebody else can pick it up
and take it to the finish line.
We have two links in here. "Patch v4, sched: implement shared runqueue." That would be the one I'd be looking for?
That, yeah, that's gotta be it.
It's gotta be it.
Yeah.
We're not going to have a look at all of it here. I'll save reading a text dump of a mailing list for a video.
Yeah, yeah, exactly.
So it's kind of in this weird, not-quite-upstream state. Is it at least in a good shape, or...
It's in a... sorry, go ahead.
No, no, go on.
I was going to say, it is in a good state. It should work, and it should do better than EEVDF on a good number of things, I think. It could be merged, unless there have been merge conflicts; it could be merged and it should be fine. But the scheduler is so core to the system that a lot of the time you have to have a kind of bulletproof case for something to get added. And it's just not going to work well for every workload.
And so I think if it were to ever get merged, it might be that there are some things we could do to improve it. We could maybe have some better heuristics for when we actually don't do this migration and whatnot. But ultimately, people are just going to have to accept that this isn't going to work well for every scenario, and you enable it when you want to use it. It's not actually enabled by default. In the core scheduler there's something called sched features, where you can dynamically enable and disable things at runtime, and it's disabled by default. But even then, it just hasn't been merged yet.
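For reference, those sched features knobs live in debugfs. Here is a minimal sketch of reading them from C; it assumes debugfs is mounted at /sys/kernel/debug, that the kernel exposes the newer /sys/kernel/debug/sched/features path (older kernels used /sys/kernel/debug/sched_features), and the SHARED_RUNQ name in the comment is hypothetical since that series isn't merged.

```c
#include <stdio.h>

#define FEATURES_PATH "/sys/kernel/debug/sched/features"	/* path assumption */

int main(void)
{
	char buf[4096];
	FILE *f = fopen(FEATURES_PATH, "r");	/* usually needs root */

	if (!f) {
		perror("fopen");
		return 1;
	}

	/* Print current feature flags; an entry like NO_FOO means FOO is off. */
	if (fgets(buf, sizeof(buf), f))
		printf("current features: %s", buf);
	fclose(f);

	/*
	 * Toggling is done by writing a name back, e.g.
	 *   echo SHARED_RUNQ > /sys/kernel/debug/sched/features     (hypothetical)
	 *   echo NO_SHARED_RUNQ > /sys/kernel/debug/sched/features  (disable again)
	 */
	return 0;
}
```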
It's interesting to hear you talk about kernel problems like that, because even at the kernel level there are the same sorts of problems: okay, this is just not going to work for everybody, but something needs to get upstream. When I talk about stuff in the Wayland project, for example, there are very similar sorts of debates. Okay, well, here's this very specific case where this doesn't actually address the problem; well, here's this one group that doesn't want to do it. And it sounds like in the kernel the same sort of challenges are still there.
Oh, 100%. Yeah, I mean, at the end of the day it's just
software, right? The kernel is 100% just software.
The maintainership model makes things very interesting.
No, for sure. But yeah, there was another bug, I don't remember what the bug even fixed, but we had submitted a patch to the scheduler and it took like two years for it to get upstreamed, and it only got upstreamed when we were...
I know, you say that and once again I'm just thinking of problems that happen in Wayland, where it's taken like five years for protocols to be upstreamed.
Oh, fantastic. Yeah, so you get it. In this case, we had to literally write a benchmarking tool that showed the problem.
I'm sure in Wayland, it's similar.
I'm sure in every big...
Well, the problem is just you have a project that has a lot of people.
When it's a small application and you just have a BDFL, and what the BDFL says is what happens,
things go smoothly.
But when you have, you know, 10,000 different cooks in the kitchen, you know, there's going to be challenges there.
There's going to be a lot of bike shedding.
Yeah, there will be.
There will be.
So out of curiosity, so does Wayland, I'm not familiar with, like, the maintainership model.
Do they have a BDFL or is it a communal thing?
So the different desktops have different voting members, and they all have the ability to NACK different protocols. If there's a NACK from one of the desktops, typically it's from GNOME, then, if I'm recalling correctly, the protocol cannot be in the XDG namespace, which is the general namespace, but it can still be in the ext namespace, which is the extension namespace.
Then there's the whole matter of
before something can be upstream,
there needs to be implementations
in at least three different desktops
and three different projects,
along with there being ACKs
from three different projects as well.
Like it's very much, you know,
there's a lot of people that can vote on stuff.
And when you have desktops that have fundamentally different approaches for how they want to do things, it can get complicated.
Like right now, one of the big ones that people are arguing about, and this is going to sound really stupid from your perspective, is this argument over whether application windows should be able to set an icon for themselves, along with the children of that window having separate icons. So if you want to have a settings window, that's going to have a little settings cog; you can set a little icon there that tells you it's the settings window. And there is an at least 500-message email thread arguing over whether this should be done, the format of the images, how we even do this, what the protocol should look like. It's an absolute mess, and that's just one of them. I could talk about the messes in that project all day.
I actually
think Wayland sounds worse than the Linux kernel. I mean, I don't know the process at all, but we at least have a BDFL, right? We have Linus, and ultimately his word is the word that matters the most. Well, good luck.
Yeah. It gives me great content to talk about on the channel.
Yeah.
Happy to help.
Do you think that they should be able to use their own icons
and their children should be able to define their own icons?
The thing is, it's trying to address a problem
where the issue already was solved on X.org.
So a lot of the issues we have are things that could already be done on Xorg. But now that we're moving to Wayland, it's like, okay,
now we have an opportunity to change it and maybe do something better.
The issue is sometimes it's okay that wheels are round.
You don't need to change the shape of a wheel.
I think they should be able to.
There's a big discussion about how that should be done, what the protocol should look like, and that's totally understandable. But I don't see any reasonable discussion happening around whether it should be possible. That's just insane to me.
Yeah, that's a really interesting way to frame it though. Basically
the idea of building a new system is that you want to get rid of some of the cruft from the old one, but then you have to get people to use it.
It has a print server in it. You can print your screen to a printer, because it treats the output device very generically, so it doesn't care what it is. Literally anything could be an output device.
Oh god. I'm glad I never got into that. I feel like I should, but I guess I'll let you guys handle that.
Look, I'm sure there's at least... I actually have said before that I feel like Wayland would go a lot better if there was a BDFL. You would have issues... the issue you have there is whether they align with what a lot of the users want,
because it seems like Linus generally steps in when people are just doing something stupid.
At least for the most part. And then you get a big tirade, one of those big frantic tirades.
We all love those, yeah.
But yeah, go on, what were you saying?
Well, yeah, I think so. He does step in when people are doing stupid stuff, but he also, I don't want to speak for Linus, and I don't want to misrepresent him, but my impression is that he does care about how Linux is used. And so if you have a tool that's widely used, like the whole GKI stuff with Android and everything like that, there are times
where he'll come in and he'll say, guys,
figure out how to get this merged.
Enough is enough.
There's no point to keeping stuff
out of tree that's used everywhere.
Upstream should
roughly reflect how this is used
in practice. And so his voice of reason, he's great because, obviously, he's Linus Torvalds. He's very technically sound; he can read code very, very well and understand every part of the OS. And I think he's a good manager of the project. He does step in and make calls that are outside the scope of just technical stuff as well. But even so, he's only one person, right?
So there's hundreds of thousands of lines of code that go into the kernel.
So a lot of stuff goes in that he never even looks at. And for us,
I'd be curious how Wayland resolves these,
but honestly a lot of the time in the kernel,
the way that you fix a problem is you go out and get beer with somebody
at a conference and you
have beers until 2 in the morning and you're like
oh yeah okay
we agree we agree and then the next day you know
you send it and they merge it
It's funny you say that, because FOSDEM just happened, and all of a sudden half the issues that people had with the icon thread were suddenly, oh yeah, I actually talked to all these people, and it's just like, yeah, it's not actually that big of a deal. But there are a lot of people who get very, very heated. And oftentimes there's not someone putting a stop to people just throwing insults at each other. And once that's happening, things have completely devolved. You're not going to get any progress, because the second you insult someone, that's when they're going to be like, nope, I'm right, I don't care what you have to say, you're wrong, who cares.
Yeah, when it gets personal, it's problematic, and that happens in the kernel a lot as well, obviously, and it sucks. At the end of the day, all of these things are fairly small communities.
I know that FOSDEM is a pretty big conference, but yeah, it is good.
And Linus can also, he does chime in also for personal matters where he's like,
hey, you're hard to work with, get your act together.
And yeah, I guess it sounds like Wayland could benefit from that. But I don't know, if half of the issues on the thread were solved with beer, and I know that beer is a core part of the FOSDEM itinerary, that's a good sign. Did you get to make it to FOSDEM?
It would cost me about three thousand dollars to fly over there.
That makes sense.
Not really. I will get there eventually. Australia is just a difficult place to get anywhere from, really.
Yeah, that makes sense.
I might see if I can go there on journalist funding at some point. We'll see if anyone's interested in doing that. I don't know what their funding is like for that, but I know some of the other conferences definitely do have funding in that regard. We'll see what happens.
Yeah, totally. That'd be awesome.
The next IETF conference is actually in Brisbane in March, so I don't know how close that is to you, but if you want to come get beer...
It's about $270 to get there. Oh, in that case, you know, maybe I'll go.
Yeah. It's not going to be the most, I mean, FOSDEM is way more interesting. The IETF is a lot of people bikeshedding about minutiae and protocols, but it's still fun. You get to meet people, and there are the beer reviews, or whatever you want to call them, that happen there too.
Well, you're mentioning how Linus is kind of this voice of reason. Everyone sort of knows that over the years he's very much, I guess, smoothed out, we'll say, the way he interacts with people in the kernel. Because if you go back to 90s Linus, early-2000s Linus, we've all seen the early emails that flew around, where he was just absolutely tearing people apart, just cursing them out, like, what are you even doing in this project, get out.
I don't remember exactly when Rust support was being considered for the kernel, but back when that first got introduced there was a line of code that should not have been there. They were using the standard Rust library, and if something went wrong it threw a panic, in kernel space, which is a no. And Linus responded like, this cannot be here, what are you doing. But it was a lot more tame than it would have been back then, you know, absolutely cursing them out and all those crazy things. I get why people didn't like the way Linus used to act, it makes a lot of sense, but I feel like there is some benefit to having someone who is separate and has that sort of oversight of what people are doing. You can't just let people blindly do what they're doing, especially on a project as big as the kernel.
Yeah, absolutely.
And if there were ever a project where you want people not to make that kind of mistake, it's the kernel, right, because everybody has to deal with your problem if you do that. And yeah, the panic thing, unfortunately that's indicative of somebody who doesn't really have experience with kernel work. You can't do that. You can't just throw your hands up and panic in the kernel; it's a very different environment.
the kernel should never crash ever, ever, ever,
unless there's an actual bug in which case, okay, fine. You like limp to the finish line and crash.
I think, yeah, you need to have real standards, right? One of the nice things about the kernel is that there's no manager; you don't have to report to anybody. I mean, there are people that work on the kernel who have managers, so I guess they still have to watch what they say. But many people don't work at the same company, and I think Linus having that sort of, you just can't BS him, and he won't let bad stuff in, it's a good culture to hand down to the maintainers and have them apply it in their own ways. I do think that, yeah, he is very direct in a lot of cases. I think also he's way humbler than I would be if I were in his position. He's essentially as influential as Bill Gates or Steve Jobs or whatever, right? And he just kind of chills in Portland and does code reviews for the kernel, which is kind of crazy. So I'll give him credit for that. I will say my personal opinion is that kernel work is a little bit too... the community could be way more inviting to people.
I think there's definitely a mystique of, this is elite stuff and yada yada yada.
I really believe that at the end of the day,
it's just software and you kind of just have to understand
the area you're working in,
and then just build stuff like you would anywhere else.
Maybe not just like you would, but it's pretty similar.
And I think that like culture of publicly shaming people
kind of also has that effect.
Right.
And, you know,
the effect ends up being that you filter out both people who shouldn't be there by any means and people who really would provide value.
And at the end of the day, you know, the kernel has been successful.
And I think even if we have filtered out people that would have actually been great members of the community,
clearly it hasn't like hamstrung us, right?
Like it's still a great kernel.
The kernel is just this weird project, right? Just the fact that there's a mailing list you have to get involved with, and the Bugzilla, which is kind of official, kind of not; there are some maintainers that don't even like that the Bugzilla exists. I've seen giant mailing list discussions arguing about the existence of the Bugzilla. Just the fact the mailing list is there, that in itself is this weird thing that a lot of people just don't really have experience with. Fedora has a mailing list, for example, and Ubuntu has theirs as well. But if you're involved in just general user space stuff, typically the way you interact with a project is through a GitHub or a GitLab bug tracker. So having to go through this weird thing that you've probably never used, and then there's specific etiquette on how you interact on the mailing list and all of this stuff, I get why it's confusing. And the fact that it's the kernel, right, this giant project with, I don't know how many lines of code it has now, probably definitely millions.
Yeah, millions. Someone will find the exact number by the time this is uploaded.
It's a very big project, and it can be hard, even on something smaller like a desktop environment, to find where your piece fits in, what you can add to it. And I can only imagine that problem is even worse on the kernel. Once you're in there and you know that you can change it, I'm sure it's a lot easier to work out what you can do, but just getting that first step in the door, I could imagine being really, really difficult.
It is, yeah, it is. It's interesting, because the whole mailing list thing, I think, probably more than anything else, is actually the biggest filter for people participating. And I don't know if this is true or not, but I have to imagine that's kind of by design. But yeah, it's a little bit wonky. You email people patches, they reply to your
email with code reviews and then eventually either they get dropped
or the maintainer takes your patch that you emailed them
and like merges it to their local repo.
And then eventually sends it to Linus
and Linus merges it to his repo,
which is the actual like upstream kernel.
It's a really weird model.
Yeah, but you know, it works well.
I think it is really hard to get started, for sure. You have to, first of all, learn a lot of low-level stuff. How do I even build the kernel? How do I test it? How do I configure it in the way that I want to test it? Oh, I have to add this Kconfig option to compile 9p so that I can mount a host file system and then run tests. I mean, it's really complicated. But that part of it, the mailing list part and the getting-your-rig-set-up kind of thing, is honestly probably a couple of weeks. I actually wrote a blog post on how to do the mailing list part, if anybody wants to get involved. To me, the best thing that you can do if you do want to get involved
is to just do code reviews. Do like actual real substantive code reviews.
You know, if there's an area that you're interested in like MM or BPF or whatever,
just follow the mailing list and try to keep track of what people are doing. Read their patches and their cover letters, which describe the feature, and try to understand how it works. And then when you start doing real code reviews, and you can maybe report some bugs and fix a few things, then you're kind of in the groove, you know. And in that sense it's not that dissimilar, I think, from any other project,
but it is bigger for sure. And you have to just find a little piece of it and follow that piece
and start to build your repertoire from there. I guess I have to say this as well. If you want
to send a patch to the kernel, you can definitely send a documentation fix-up or a grammar fix-up or whatever. Please don't do that more than once, maybe twice. There are people that send nothing but moving commas around and fixing spelling and stuff, and everybody knows who they are, and everybody is like, eye roll, come on, please stop. It's obviously awesome to have your name in the kernel, but the community kind of expects people who participate to have a substantive ability to participate and provide code value at the end of the day.
Right, that is understandable.
The documentation one is weird
because someone...
I get wanting it to be a substantive change.
And I don't know, it's weird, right?
Because you see an issue with the documentation
and you do want it to be fixed,
even if it's a fairly minor one.
But I can get it from that perspective as well,
especially if you are doing it a lot.
Yeah, I mean, look, I personally think it's valuable for sure,
especially if you're looking at a subsystem
and you're actually documenting it.
Right, right, sure.
Yeah, so there's a documentation subdirectory,
a subtree in the kernel where everything is documented.
And BPF has a whole bunch of stuff that could be better documented. Please feel free. I will ack it. It'll land, I promise you that. What I was talking about was more people who will fix a typo or move punctuation around or fix grammar. I do think it's valuable, but people know that you're just doing it to get your patches in, you know. And it does have a cost, because you have to have people with very limited bandwidth review these things and whatever. But yeah, it's hard to describe. Some of the people that do this will also engage in code reviews in kind of a confrontational way and be like, you need to document this.
And it's like, you know what?
No, you have no right to demand anything of me at all.
You know, so yeah, I don't want to misrepresent anything.
It's good.
It's fine.
It's good work.
Again, I think it's valuable.
It's just, you're dealing with people that are kind of cynical by default, and they're not going to assume good intent if you keep doing it. They're just going to assume that you're an attention wannabe or whatever.
Sure, sure. Yeah, no, I get it.
A fairly similar thing happens around October with Hacktoberfest on a lot of GitHub repos. You get a ton of repos just getting these very tiny changes, because some years they'll give out shirts, for example, if you have a commit made. So it ends up with a lot of projects being like, ah, here's a comma change, here's this, here's that.
Actually, a really bad one I saw, this was insane: there was a YouTube channel, I think they had like 6 million subs.
They got like a 1.5 million views on the video.
They were teaching how to use GitHub, right?
Totally understandable.
And they showed how to make a pull request, issues, and all that.
They showed how to do it on an actual real repo on Express.js.
They use that repo.
And if you go to that repo, there are hundreds of pull requests being made with people just adding their name to the readme.
What? Oh, that sucks. That's super annoying. I mean, that's unfortunate, because, I don't know, I haven't seen the video, but I assume they at least made a somewhat substantive pull request, right?
No, that's the thing, the video didn't either. In the video they just added the name of the college they were at.
Oh, that's poor judgment.
Yeah. They did say after the fact, don't do this, but the problem is, when you have one and a half million views, you're going to have a subset of those people who just actually go and do it.
Oh man, that would be extremely bad. That would take down the kernel email servers if people did that.
Yeah, exactly.
And it's sad, because you get it, right? So the Hacktoberfest thing, I guess to your point, you actually get shirts sometimes, and people will submit absolute nonsense, I'm sure, and I'm sure they even say, this is nonsense, I just want a shirt, in some of the PRs. But yeah, it's understandable that people want to have their names on stuff. But after a certain point, I don't know, I don't think it means that much
for somebody to have their name in the kernel. I know people who have never submitted a patch to the kernel who have much deeper systems experience than some of the people that only submit documentation patches. I'm not trying to pick on documentation, I'm just saying, at a certain point, for you as a person, the royal you obviously, you're going to want to participate more meaningfully anyway.
So usually I do this at the start, but I guess we just didn't do it.
What's your background in programming like, and how did you actually find yourself doing
this kind of work?
So I majored in math in college and I kind of realized like towards the end of my
undergrad degree that I wasn't going to be a mathematician, which I'm glad that I figured
that out young enough where it wasn't a problem. So I took a web dev course and I got a job as a
web developer at Columbia University for my first job. And I did that for a few years. And then I got into grad
school and did my master's in grad school. I thought I was going to be more like a graphics
machine learning person because I had a math background. But it turns out that I really wasn't
supposed to be a mathematician. And so I ended up finding that the operating system stuff was
way more interesting to me. I like asking the question, how does this
work at the end of the day?
And that's kind of the only question that really matters in operating systems.
It's not true, obviously, but you know what I mean?
And so anyway, I was more interested in that. I took way more OS courses at that point.
And, um, I really liked my kernel course in particular.
We had to build like a full 32 bit x86 kernel, which was a good exercise. Everything other than the bootloader,
which I'm glad they didn't force us to do. And so my first job out of grad school was working
at VMware. I was on the core kernel team there for a little while, and then I was on the core
hypervisor team. Both had extremely amazing engineers on them, and I really learned a lot from them. But I hadn't really done much open source work. I kind of wanted to, and I had done some web developer open source stuff, but nothing to write home about. And then, yeah, I went to Meta. I worked on an internal
thing for a little while that ended up getting canceled.
And then the Linux kernel team, thankfully, was hiring.
So I switched over to there.
And that's kind of how I ended up in this situation.
But yeah, so TLDR, web dev to grad school to OS
to kernel work and industry.
That's kind of how it played out.
Oh, so the kernel work started when you were already there?
When I was at Meta? When I entered Meta, yeah, I'd already been doing kernel work for quite a while at that point. It was more related to virtualization, because I was
working at VMware. So the problems were like, oh, we have like VMs migrating between NUMA nodes.
Like let's migrate memory with them, whatever, stuff like that. But it's all the
same at the end of the day, really. There was an article that came out recently on LWN about how NUMA text replication is something that people are trying to do. Just to give a very brief overview of that: when you're talking about a big computer, if you've ever heard the term multi-socket, multiple NUMA nodes is really what's going on there. Essentially what that means is there are different places where you
have pockets of RAM that are closer to certain sets of cores.
And so you want to read from the memory that's closer to your cores.
And one of the things that people want to do right now in the kernel is replicate the
kernel code and read only data to all of these NUMA nodes.
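The locality idea here can be shown from userspace with libnuma, even though the text replication he's describing is kernel-side work. A minimal sketch, assuming libnuma is installed and the machine actually has more than one NUMA node:

```c
/* Build with: gcc numa_local.c -lnuma */
#define _GNU_SOURCE
#include <numa.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
	if (numa_available() < 0) {
		fprintf(stderr, "no NUMA support on this system\n");
		return 1;
	}

	/* Find which NUMA node the CPU we're running on belongs to. */
	int cpu = sched_getcpu();
	int node = numa_node_of_cpu(cpu);

	/* Ask for memory backed by that node so reads stay local. */
	size_t len = 1 << 20;
	void *buf = numa_alloc_onnode(len, node);
	if (!buf)
		return 1;

	memset(buf, 0, len);	/* touch it so pages are actually faulted in */
	printf("CPU %d is on node %d; buffer allocated on that node\n", cpu, node);

	numa_free(buf, len);
	return 0;
}
```

Text replication applies the same "keep reads local" reasoning to the kernel's own code and read-only data, copying it to each node instead of having remote nodes fetch it across the interconnect.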
And that's something that we did way long ago
on the hypervisor side of my previous company.
So, you know, people play catch up.
It's a very different problem in Linux
because you have like live patch and stuff like that.
So it's hard to do this sort of synchronously.
But yeah, you know, the skills translate for sure.
So you've had a fairly interesting career, I guess it's fair to say. It feels like it.
Yeah, it's certainly been a wild ride. I definitely didn't expect to be working on the Linux kernel team at Meta, but it's been an awesome experience. Yeah.
Kernel work, it's wild.
I'm glad I'm here, but we'll see.
I don't know if I'm going to want to do kernel work for my whole career either
because it's a pretty specialized space.
And I think you are somewhat limited
in what you can do at the end of the day as well.
I think a lot of people just don't realize
how much of the kernel is developed
from people at companies like Meta. Obviously people know about the Red Hat stuff that's there, but Amazon does a lot of work in the kernel, AMD does a lot of work in the kernel, did I say Amazon? maybe I did, Intel does a lot of work on the kernel. Yes, there are a lot of volunteers doing kernel work, but a lot of the work is also being done by these companies, and the kernel would not be in the state it's in today if it wasn't for the support it gets from them as well.
Oh, absolutely.
Yeah, absolutely. I mean, for Meta in particular, a lot of the bigger maintainers are on the team. Like the multi-queue block maintainer, Jens Axboe, and io_uring, he's on the team. The cgroups maintainer, and actually now the other sched_ext guy, he's on the team. And, you know, I can't really speak for other companies, but for Meta at least, they are given a lot of time to be maintainers. One could even argue that a certain large percentage of their salaries are basically just donations to the Linux kernel community, and there's no expectation that they do anything; for a lot of their work there's no expectation that it has to have any benefit to Meta whatsoever. So yeah, I know there are people that have different opinions, and I understand it, that's fine. But absolutely,
there's no way the kernel would be anywhere near where it is today if you didn't have companies
throwing tons of engineering resources at it. And I think it's a good thing personally. Engineers aren't cheap, so it's good to put them in an environment where they can contribute back to an open source kernel.
Well we can kind of imagine where it would be
because there is sort of a good example of it
and that's something like the Hurd kernel,
which was very exciting back in the late 80s and early 90s
and then even in the early days, I don't know how much you've read of the early Linux mailing lists yourself, or if this is just me being crazy and reading things that don't matter, but even after Linux came out, and there was clearly a lot of excitement around it, even after Debian came out, people were like, yeah, okay, this is going to be temporary until the Hurd kernel is ready. And then the Hurd kernel, I think they're just getting 64-bit support in the last year.
Oh my gosh.
I think it's great, right? It's a cool project. It's fun. But at the end of the day, you have to just invest. These kernels, these open source projects, need companies to sponsor people to work on them.
There was an LWN article recently that talked about how after the whole, what was it?
The Log4J thing, I forget what the name of the library was, where there was a horrible
zero-day vulnerability.
Should companies be paying maintainers to work on these
core libraries, or should the government?
Yeah, so the government, companies, I personally think companies should. I think that if you have a company that needs this tool, then you should pay somebody to work on it. I don't know, I don't want to be reductive and simplify it.
But I think at the end of the day,
the really good big projects usually do have company sponsorship
for better or for worse.
And I don't know.
I mean, a lot of really great ones don't too.
I don't want to be reductive.
But it's just like if you were building a project,
you would want everybody to be using it
and everybody to be contributing, right? And that's kind of how it goes for Linux. And the nice thing is, Linus, Greg Kroah-Hartman, all these folks work for the Linux Foundation. They have absolutely zero ties to any of the other companies beyond their relationship with the engineers who maintain the subsystems. So there is still a very deep element of impartiality. Linus is not going to accept something from anybody at Meta if he thinks it's dumb or if he doesn't think it belongs, and you don't have to look very far back to see him blowing up at people that have a lot of influence on the community and work at Meta. So I think it's a healthy balance. But I see both sides of the argument.
You mentioned there, there's always these
little things, these great projects that don't have any funding, that are just run by one guy or whatever. You have web dev experience, I'm sure you've seen very critical libraries that are used by every company out there, that one dude just maintains by himself, that are like 10 layers down the dependency stack, that everybody needs but nobody even knows exist.
Yeah, I don't know, man. You're right, that's a problem. And even if you were to have the government pay for people to work on that, which would also be great, and they deserve to be compensated, what do you do if nobody even knows that it's actually sitting that low down?
Right, right.
Yeah, I totally agree with you. It's not as simple as, I don't know. And Log4j was a recent one, but there are so many more, for sure.
But, you know, the other part of it is, like, a lot of these core libraries, even if the person that's maintaining it has all the time in the world, like, it's their judgment as to what gets merged, too, right?
And, like, if you're, like, a maintainer for OpenSSL and you accept something, you're like, oh, okay.
And then all of a sudden you have some stack overflow, or someone can read arbitrary memory off the NIC, and that's not great.
So to me, there's a certain element of like house of cardsness to the tech industry that's kind of never going to go away.
But yeah, it would be nice.
It would be nice for the people that do that to at least be compensated, for sure.
I'm sure you've seen the xkcd, what number was it, the teeny little block holding up...
Yeah, the huge... yeah, that's a good one. Where is it? I looked at it just the other day, and now I'm not going to be able to find it, am I?
Oh, I think I got it. Yes, it's 2347.
Okay, thank you. Okay, let's see, 2347. Is that it? Is that how the website works? Yeah, xkcd. Yeah, okay.
Yeah, so all modern digital infrastructure,
all these complicated little applications,
a project some random person in Nebraska has been thanklessly maintaining since 2003.
It's holding everything up.
Yeah, and it's absolutely true.
it's all fairly well documented, at least with the kernel; the maintainership is documented in the MAINTAINERS file. And there are drivers where everyone's like, who's maintaining that? There's a lot of code, but it's well documented. But yeah, the kernel is
a kernel, right?
Like it's not user space where you have everything that's actually used on the system.
That's building up your whole ecosystem.
And so, yeah, it's, it's unfortunate for sure.
I did do a video on, there was this,
I don't remember what the exact product was,
but there was this Intel,
like some weird Intel hardware that came out in 2008.
They had a driver for it in the kernel.
Some Intel devs did work on it.
No one is sure this hardware actually exists. They got the patches in there before it released publicly, and then they must have canned the project.
So, someone was like,
does anyone actually have this?
Can we just get rid of this?
Nobody was maintaining it. Nobody knew what it
did or why it was still here.
And it was just random.
Obviously, yeah, the kernel is well maintained,
but there are going to be those parts where
nobody's touching it. Or, a little while back, there was this culling of a bunch of random old Wi-Fi hardware, like 802.11a hardware and stuff like that, where it's just, is anybody actually still using this? Do we need this here? Can we get rid of it? Is anyone sure it works? Because it's hardware that nobody even knows if any kernel maintainer has.
Like, nobody can test it.
Nobody's sure if it's still working.
It's like, so, and they were saying, well, yeah, if somebody has the hardware, please speak up and we'll keep it around if it's still working.
But, like, you know, if nobody's using it at this point,
like, there's no reason to keep it around.
A couple of years back there was a discussion about dropping, I'm surprised the kernel still supports it, but support for the Intel 486.
And they're like,
is anybody actually running a modern
kernel on a 486
You know, probably, is the sad answer to that question, though. And yeah, it's tough, because the process for getting rid of it is exactly what you just said. You kind of ask timidly, is anybody using this, hoping nobody says anything, and then you remove it.
And sometimes people are like,
like, you can't remove that.
Like, we don't know.
And then sometimes you get some rando that said,
hey, this broke my build and you have to leave it in, right?
Like the kind of like golden rule of Linux
is you can't break user space.
You can't break old devices or anything like that.
And yeah, I mean, the Intel story is pretty funny.
I wonder if that was, oh, what was that feature called, where they had the secure enclaves, SGX? That might have been it. I think that got cancelled.
But yeah, that's funny.
That's such an Intel thing to happen
because their whole business model at this point is practically just having accelerators do stuff faster than CPUs.
And yeah, that story checks out for sure.
With the Wi-Fi one, I believe one of the drivers that was on the block for culling was the PS3 Wi-Fi driver.
And that's one where people did speak up, like, yeah, no, I'm still running a modern kernel on a PS3. Sure, why not? Okay, go ahead.
I mean, the nice thing about the driver model, which I think is pretty good in Linux, is that the policy is: if you haven't upstreamed your driver, then you get no backwards compatibility guarantees at all. So you don't really have any ABI requirements for the drivers in the kernel, which is really nice, because in kernels where you do have that, it can be extremely painful. But then the flip side, of course, is that if you have an upstream driver, then you do have these guarantees. And that's where you're like, okay, I want to change something that's really dumb to do something that's way less dumb, but okay, the PS3 needs it. So it has its drawbacks. I mean,
yeah, I think you'd have to hope that at a certain point, I don't know, I was going to say we'll come up with a policy where you deprecate devices after some number of decades, but the 486... I don't know. So it's going to be a problem for the foreseeable future, for sure.
Well, deprecation is a weird one, right? Because you can
deprecate stuff, but then there's the issue of whether you can actually remove it. A recent example I saw was with grep. Most systems ship fgrep and egrep. These have technically been deprecated for the past
20 years, but nobody
knows they're deprecated, and specifically they're only
deprecated in the GNU project.
So people just keep using them.
There are distros that are still shipping them today,
and there's this argument like,
okay, it's deprecated, but
well, can we remove it now? But like, there's
all these scripts that use it, and it's
one of these things where it's like
When I was younger, I thought deprecation was a lot easier of a problem. But then you realize this was the same problem people were worried about with Y2K, when there are people running your software on 30-year-old installations that haven't changed anything in that long and are still in deployment. It's hard to make changes if they're still using the modern stuff.
Yeah, totally. It's funny, because I feel like I had the same exact
scenario, the same exact kind of arc, where I was like, you just deprecate it for a long time, and then if people complain, just tell them to deal with it. But, well, I don't remember at all what the actual issue was, but do you remember how in the last year or so GitHub had that huge outage, because they deprecated some, I think it was some kind of encryption algorithm they were using? I really don't recall exactly what it was, but you seem familiar with it.
Yeah.
But they were like, deprecated, deprecated, change, change, change, email, email, email. And then they flip it off, and like 90% of the world can't connect to GitHub anymore. And of course they reverted.
Right.
And I forget what the rule is, but, just like with Moore's law or whatever, there's some law where once something is sufficiently large, you can't undo it. And I think the kernel really suffers from that, because it's the core of the operating system.
But yeah, it's a really, really difficult problem.
And the only person who can really force the issue is Linus at the end of the day.
And if you have a user that's going to email the list and say, I'm using this, I mean, it's a pretty fat chance that Linus is going to override them unless there's a really good reason to.
Yeah.
Like it's causing some issue in something newer that more people are using, things of that nature, I would assume.
Yeah.
Yeah.
Like this is preventing us from like, I don't know,
like using 64-bit memory, like something like that.
Like there isn't anything like that, but it's, yeah.
Or what you said too.
I mean, it's just really, really rare.
So yeah, and actually that's another interesting point.
If we bring the discussion back to sched_ext.
So another one of the big challenges that modern maintainers have in the kernel is dealing with something called
UAPI constraints.
And UAPI is the term for the part of the kernel that's essentially the user space visible
interface.
So if you have header files or you can link against them and make system calls or whatever,
or some ABI, some structure that has a certain byte layout, with fields that mean a specific thing, that you can never change. I mean, you can change it if Linus lets you, but it's really, really hard to change; you basically have to assume that you can't change it ever. So maintainers are, very understandably, guarded about new stuff being added to UAPI. It's a very, very high bar, and even if something is a great feature, they might just feel like, no, I don't want to maintain this, it's going to take too much time. So one of the nice things about BPF, especially the modern version of BPF, is that because these schedulers, or whatever BPF program you load, is a kernel program, there are no UAPI constraints at all. And the program can talk to
user space over the maps, the data structures I was talking about. There is a user space component,
but the maps themselves are UAPI. So the structure of how you share this array with user space,
all of that's UAPI. But the actual communication itself isn't, right? You use the array however you want for whatever program
and the actual, whatever, the communication channel
is completely outside the scope of UAPI.
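As a rough sketch of the pattern being described, here is a minimal BPF program that shares a counter with user space through a map. The map layout is whatever this particular program wants, not kernel UAPI; the tracepoint chosen here is arbitrary and has nothing to do with sched_ext. It assumes clang with BPF target support and libbpf's bpf_helpers.h are available.

```c
// SPDX-License-Identifier: GPL-2.0
/* Compile with: clang -O2 -g -target bpf -c shared_map.bpf.c -o shared_map.bpf.o */
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

/* Array map that user space can look up by key to read the counter. */
struct {
	__uint(type, BPF_MAP_TYPE_ARRAY);
	__uint(max_entries, 1);
	__type(key, __u32);
	__type(value, __u64);
} event_count SEC(".maps");

/* Count timer-expiry events; user space reads the count out of the map. */
SEC("tracepoint/timer/timer_expire_entry")
int count_timer(void *ctx)
{
	__u32 key = 0;
	__u64 *val = bpf_map_lookup_elem(&event_count, &key);

	if (val)
		__sync_fetch_and_add(val, 1);
	return 0;
}

char LICENSE[] SEC("license") = "GPL";
```

The point he's making is that only the map machinery itself is stable kernel interface; what the program and its user space side choose to put in the map, and what it means, is entirely their own business.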
So BPF, one of the things that makes me excited about it
is it's going to, I think it's kind of the way forward
for the kernel to be extended
without having to tie the hands of all these maintainers
that don't want to have to add UAPI,
but still extended and still have something
that actually performs better anyways,
because it's in the kernel.
So yeah, that's just another kind of nice thing about sched_ext.
There's one last thing I want to touch on,
but I will be back in just a moment.
Sure.
I didn't actually stop the recording, I'll just cut that bit out.
Yeah, I figured. That's fine.
I did stop the recording with the last person I had, and then I forgot to clip out that part that I cut, so I had to upload that separately. We're not doing that this time, because people are going to yell at me again for not uploading the ending. Something we touched on at the start and didn't really expand much upon past that: you also have the Twitch channel, and you're going to start using that to do stuff.
yeah, thanks for bringing that up
it's something I started fairly recently
I thought that people might be interested in watching me work on the kernel in real time and answering questions that folks have.
And yeah, it's engineering,
so it's not always interesting to just watch.
But I think it's an element of the tech industry
that doesn't really get much, ironically,
because it's so core,
it doesn't really get as much attention as it should have.
So it's just a way for me to chat with people, show people what I'm doing, show them how the kernel works, do some deep dives on various subsystems like the scheduler, BPF, RCU, stuff like that. And, you know, just an open forum for people to ask questions and participate however they want to.
i think having something like this actually is really cool
because it does give people an insight
into the way the kernel actually works,
the way it's developed.
Like, it's not...
It's...
Like, yes, it's C code,
but it's just C code.
It's not this crazy thing that you can't modify. You can see the code right there. If you understand C, once you start understanding what specific variables do, where functions are located, things like that, you can piece it out and start making sense of it.
You absolutely can. And I think you were right when
you said earlier that the huge hill to climb when you first start is that you feel like there's this literal mountain of code that you can't even see.
It goes into the clouds, right?
And that's true.
But if you have somebody that's there to explain it to you, they can explain conceptually what the big pieces of this thing are doing.
And then everything else is an abstraction, right?
Like, oh, I want to add a BPF feature.
Okay, I'll tell you exactly what, I'll explain BPF,
I'll do a deep dive in kind of the background of it.
And then I'll tell you what this specific type of object does.
And then you'll realize, oh, there's a lot of different types
of these things that we actually could add
that would be really useful that actually wouldn't even
really require a lot of expertise to add.
And, you know, the community in the kernel has been around for a while, right? It's a lot of experienced people; maybe they want to retire soon, maybe not, but it is definitely an older demographic. And I think it's important for the Linux kernel community to start to grow into the younger generation as well. And unfortunately, one of the big drawbacks is that there isn't really this sort of window into it, right? You kind of have to just jump into the super deep end and figure stuff out. But it doesn't have to be that way, and I'm trying to make it maybe not that way, a little bit.
And yeah, you know, I'm not sure, I'm not sure if it'll work out or not,
but for now it's been really fun.
The community's growing and yeah, it's a fun place to hang out.
I've tried to do programming streams before,
but I get way too distracted by people talking.
I can't.
You know, I feel like I have to be working on something that's really easy to reason about. I can't be debugging a scheduler thing, although we actually did debug a kernel bug in real time once, which was a fun stream. But yeah, I'm the same way. That's the fun of it for me, though: I'll talk and do stuff, and then people ask questions, and we'll go on a tangent for a while, and then people ask questions about the questions.
But usually when I do a stream at this point,
I'll have some kind of plan for it.
The other day, I showed people how a stack overflow works and gave an example of how to write a stack overflow.
And so it wasn't really kernel work.
It was kind
of just systems oriented, you know, but that was useful. It was organized enough that I felt like, if people were asking questions, the whole big picture in your head of the system that you're trying to reason about doesn't get toppled over, which obviously does happen if you're working on something that's too complicated.
Yeah, yeah.
Well, even for me, it's not just something that's too complicated.
I just cannot.
When I want to do any sort of programming work,
I need to be like locked in.
I don't want any distractions.
I don't even particularly like listening to music when I program.
Like I honestly want to have earplugs.
That's awesome.
I should try that, man.
Yeah, I get it.
I totally get it.
Yeah, my wife and I had to set up a system where I put an on-air neon sign hanging outside my office door.
And she's like, okay, like he's on air,
whether he's on air or not.
And yeah, it can be really tough.
So yeah, exactly to your point, I usually work in areas that I feel very comfortable in, and it's usually just an educational thing, even more than breaking ground on like a big new thing.
No, there are a couple other people that do these very in-depth streams, like the person who works on Asahi Linux, the work to get the M1, M2, the Apple Silicon Macs working. She streams all of that stuff.
We're talking about Hector Martin, right?
Yeah, Hector, and then also we have Asahi Lina doing that work.
Oh, I haven't seen any of those streams. Okay, cool.
Cool. Yeah, so
there's people doing this kind of work
and there's some people that are really good at it.
And, you know, if you can find a way that makes it work, in a way that you feel is useful to the people seeing it, I honestly think it's worth just experimenting with and seeing what happens with it.
That's the plan.
Yeah, that's the plan.
We'll see how it goes.
I feel like I have more work than...
There's an infinite amount of content, which is the nice part.
It's millions of lines of code and it's always changing and growing.
So I think it's fun.
Yeah, we'll see.
So somebody like Hector Martin, I mean, he's so knowledgeable about stuff that I think he can literally hack on really complicated stuff and it's trivial for him, and he can talk about it. I don't think I could necessarily do that in my stream, right? Maybe.
But yeah, if I had to guess, mine is probably going to be more educational, and I might do YouTube and make educational videos as well. That's the rough plan, but you know, we'll see how it works out.
So if you want to check that out, where can they go to find you?
They can go to twitch.tv/Byte_Lab, that's byte, B-Y-T-E, underscore lab. That's probably the best place to start for now, and you can do Twitter, twitter.com slash byte lab, as well.
Awesome. Yep. Um, I think we've touched on pretty much...
Uh, okay, Jitsi died. Lovely.
this happens sometimes.
Hey, I think I lost you for a second.
Yeah, no, this happened the last time I used Jitsi as well.
I think it's because the call was going...
Yeah, it just hit the two-hour mark.
That's why.
Oh, interesting.
What happens at two hours?
Apparently it kicks me from the call, and now there's two of me, which is fun. I was gonna say,
I guess we should probably end it off now, unless there's anything else you want to say. We've sort of touched on everything I wanted to talk about.
I think that was it, yeah. Thank you so much for inviting me on the podcast. I had a really good time, lots of really, really interesting, deep questions, so thank you very much.
And yeah, I think everybody enjoys watching it. I think some of it went a bit over my head. I tried to keep up as much as I could, but you know, this is a complex area to deal with, for sure.
It is, yeah, it is. And I'm sure if it went over your head, it's because... there's just no learning this stuff without staring at it for a long time. But yeah, if anybody has any questions, I'm happy to clarify in the comments, or come ask me on the stream and I can clarify as well.
Awesome.
Um, I guess, uh, we already mentioned the Twitch, but is there anything else you want to shout out?
Anything you want to direct people to? Let them know.
Yeah.
You know, for now, just Twitch and Twitter, those two. Like I said, I'm
going to start doing YouTube.
Um, that's the plan at least.
Uh, but for now that's kind of where all the content is going.
You can follow me on Instagram. I haven't posted
anything yet.
And there's also a
Discord channel as well that doesn't have
an easy to pronounce
link. So I don't really know how to tell people to join.
Is that linked on your Twitch or your Twitter? Somewhere like that?
It is on, yeah. So that's a good call out.
It is on the Twitch.
So just go there and people can find the link to join.
Awesome. Oh, did the... No, the other me didn't leave yet.
Okay... oh, there it is. Okay, it left on your side. Okay, now we're good. Cool.
Whatever, Jitsi will be Jitsi. Is that all you want to mention? If it is, I'll do my outro.
That is all I wanted to mention, yeah. Thanks again for your time.
Awesome. Okay, so if you
want to see more of my stuff, the main channel is Brodie Robertson. I do Linux videos there six days a week. I have no idea what will be out by the time this comes out, because this is kind of getting recorded a bit ahead of schedule; I've got like three episodes backlogged, so we're out in March sometime.
If you want to see my gaming stuff, I do gaming streams over on Twitch at BrodieOnGames. I'm probably close to finishing both games by now, so just check what's over there; you'll see what's over there.
I have a react channel; if I watch things on the stream, I upload them there.
Do not expect good content.
Do not expect well-researched content.
Do not expect anything that's worth watching.
But if you would like to see me ramble about nonsense, which is what I normally do, just less scripted nonsense,
Check that out.
Brodie Robertson Reacts, that's the channel. And if you're listening to the audio version of this,
you can find the video version on YouTube at Tech Over Tea. If you're watching the video,
you can find the audio on any podcast platform. Search Tech Over Tea. There is an RSS feed.
You will find it. Stick it in your favorite app and you'll be good to go. Give the final word, what do you want to say?
Keep hacking, keep it low level, keep it real, and yeah, hope to see everybody in the future at some point.
Awesome. See you guys later.