Tech Over Tea - Linux Kernel Scheduler Developer | David Vernet

Episode Date: March 8, 2024

The Linux kernel is something we all use, but have you ever thought about what goes into it? Well, today we've got David Vernet on the show, who has spent quite a bit of time focusing on one aspect of it: the scheduler.

=========Guest Links==========
Twitch: https://www.twitch.tv/Byte_Lab
Kernel Recipes 2023: https://www.youtube.com/watch?v=8kAcnNVSAdI
LSF 2023: https://www.youtube.com/watch?v=MXejs4KGAro
BPF: https://datatracker.ietf.org/wg/bpf/about/
Sched_ext Repo: https://github.com/sched-ext/sched_ext
SCX Repo: https://github.com/sched-ext/scx
Other Changes 1: https://lore.kernel.org/all/20230809221218.163894-1-void@manifault.com/
Other Changes 2: https://lore.kernel.org/all/20231212003141.216236-1-void@manifault.com/

==========Support The Show==========
► Patreon: https://www.patreon.com/brodierobertson
► Paypal: https://www.paypal.me/BrodieRobertsonVideo
► Amazon USA: https://amzn.to/3d5gykF
► Other Methods: https://cointr.ee/brodierobertson

=========Video Platforms==========
🎥 YouTube: https://www.youtube.com/channel/UCBq5p-xOla8xhnrbhu8AIAg

=========Audio Release=========
🎵 RSS: https://anchor.fm/s/149fd51c/podcast/rss
🎵 Apple Podcast: https://podcasts.apple.com/us/podcast/tech-over-tea/id1501727953
🎵 Spotify: https://open.spotify.com/show/3IfFpfzlLo7OPsEnl4gbdM
🎵 Google Podcast: https://www.google.com/podcasts?feed=aHR0cHM6Ly9hbmNob3IuZm0vcy8xNDlmZDUxYy9wb2RjYXN0L3Jzcw==
🎵 Anchor: https://anchor.fm/tech-over-tea

==========Social Media==========
🎤 Discord: https://discord.gg/PkMRVn9
🐦 Twitter: https://twitter.com/TechOverTeaShow
📷 Instagram: https://www.instagram.com/techovertea/
🌐 Mastodon: https://mastodon.social/web/accounts/1093345

==========Credits==========
🎨 Channel Art: All my art was created by Supercozman
https://twitter.com/Supercozman
https://www.instagram.com/supercozman_draws/

DISCLOSURE: Wherever possible I use referral links, which means if you click one of the links in this video or description and make a purchase we may receive a small commission or other compensation.

Transcript
Starting point is 00:00:00 Good morning, good day, and good evening. I'm, as usual, your host, Brodie Robertson, and today we have... I think you might be the first kernel contributor I've had on the show. Probably. Oh, cool. Definitely the first Meta employee, that's for sure. Welcome to the show, David Vernet. How's it going? It's going well. Thank you so much for having me.
Starting point is 00:00:21 Yeah, glad to be the possible first kernel contributor. In terms of the order of titles, I usually put that one before Meta employee, but they're both true. But yeah, glad to be here. Yeah, I was just confused why you reached out to me, because, you know, most of the time I will reach out to people about things: hey, do you want to come talk about this? Usually people don't come to me with a giant essay about the things they want to talk about. Well, yeah, I mean, you have a sizable audience. I actually found your channel because you did that video on the 6.8 release cycle and how Linus's power went out, and there was that regression in the scheduler because
Starting point is 00:01:01 there was some firmware issue with AMD chips, or the frequency governor. And so, yeah, like I mentioned to you in the email, I have a Twitch channel, twitch.tv/Byte_Lab, that I'm trying to grow an audience for. So I thought, hey, this guy's obviously following kind of the low-level stuff, so it might be nice to come on the podcast and chat. Yeah, I'm excited to do this, so we'll see where it goes. Whilst I certainly have an interest in that side of it as well, my knowledge is fairly surface level. I'm not a kernel contributor myself. I will dig through the mailing lists and see what's going on.
Starting point is 00:01:37 When there are certain terms that come up, like referring to, I don't know, sched_ext, for example, I'll go search documentation and find out what that's all about. But for the most part, you are much more in the weeds than I will ever be with this. So I guess probably the best place to start is: what is it that you mainly focus on? Because it seems like there is definitely a trend in what you are doing. So the first thing I focused on when I started working on the Linux kernel, which was not the first kernel that I worked on in my career, was livepatch,
Starting point is 00:02:13 because at Meta we had an issue when we were rolling out livepatches. And for those of you who are watching, a livepatch, if you're not aware, is a way that you can patch kernel text at runtime to fix bugs. For example, if you forget to drop a lock, or you have a memory leak, or something like that. And we were noticing that when we did that, TCP retransmit events were going way up for the few seconds it took to do the patch. So anyways, I noticed that and I worked on that for a little bit.
Starting point is 00:02:40 I fixed that bug, and that was kind of my test to get onto the team. And since then, I've been focusing almost exclusively on the scheduler and BPF. My day is usually a mix of adding features to sched_ext, running benchmarks, and trying to tweak things to make it better. I spend some time also on BPF; like I mentioned, I'm one of the two co-chairs on the standardization for BPF, so I'm reviewing documentation, which is fun. But yeah, as far as engineering gigs go, it's very engineering focused. I don't really have too many meetings, which is nice, and it's usually just hacking on the scheduler and BPF. Well, for the most part, I assume a lot of people watching
Starting point is 00:03:27 this do have some sort of technical background, but I don't know how many people really know about the internals of how a kernel works. So what is the scheduler, and what does it actually do? Sure, yeah. So when you think of a system: what are you doing on your computer? You're on a web browser, you're in a word processor, you're doing a bunch of things at the same time. But the resources of your system are finite. You only have a certain number of cores or logical CPUs. And so the job of the scheduler is to decide who gets to run where and when. For example, if you have two cores and you have your web browser and your word processor, maybe the scheduler will say, oh, these guys both get to run in parallel on these two cores. But in reality, obviously, there's way more threads than that in the system. So the scheduler decides who gets to run where.
Starting point is 00:04:18 It's a complicated problem, because you have to deal with hardware issues. Like, if you have a thread that's been running on a core, you probably want to keep it there if you can, because it might have better cache locality, so its accesses will be faster; on the chip there are these small, really hot caches that it wants to read from. But if you keep it on that core for too long, then you might have another core that could have run that thread just sitting there idle, not doing anything. So, you know, that's sort of the problem space. And then also with the default scheduler, especially EEVDF in the kernel, fairness is a big goal. So you want to give everybody their fair slice, their fair share of CPU. So how do you balance all of these heuristics while also making it generally
Starting point is 00:04:57 fair? That's kind of the goal. So the way the, from the documents that you sent me, previously up until what, 6.4, 6.6 something in that area, the scheduler was the completely fair scheduler, the CFS scheduler. And now it's using this EEVDF, the earliest eligible virtual deadline first. So, which is a really long name.
Starting point is 00:05:22 I can see why you just said EEVDF; it's a lot easier to say. Right. But, and I know obviously explaining the intricate details of how they are different
Starting point is 00:05:32 will probably take you the entire episode, but at like a surface level, what is fundamentally different about these two approaches? So yeah, that's a great question. I think if you get into the weeds, it definitely takes a while to explain, but I think you can, you can think about the scheduler like this. So if you have one CPU and you have all these threads that want to run on it, the basic idea is you want to count how much time each thread has run on that CPU. And
Starting point is 00:05:59 And you want to give the thread that has run the least amount of time the next slice of time to run. That count is called vruntime, virtual runtime. And if anybody's ever heard of thread weights or thread niceness, the way that you apply that is you scale how much runtime you accumulate as the thread runs, depending on its weight. It's inversely scaled, so if you have a really high weight, you divide how much time you're accumulating by that. And essentially, that's kind of the idea. It's fair because you're giving whoever has run the least amount of time, scaled by their weight, the CPU next. So that's really about bandwidth allocation: who gets to run, how long do they get to run, et cetera.
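To make the bookkeeping David is describing concrete, here is a minimal C sketch of weighted vruntime accounting. It's a toy model, not the kernel's code (CFS and EEVDF use fixed-point scaling and a weight table indexed by nice value), and all names here are invented for illustration:

    #include <stddef.h>

    #define DEFAULT_WEIGHT 100ULL   /* the default thread weight from the discussion */

    struct task {
        unsigned long long weight;   /* higher weight => vruntime accrues slower */
        unsigned long long vruntime; /* weighted runtime, in nanoseconds */
    };

    /* Charge delta_ns of real CPU time, scaled inversely by weight. */
    static void account_runtime(struct task *t, unsigned long long delta_ns)
    {
        t->vruntime += delta_ns * DEFAULT_WEIGHT / t->weight;
    }

    /* Pick the runnable task that has run the least (smallest vruntime). */
    static struct task *pick_next(struct task *tasks, size_t n)
    {
        struct task *next = &tasks[0];

        for (size_t i = 1; i < n; i++)
            if (tasks[i].vruntime < next->vruntime)
                next = &tasks[i];
        return next;
    }

A task with twice the default weight accrues vruntime at half the rate, so it gets picked roughly twice as often: that's the fairness-by-weight property being described.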
Starting point is 00:06:42 But there's another problem in scheduling called interactivity, where you want to be able to give applications that have latency sensitive requirements. If you're going to play a game and you're rendering some frame, you probably need to render the frame quickly or else it's going to look jittery. Same with calls and everything like that. And so there was stuff built into CFS stuff built into cfs to enable kind of more interactive workloads to be to be given the cpu more more easily but the core difference with evdf and cfs is that um this deadline eligible deadline that you mentioned that's kind of where the interactivity comes along and i'll try to give like a really brief overview i'm still confused by this so if you're watching and you're confused don't feel bad. But the idea is you have the same V runtime that you had with CFS, which is used again to
Starting point is 00:07:29 count how much time you've run, for bandwidth, to see who gets the CPU next. But in addition to that, you have what's called a deadline. If you want to run for, let's say, 20 milliseconds, your deadline would be however long you've run, your vruntime, plus 20 milliseconds. And the idea is the scheduler schedules whoever has the earliest deadline first. So if you have really short windows where you run, you only run for like 100 microseconds, your deadline is way sooner, so the scheduler is more likely to pick you first. It's not just how long you've run, but also when your deadline is, so to speak.
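Continuing the toy model above, EEVDF's deadline idea is a small addition (again a sketch only; the real EEVDF also has an eligibility check against the average vruntime that's glossed over here):

    #include <stddef.h>

    /* Each task asks for a slice; its virtual deadline is vruntime + slice. */
    struct eevdf_task {
        unsigned long long vruntime;  /* weighted runtime, as before */
        unsigned long long slice_ns;  /* requested slice, e.g. 20ms vs 100us */
    };

    static unsigned long long deadline(const struct eevdf_task *t)
    {
        return t->vruntime + t->slice_ns;
    }

    /* Schedule whoever's virtual deadline comes first: a task asking for
     * short 100us slices is picked ahead of one asking for 20ms slices,
     * even when their vruntimes are equal. */
    static struct eevdf_task *pick_next_eevdf(struct eevdf_task *tasks, size_t n)
    {
        struct eevdf_task *next = &tasks[0];

        for (size_t i = 1; i < n; i++)
            if (deadline(&tasks[i]) < deadline(next))
                next = &tasks[i];
        return next;
    }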
Starting point is 00:08:03 And that's kind of more the interactivity part of it um there's a really good lwn article that that explains it um and maybe more intuitively than i am because the lwn editor is has been doing this for quite a while um but yeah that's kind of what i would say like the highest level explanation so i guess with cfs that came from a time when was it it was was like 2006, 2007, somewhere in that range it was added to the kernel? You had cores that were much more homogeneous. So you had, you know, quad core systems that each had the same cache topology. You didn't have as many NUMA machines. And migrations were usually more expensive because the cores were spaced further apart.
Starting point is 00:08:58 And so you mentioned this project SkedX, which allows you, I'm sure you're going to go into this a little bit. It allows you, yeah. The TLDR is, yeah, CFS was in a very different time, in my opinion. Hardware is way more eclectic, let's say, than it used to be. So scheduling is more important than it used to be as well. Right. And I guess the kinds of workloads that we do on Linux nowadays are also fairly different as well. Just a common example, gaming, for example,
Starting point is 00:09:22 that wasn't really a use case for Linux back in 2006. There was a couple of open source games, but not like we have today. Right, no, it was a meme back in 2006. And now it's like we have the Linux Steam Deck. So Steam is like dead serious about Linux on gaming on Linux. And yeah, I mean, it's a big scheduling problem. There are people that are working on SkedX
Starting point is 00:09:46 that are looking closely at that problem specifically as well. And to give you maybe a concrete example, so I don't know if you've ever played Factorio or if any of your viewers have ever played Factorio, but it's a game where you have to build this huge factory that does all this stuff in parallel. So it's a very like parallel heavy game
Starting point is 00:10:03 where you have to have a lot of computing power. um i think it was a non-tech did a benchmark where he ran on the the 7950xd as i think the the amd cpu that has um it's kind of wild it has a v it has a 3d v cache sitting on top of one of its two uh l3 caches so if you imagine there's two different groups of cores on the CPU, there's a cache sitting on top of one of them, which means that that set of cores has better memory access. There's more cache around it, but it has to throttle itself more often because heat is actually trapped by that cache. So it's a really crazy scheduling problem where on the one hand you have better locality and the other you have better CPU. And on Windows it ran like way,
Starting point is 00:10:49 way better than on Linux because, you know, the Windows scheduler is kind of, I guess it was more amenable to this type of workload. And so that's the kind of thing, that's an example of the kind of thing that we can do better in the modern age. So besides the interactivity problem, why would someone even care about changing the schedule? If it's this generic scheduler that just works well enough, like what sort of improvements would someone want to make to that to better suit their workflow? So, well, yeah. So if you're talking about like the average person that's just using Linux, you probably...
Starting point is 00:11:28 Definitely not, yeah. Yeah, well, they're probably going to be just fine with EEVDF, the default. But there's a few different types of users, I would say. So I'll give you an example from Meta. So we, with HHVM, and if I'm going too deep into the crazy weeds, just let me
Starting point is 00:11:45 know um but with before you move on hhvm what is that one uh hip-hop vm so that's the that's the php jit engine that we use in meta to run our web workloads um if you laugh when i said php that's totally fair but this is um this is a new type of php that's statically typed and has tons of optimizations for JITing. So it's actually really fast now. But one of the interesting things about JIT engines and compilers as well, actually, is that they have really, really bad instruction cache locality. Which means that they're not really doing the same code a lot in a row. They're going to this branch and this branch, and then they're compiling this code over here, especially with JIT engines. And so they have really, really poor front end CPU locality is the term for that. And that means that they also have really poor IPC,
Starting point is 00:12:35 which stands for instructions per cycle. And so a lot of the time when you're writing system software code, you want to try to use the CPU as efficiently as possible so that it can pipeline things and do a bunch of things at the same time. But with something like a JIT engine, it's really hard to do that because it's just not really possible if you're basically having to decode instructions every time you're doing something. So in such a scenario, something like CFS, which is quite sticky because again, it was built in a time when you had um when you had a cores that were further apart and it was more expensive to migrate it doesn't really lend itself very well to that philosophy of stickiness because um you actually just want to throw that thing onto a cpu and just just let it go like you want to be able to run this thing as fast as possible maybe keeping it
Starting point is 00:13:18 on the same cpu for cache locality like might be okay but if you have a cpu and you're waiting around then you should just throw it over there so um that's i sent a patch set for that actually upstream that hasn't been merged yet but that's an example of like where we want to make the scheduler more work conserving is the term for that so it's it's it's just doing it's it's erring on the side of doing more work as opposed to kind of improving locality or these kinds of things um and there's i mean there's so many things like we have, we have a ton of SkedX schedulers already that are all, that are all cute and eclectic in their own ways. Um, and I can certainly give you really interesting examples if you're,
Starting point is 00:13:53 if you're interested in more. Yeah. If you have some, before we get like deep into what SkedX is specifically, if you want to give those examples, we can do that. Sure. Yeah. So here's another one. So, um, so VMs are interesting interesting. If you're talking about a VM, the way that the scheduler views a VM is by what are called vCPUs, so virtual CPUs. And so in the guest operating system, you have obviously whatever threads have spawned in this guest OS. But from the perspective of the host, the threads of the VM are just its actual CPUs that are running, which kind of makes sense if you think about it, right? Because the guest OS has CPUs, it's scheduling stuff on those CPUs, but it's the actual host OS that decides when those CPUs get to run, right? It's multiplexing
Starting point is 00:14:34 the physical CPUs to these virtual ones. And that's fine if you're working on like an overcommitted environment, which is obviously not uncommon at all for VMs. But for a lot of workloads like on AWS or on a lot of of cloud providers you can imagine that it actually might be better to give us a vcpu uh an actual physical core and just turn off timer interrupts basically do everything you can so that you never interrupt the guests at all um it's a little it's pretty expensive to exit the guest it's called a v VM exit and it's there's hardware is doing a lot of stuff It's it's not cheap to do so you want to try to avoid that so You could for example build a scheduler where all scheduling decisions are made from a single core where you're not running a guest vcpu
Starting point is 00:15:15 And you just let the vcpus burn on the core you're not no timer interrupts nothing that would pull it out of the the guest and If you need to actually do a resched you're like oh something needs to run there there's like a k thread a kernel thread that needs to do some io or something like that then you can send what's called an ipi inter processor interrupts and there's a specific one called a resched ipi it's designed for making it do a resched and you send it from the one core that's actually doing the scheduling and kind of organizing everybody and that works right because you don't really need to do very many scheduling decisions in real time like you would with a normal scheduler and you also you kind of organizing everybody. And that works, right? Because you don't really need to do very many scheduling decisions in real time, like you would with a normal scheduler. And you also, you kind of take the scheduler and the host out of the way of the guest. And you can
Starting point is 00:15:53 actually have really big speed ups in cloud environments by doing that. So when you're dealing with these, like dealing with schedule, when you're, when you're at the point of dealing with scheduler problems, you're sort of at this point where you've optimized the code and you've got this giant deployment. It's like, okay, how do I further optimize it from here? Like, how do I get the absolute most out of the hardware I have? You probably wouldn't approach the problem anywhere before that point.
Starting point is 00:16:23 Yeah, so, yeah. that's funny you should ask that because before I was doing this kind of work, I kind of was like, what is left to do? Like, for example, something I hear a lot from people is like, oh, kernel compile is like optimized completely. There's nothing left to do for kernel compile, which is untrue. We could do better for kernel compile, believe it or not.
Starting point is 00:16:43 And so how do you go about fixing or improving something that's been banged on repeatedly and relentlessly for so long? Well, the good news is hardware is extremely, extremely complex, which gives people like us a job. To give you an example, earlier I was talking about the front-end CPU pipeline and how that is really slow when you're running a JIT workload. Well, to go into a little bit more detail to make
that make sense: one of the parts of that, on Intel CPUs at least, is called the DSB, the decoded stream buffer, which takes instructions, like x86 instructions, and decodes them into microcode, into RISC-like instructions, which are actually what run on the back end of the CPU. And it'll cache those so that it doesn't have to do that decode every time. That's one of the things that thrashes a lot if you're running a JIT engine. And this decoded stream buffer, if you look at the Intel Skylake microarchitecture, its behavior is actually specified: if you have, say, three unconditional branches within a 32-byte window, then you always have to use a new entry in the DSB. There are all these really particular hardware-specific things that have nothing to do with the workload. And there are times where you might pad out your code by a little bit and then get this huge, like 10x, speedup on your program.
Starting point is 00:17:50 If you have like three unconditional branches within a 32 byte window, then like you always, you always have to use a new entry in the DSB. And there's all these like really particular hardware specific things that have nothing to do really with the workload. And there are times where like, you might pad out your code by like a little bit, and then you get this huge, like 10x speed up on your on your program. So that kind of thing, you know, is is always there's huge, I think, I mean, a lot of the time, there's huge gains to be made from that. Now talking about the scheduler specifically. So we've talked about sort of like hardware,
Starting point is 00:18:29 hardware related stuff, but there's also a lot of software related things that we don't really take into account now with CFS or EVDF, but in the future, and I'll talk about this more when we get to SkedX, but you can imagine if you have like a service where you have one thread doing a lot of IO that's reading an RPC message or remote procedure call message. So I got a request, you know, like you want like a cat photo. I'm going to take your message, demarshall it into something that I can run on the server and then pass it off to like a worker thread to actually do the work. Well, you probably want to put the worker thread on the same core as the IO thread because
Starting point is 00:19:02 the IO thread just pulled the whole message into cache, right? So it's hardware related still, obviously, because it's still about caches. But you really have to understand the application; you're getting a little bit higher level than the kernel even at that point. And so, I'll stop monologuing in a second, but in general, when I'm going in to attack a problem, I'll usually first look at IPC, instructions per cycle, which is a good metric of how well you're using the core. And then I'll start to look at what else is going on. If it's lower than I expect, all right, where are we stalling? Are we stalling on memory loads? Is IO the bottleneck? Are we just compute bound and need more cores? It's sort of a process of elimination from there.
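For a sense of what "first look at IPC" means in practice, here is a small sketch that reads the two hardware counters involved through Linux's perf_event_open() and computes IPC for a placeholder workload (day to day you'd just run perf stat; this shows what that tool is doing underneath):

    #include <linux/perf_event.h>
    #include <sys/ioctl.h>
    #include <sys/syscall.h>
    #include <unistd.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    /* Open one hardware counter (instructions or cycles) for this process. */
    static int open_counter(uint64_t config)
    {
        struct perf_event_attr attr;

        memset(&attr, 0, sizeof(attr));
        attr.type = PERF_TYPE_HARDWARE;
        attr.size = sizeof(attr);
        attr.config = config;
        attr.disabled = 1;
        attr.exclude_kernel = 1;
        /* pid = 0, cpu = -1: this process, any CPU. For rigor you would
         * put both counters in one event group so they start together. */
        return (int)syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0);
    }

    int main(void)
    {
        int insns = open_counter(PERF_COUNT_HW_INSTRUCTIONS);
        int cycles = open_counter(PERF_COUNT_HW_CPU_CYCLES);

        ioctl(insns, PERF_EVENT_IOC_ENABLE, 0);
        ioctl(cycles, PERF_EVENT_IOC_ENABLE, 0);

        volatile long sum = 0;                 /* placeholder workload */
        for (long i = 0; i < 100000000L; i++)
            sum += i;

        ioctl(insns, PERF_EVENT_IOC_DISABLE, 0);
        ioctl(cycles, PERF_EVENT_IOC_DISABLE, 0);

        uint64_t n_insns = 0, n_cycles = 0;
        read(insns, &n_insns, sizeof(n_insns));
        read(cycles, &n_cycles, sizeof(n_cycles));
        printf("IPC: %.2f\n", (double)n_insns / (double)n_cycles);
        return 0;
    }

A front-end-bound workload like the JIT example above would show a noticeably lower IPC than a tight numeric loop like this one.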
Starting point is 00:19:49 so one thing you did touch on earlier in there that i want to get back to is it sounds like when you're dealing with these scheduler problems it's going to be very platform specific like if you are dealing with like aer problems it's going to be very platform specific like if you are dealing with like a skylight cpu if the architecture changes a bit with the next generation you might want to restructure how that scheduler is being used to better suit that specific platform you move to yep that's right so a lot of it is compiler stuff, like compiler level. There are a lot of bugs you'll see like with LLVM or with GCC where it's like I was saying, oh, if we pad this out a little bit more when we emit the code, it's much better performance on this microarchitecture or
Starting point is 00:20:35 that one. But it's absolutely also a scheduling problem. And a good example of that would be AMD, I would say. On one of the earlier Zen microarchitectures, you had what's called an AMD Roam. And the AMD Roam had a very different latency distribution for accessing memory outside of your L3 cache compared to the next generation, which is called Milan. And now the modern generation is called Bergamo. And things are just getting huge and huger and huger. now the modern generation is called Bergamo and things are just getting huge and huger and huger. So, you know, with all of these things, like for what I was saying about like aligning text and that kind of thing, that's really more of a compiler problem. But, but the point is, right,
Starting point is 00:21:13 like all of these little things that there's so many details like that in, in a, in a CPU and, you know, you have to just like every part of the system kind of has to play, but the scheduler is a really, really big part of it. Yeah, especially for something like these big Bergamo machines and AMD that have like hundreds of cores and stuff like that. So I guess we probably should get into Sked EXT then. I'm sure there's going to be a lot of terms we need to go over as we do this. No problem.
Starting point is 00:21:44 At a high level, what is Sked EXT and what problem is it trying to solve? Okay. So Sked EXT is a new pluggable scheduling framework. And the idea is it lets you implement your own scheduling policies, your own host-wide scheduling policies, and implement them as what are called BPF programs now first of all when I say host wide what I mean is these are threads if you are running in CFS or eVDF you instead bring them over to your scheduler if you have like what are called real-time threads or deadline threads these threads that are running these higher priority scheduling classes those don't those don't run in your scheduler.
Starting point is 00:22:25 They stay in the kernel. But you migrate all of the other default threads to yours. So that's the host-wide part. The BPF part: BPF stands for Berkeley Packet Filter, and originally, if you've ever heard the term, you may have heard it in terms of packet filtering, as you might imagine, which was added to the kernel quite a while ago. But since then, something that many people call eBPF, extended BPF, has been added, which is completely different. It lets you run a JIT inside the kernel, which is insane when you think about it. There's actually an instruction set for BPF, BPF-specific encodings for instructions. There's a backend for the LLVM compiler where, if you have C code, it'll compile the C code into BPF instructions. And then
Starting point is 00:23:11 at runtime, throughout the runtime of the kernel, the kernel will read that BPF bytecode and emit x86 instructions, or whatever architecture you're running on, to run the actual program natively in the kernel. And there's a couple of things that are special about that. First of all, kernel modules also do something similar, for anybody who's heard of those before, but they're very different from BPF programs for a number of reasons. For example, BPF programs can't crash the kernel. The kernel will statically analyze the BPF program in a component called the verifier, and if the program could be unsafe, like you're reading a pointer that you shouldn't read,
Starting point is 00:23:47 or you're not dropping a reference, like you have a memory leak or something like that, it won't let you load the program. And it also has all sorts of ways of interacting with user space from your BPF program. Like there are these data structures called maps where you can have in real time shared memory that you can write and read from both user space from your BPF program. Like there are these data structures called maps where you can have in real time shared memory that you can write and read
Starting point is 00:24:07 from both user space and kernel space. And so obviously there's way more to say about BPF, but the basic idea is it's a safe environment for dynamically loading and running programs in the kernel. And sched_ext is a framework that uses BPF to implement host-wide scheduling policies that are also safe, that can't crash or hang the machine.
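To make that concrete, here is a hedged sketch of the BPF side of a trivial global-FIFO sched_ext scheduler, modeled loosely on the scx_simple example in the sched-ext/scx repository. The helper and macro names (scx_bpf_dispatch, SCX_DSQ_GLOBAL, common.bpf.h) come from the sched_ext patch set as of around this episode and have shifted across revisions, so treat it as illustrative rather than copy-pasteable:

    /* minimal.bpf.c — toy global-FIFO scheduling policy */
    #include <scx/common.bpf.h>

    char _license[] SEC("license") = "GPL";  /* sched_ext requires GPLv2 */

    /* A BPF map: shared memory that user space can read while we run. */
    struct {
        __uint(type, BPF_MAP_TYPE_ARRAY);
        __uint(max_entries, 1);
        __type(key, u32);
        __type(value, u64);
    } nr_enqueued SEC(".maps");

    /* Called when a task becomes runnable: put everything on the single
     * shared global dispatch queue; idle CPUs consume it in FIFO order. */
    void BPF_STRUCT_OPS(minimal_enqueue, struct task_struct *p, u64 enq_flags)
    {
        u32 key = 0;
        u64 *cnt = bpf_map_lookup_elem(&nr_enqueued, &key);

        if (cnt)
            __sync_fetch_and_add(cnt, 1);
        scx_bpf_dispatch(p, SCX_DSQ_GLOBAL, SCX_SLICE_DFL, enq_flags);
    }

    /* The struct_ops registration the kernel attaches as the host policy. */
    SEC(".struct_ops.link")
    struct sched_ext_ops minimal_ops = {
        .enqueue = (void *)minimal_enqueue,
        .name    = "minimal",
    };

A small libbpf-based loader opens and attaches the struct_ops map, and from then on the kernel calls straight into the program. The _license string is also what the verifier checks for the GPLv2 requirement discussed later in the episode.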
Starting point is 00:24:25 policies that are also safe and can't crash or can't hang the machine either. Now, as far as what problem is it trying to solve? Well, so EVDF, CFS, these are general purpose schedulers. They do really well for what they do, right? They're general purpose. They're fair. They've been worked on for many, many years and they're, they're fair. They have a lot of, they've been worked on for many, many years and they're very well optimized. But there's a few drawbacks. For one thing, I don't know for any of the viewers who have ever done kernel work before,
Starting point is 00:24:55 you know how much fun it is to compile a kernel, reinstall it, reboot it. And then you have a bug where you crash something or you corrupt your disk or your file system. And you're like, great, I have to do all this again. So the safety aspect is really nice. You know, for a BPF program, you recompile it. It takes like two seconds to compile.
Starting point is 00:25:12 And then you just rerun it. And the kernel loads it, starts running it for you. It loads it in, does everything under the hood to transition the whole system to using it. And it just runs. So, you know, for meta, if we're, if we're like running an experiment on, on thousands of hosts, it's, it's just not even an option for us to do like this iteration where we're, we're loading a kernel onto thousands of hosts, waiting for the caches to warm back up, then doing measurements and oh, a crash or like what's different about this and whatever. So,
Starting point is 00:25:38 so it makes it as simple as like, you know, testing a regular user space application. Oh, you just compile it, run it, it just goes. It's literally that easy, yes. And it's, I guess, a little bit different because it has host-wide implications, but in terms of the iteration time, yeah, it's exactly what you just said. And so the other big problem it solves is that you do leave a lot of performance on the table
Starting point is 00:26:04 with a general-purpose scheduler in certain scenarios. And so, you know, Meta is an easy example: we have a lot of large services that are kind of monoliths, like web, stuff like that, and there's just too much scale for us to leave the scheduling benefits we can get on the table. And, you know, this allows us to build scheduling policies that just aren't appropriate to be merged into a general-purpose scheduler. So things that would never be able to get upstream, we can build them in sched_ext and use them internally. And then, yeah, the crazy ideas, like the cloud computing thing that I was talking about earlier, it enables that stuff as well.
Starting point is 00:26:50 So you could make your own scheduler even without something like this. It would just be a lot more of a slow iteration process that would just not be suitable, especially in cases like this. You could. I wouldn't recommend it because the API for building these schedulers in the kernel is very complicated. And it requires you to understand kind of the core logic of the scheduler.
Starting point is 00:27:16 Like, callbacks will be called in different contexts, and you have to understand what context you're being called in for something to make sense. So you can do it; Google's written some schedulers, lots of companies have. But if this is something that you're interested in, I would, I mean, I'm biased, but I really wouldn't recommend doing that. I would recommend looking at sched_ext; it's going to be a lot easier. We also tried to make the callbacks and the API much more intuitive, reflecting the policy instead of the system around it, right? So,
Starting point is 00:27:48 Is there some sort of performance overhead of this approach obviously is going to be some but like is it a Obviously if you wrote your own you would be you know directly interacting with it. You wouldn't have this extra thing here Maybe I'm explaining this badly but like what sort of overhead does come with SkedXT? There's a much better way to say it. Oh, that's a really, really good question. And there is an overhead. So when you go with something like with a BPF scheduler,
Starting point is 00:28:18 you have to take the overhead of going through the BPF interface, which means doing indirect calls and stuff like that. So there's certainly an overhead to doing it in sched_ext. It's really minor from what we've seen: like a couple of tenths of a percent relative to just using a native scheduler. Now, sometimes it's pretty hard to get over that hump,
Starting point is 00:28:41 depending on what you're doing. And certain things, you know, something like EVDF is like super well suited for it. So there's just no point in even trying unless you want it to build it in SkedX itself. But yeah, it's a couple of tenths of a percent. So, pretty low. And the reason it's so low,
Starting point is 00:28:57 something I should probably make clear is that BPF is not a user space framework. Like when you implement a scheduling policy in BPF, the kernel is actually calling directly into your program and staying in kernel space. You can build user space components on top of that. And we actually do have schedulers where we have like load balancing done in user space, but the hot paths, everything stays in the kernel. There's no, there's no like handshake with user space or anything like that. And so the overhead is really minimal. And the trade-off obviously works out quite well in a lot of scenarios. For anyone unclear about it,
Starting point is 00:29:31 what does a load balancer do? Another great question. So I'll try to give a quick overview of this as well. So if you imagine load is, in simple terms, load is just how much stuff is running on a system. So if you have two threads that are always runnable, then you might have load of 200 because the default weight for a thread is 100 and load is weight times basically how long the thread can run for. Now, if you imagine a really complex system, which obviously most of them are, even if your machine is sitting idle, there's K threads running and all this stuff.
Starting point is 00:30:04 The goal of a load balancer is to balance load across the system. The thing I said earlier about EEVDF, where you have this vruntime per core, where you count how much time each thread has run, that's from the perspective of a single core. Each core has its own run queue and its own vruntime counter. And so within a specific core, everything is fair. But when you go between cores, and especially when you go between what are called scheduling domains, so like between cores that are grouped into L3 caches, at that point you have to use
Starting point is 00:30:33 this sort of higher level view of load to try to balance the system. And that's kind of what, yeah, that's what the load balancer is doing. Right. So I don't know where I was going to go with that actually well so I can go into a little more yeah if you want to
Starting point is 00:30:50 I had something Garrett dude it's so complicated so yeah like okay you imagine you have two cores four threads and three of the threads are running you know a third of the time actually let's keep it simple they all run 100% of the time and running a third of the time. Actually, let's keep it simple.
Starting point is 00:31:07 They all run 100% of the time, and they all have the same weight, and they're all in one core. And that means that one of the two cores has load of 400 because you're just adding it up, and the other one has load of zero. And the goal is to distribute this load evenly, right? Every thread should be getting its proportion of compute capacity relative to its proportion of load. And so what I mean by that is if the total load in the system is 400 amongst these four
Starting point is 00:31:29 threads, each of them have load of 100, which again is weight times how long it can run, weight times duty cycle, then they each get a hundred over 400 equals a quarter of the compute capacity in the system. And so there's two cores. So each of those cores should get 200, should be responsible for 200 load each. And so the load balancer would say, oh, there's 400 on this core, zero on this core. They should each have 200. So I'm going to move two of the threads over and now they each have 200 and the system is fair. The system is balanced. That's essentially what it's doing. So, okay. So the load balancer is there to make sure the work is distributed across the different threads and then the scheduler is there to make sure the work that is there gets a suitable amount of time
Starting point is 00:32:12 for those individual tasks. Yeah. So the scheduler is both parts of it. The scheduler's job is to both distribute load amongst cores and also to ensure fairness on a specific CPU or interactivity. If you look at the actual scheduling code, the load balancer is kind of in its own thing. You do it after some amount of time or when a core is going to go idle, it might pull load onto the core. But both of those things are certainly part of the scheduler. For example, know, for example, like the scheduler has to scale to like 1000s of cores for some huge machines. So you'll accumulate load within a specific within a single core. And then when you load balance, you'll sum the core, sorry, you'll sum the load between them, you know, from one core, whichever one is load balancing. So yeah, it's it's confusing, because it's sort of they're both kind of related to fairness, but they're very
Starting point is 00:33:02 different ways of looking at it. And there's very different problems with each of them. But both of them are the scheduler. You mentioned that someone could implement the load balancer in user space. Why would somebody want to do that instead of doing it in kernel space? So that's another great question. So there are advantages and drawbacks of doing it.
Starting point is 00:33:22 So if you do everything in the kernel, the advantage is, of course, that you don't have to go to user space, right? you do everything in the kernel, the advantage is of course that you don't have to go to user space. You do everything in the kernel. It's all fast, it's all right there. But the kernel has some enormous drawbacks. For example, you can't do floating point math in the kernel.
Starting point is 00:33:35 You have to do only fixed point math. The registers for doing floating point aren't used in the kernel. If you wanted to do like load load balancing i was saying you know we're doing division right we're talking about the proportion of load that one thread is using across the system and so like this is this is like a floating point like everything is done in percentages and fractions so um it's it's nice to do it in user space because that's kind of the component that's really complicated and the complexity of doing it in the kernel you probably could do it in bpf um but but you know you're only running the load balancer like in this and one of
Starting point is 00:34:10 the schedulers we have you only run it once every two seconds you don't really need it to be in the kernel right um and you know you can do you can really go crazy like balancing load is one thing but you could look at um you could look at like, if you have asymmetric CPU capacity, like one of them, like the one that I was talking about earlier, the 7950X3D, where you have the V cache, and then you have the other thing, like all of these things, you could just, you can model in whatever way you want. You can do machine learning from user space and make predictions. You can classify like what thread, what quality of a thread would maybe suit it better for one domain or the other. So to summarize the core algorithm, you probably could do it in user space, but it's just very
Starting point is 00:34:52 limiting. It's a very difficult environment to program in. Oh, you mean you could do it in kernel space? I said, yeah. I meant, sorry, that you could do it in kernel space. Yeah. But user space is easier. Yeah.
Starting point is 00:35:02 Thanks. Yeah. Okay. I'm sure most people have heard the terms user space and Yeah. But, but user space is easier. Yeah. Thanks. Yeah. Okay. I'm sure most people have heard the terms user space and kernel space before, but we should probably just quickly explain what that is as well, along with why there is this issue with like swapping back between them and why that comes with some sort of performance degradation. Sure. So user space is the part of the computer that you're using. Like when you're using a computer. Like when you're in a web browser, that's a user space program.
Starting point is 00:35:27 When you're using SSH, that's user space. And the idea with user space is every process has its own virtual address space, right? It has its own kind of virtual fake view of memory. And that's uniform for every process in the system. Your job is to do something, whatever the program is doing, and that's about it. Excuse me, kernel space is the component of the system that manages all of that stuff. So, you know, in reality, memory is not virtual, right? Memory is physical. You have some amount of RAM on the system. And the kernel has to map, is the term, virtual memory to this physical this physical
Starting point is 00:36:05 memory this ram um you you have something like the current the scheduler excuse me where you're where you're deciding which which threads which processes get to run on which cores you know this is something that has to kind of be in the core of the system and so if you imagine that user space everything is its own process its own application the kernel is like that the kernel has its own monolithic address space every thread in the in the kernel is is in the same little sandbox but that sandbox is like the management of the system it's it's distributing resources it's it's multiplexing things on on finite resources um and it's it's the privilege component right you wouldn't want one malicious thread to be able to give itself
Starting point is 00:36:45 all of the runtime in the scheduler. So yeah, I mean, that's a very high level description, but hopefully that makes sense. And sorry, you also asked like about the transition between the two. Yes, yes. So when you go between user space and kernel space, that's an operation where you're changing address spaces,
Starting point is 00:37:04 you're changing privilege levels, all this stuff, right? So, I don't know if anybody's ever heard of these horrible vulnerabilities called Spectre and Meltdown that happened a few years ago. Yeah, those were fun. I hope they're proud of that. Yeah. So, take Meltdown, because that's a pretty easy one to talk about.
Starting point is 00:37:22 So that was a bug where, so the kernel memory is protected, right? Like if you have a user space process, it shouldn't be able to look at the memory of another one. Like that's a secret, right? Like that would be a bug if you were able to read some remote processes memory, which makes sense. And so when you go between user space and kernel space, a lot of things are happening in hardware that change the execution context. Your registers, the user space registers are being saved on the stack. You're changing what the instruction pointer to point to somewhere in the kernel. You're loading kernel registers.
Starting point is 00:37:53 You're probably going to change to a kernel thread stack as well. You have to copy memory from user space. Again, you're changing privilege level. So all this stuff, that's called trapping into the kernel. That's the term for it. All of this stuff happens every time you's called trapping into the kernel. That's the term for it. All of this stuff happens every time you go back and forth between the two. And Linux is a monolithic kernel, right? So like when you trap into the kernel, there's a lot of layers to go down before
Starting point is 00:38:15 you get to the scheduler, for example, it's pretty core part of the operating system. So if you were to say, okay, well, who's going to run on this CPU next? I don't even know if it'd be possible to do this. But if you imagine before you make that decision, you schedule your user, actually, it would be possible. You schedule your user space process that's running on this CPU. You schedule it and it goes, okay, who's going to run next? These guys, this guy's, okay, we'll run this person, this thread next.
Starting point is 00:38:43 And then it traps back into the kernel. It goes back into the scheduler and it says okay this is the one to run next and that's the one that you put on the core so that's like that's like way more overhead than if you just look into a kernel space map right it's you're talking like orders of magnitude more overhead to do it that way um there actually is a sked x scheduler that somebody at canonical is working on that's doing really really well because he's he's he's been able to really push it pretty far but in practical terms there's there is a lot of cost doing that but then the issue you have with doing things in kernel space is you can cause serious issues like, because it is a monolithic kernel, things might, you know,
Starting point is 00:39:29 really bad code can take down the entire kernel. Yeah, yeah, really bad code can. It turns out the code that we thought wasn't bad can also do it, unfortunately. Now, within a SCEDX program, within a BPF program, theoretically, you're not supposed to be able to take the host down. And if that would happen,
Starting point is 00:39:44 then it should fail to even be loaded in the first down. And if that would happen, then it should be, it should fail to even be loaded in the first place. But for sure, to your point more broadly, absolutely. It's a big kernel space is like, it's a tricky, it's a tricky place to be doing programming. And it's good, you know, within reason, it's good to try to push complexity out of the kernel when you can. And, you know, if you look at a lot of the kernel, the kernel algorithms for how it implements stuff,
Starting point is 00:40:06 the heuristics are often probably more simple than you would imagine. For prefetching IO, I haven't looked at it in a while, but I think it was a static amount that you prefetch. There's no tracking. I might be wrong about this, but the last time I checked, I think that was what it was. I don't think there's any tracking of how much are we reading? Oh, we've been reading a lot. Well, we should prefetch more because we're expecting it to be reading this whole file. So all this stuff that you could do with like math and kind of more complicated reasoning
Starting point is 00:40:33 and models about how things work, you really don't see that very often in the kernel. And the scheduler is actually probably the most complex part of the entire kernel in that regard. Like how much it has in the kernel directly. But yeah, in general it's it's uh it's it's not a great place to to be doing that you said theoretically the bpf program shouldn't be crashing the kernel i'm i think it's fair to assume that there were some issues along the way where they were crashing the kernel yeah sure i mean it happens you know it's especially like if there's a new release, and there's some big feature that like some, we haven't seen a corner case, it happens. So far, we haven't, I'm going to knock on wood, but we haven't had any like big issues rolling it out to meta. And, but yeah, you know, we were working on it, people are finding stuff, the community around the project has actually grown a lot in the last few months, which has been really cool. And with more eyes on it, you know, somebody, for example, opened a bug today because on the stable release of the
Starting point is 00:41:30 kernel, if you try to use what's called control flow integrity, this feature called CFI, which basically makes sure that you're always calling a safe function in the kernel, that it would crash the SCEDX. And it was because some patch set that was never merged to the actual stable kernel, which it probably should have been, we didn't know know that it wasn't merged and so we just told people like i just don't use the stable release because nobody's really using it that that often anyways um but but yeah yeah absolutely there's i think uh anybody who works in any part of software that tells you that there's never problems is being a little bit disingenuous. I don't know how I didn't ask this before,
Starting point is 00:42:11 but how long has SkedX been a work in progress for? It's been a work in progress for about two years at this point. Another engineer at Meta, it was kind of his brainchild and he worked on it for about six months and then I came onto the project shortly after I joined the team and did the live patch stuff.
Starting point is 00:42:28 And it's been what I've been working on ever since. Yeah, so it's been a good amount of time. Relative to a lot of other huge open source projects that are contentious to get merged upstream, it's not the longest by far. The the preempt RT patches is what everybody talks about that's taken like 25 years to get merged. We're hoping that's not gonna be the case. But it's also been quite a while
Starting point is 00:42:54 that we've been working on it, iterating on it, building the community around it until, and then obviously after it gets merged to the main Linux repository. One thing I noticed in one of your talks that I don't think you touched on in the talk was the GPLv2 requirement. So yeah, so GPLv2, that's the license that the kernel is licensed with.
Starting point is 00:43:19 And another BPF feature is it'll look at the binary of the BPF program. And if that program is not licensed with GPLv2, which is look at the binary of the BPF program. And if that program is not licensed with GPLv2, which is emitted in the metadata for the program in the binary, then the verifier will just fail to load it if it's a SCEDX program. That was the reason. I mean, we added that because we wanted everything to be GPLv2. We wanted it all to be open sourced. But more than that, there were some concerns in the community that this would maybe stifle upstream contributions to the scheduler and stuff like that.
Starting point is 00:43:53 And we certainly don't want that. We obviously still use EVDF internally at Meta, and it's a great scheduler. So this is just one of our ways of saying, hey, people still have to open source them, and we can take whatever crazy ideas that work really well in ScudX and we can add them to the fair scheduler as well if we want to. Well, yeah, that makes sense.
Starting point is 00:44:13 Because I heard that I was confused why, because I don't think you mentioned why there was like that, at least in the talk that I heard. Yeah, well, it's a tricky topic because people, people have really strong opinions. Like, I don't know how much you, you looked at like kind of the conversations on the upstream list, but some of them got a little heated, some private conversations in too.
Starting point is 00:44:36 And that's, that's fine. I mean, like I get it, you know, there's, it's not, nothing is ever black and white, but so, you know, if I, if I didn't mention that it was, it was because I didn't think it was necessary. Like the fact that it's GPLv2 is its own fact, right? Like, okay, now we know that it's GPLv2. And if you have concerns about upstreaming, then okay, you know, that's fair. But at least theoretically, you know, this should be something that we're protected against by GPLv2 at least. So we should probably move on to the second thing you mentioned uh co-chairing the bpf
Starting point is 00:45:09 standardization oh yes so before that uh you're doing that with the ietf i i'm sure the very tech nerds know about the ietf but there's probably another one people don't know who that is oh you have to be like a special kind of tech nerd to be up in the IETF. So IETF, the Internet Engineering Task Force, is a standards body that has created standards for a lot of different parts of the internet, like BGP, Border Gateway Protocol, QUIC, like a lot of old TCP IP packet formats and stuff like that. So they're, they're, you know, a very robust, excuse me, well-respected standards body. And yeah, you know, we wanted to, we shopped around for which, which standards bodies we
Starting point is 00:45:59 wanted to go with. And, and we had no, I've never standardized anything. I don't think anybody else in the BPF community had. So we decided on the IETF because they had a lot of experts there to help us do it right. So what was the, I did see in the document that BPF is also used outside of the kernel as well.
Starting point is 00:46:21 So I guess that's why it's a concern for it to be standardized. So that's one of the concerns. Yeah's there's something called ubpf which is user space bpf um like i was mentioning earlier bpf has its own instruction set so if you compile into bpf bytecode theoretically it could run like any other jit you know it could be cross-platform just like a jvm program could run across any platform theoretically. So we wanted to standardize for software reasons. The big reason, though, is because there are hardware vendors that are building support for offloading BPF programs to devices as well.
Starting point is 00:46:56 And that's a big, big investment for companies that do that. And there have been companies already, like a company called Netronome that's built BPF offload even without the standard. But that's a little bit of a special case. And so we've been hearing from vendors that this is something that they want. This is something that they kind of need before they'll actually be willing to invest the money in it. And, you know, like a lot of the trend of the tech industry right now is going towards offloading, for networking, offloading TLS like transport layer security. So you're doing encryption and decryption on the actual NIC itself,
Starting point is 00:47:31 instead of having to go all the way to the CPU to do it. Yeah. You know, it's with like, now that now that we're not doubling our compute capacity every 18 months, these kinds of things are what we're sort of where we're going. And so BPF, you know, I think it's a really nice middle ground between having nothing and having an ASIC that costs like a billion dollars to build.
Starting point is 00:47:50 So, hardware is hard. It's hard for us to predict what we should be standardizing to accommodate hardware without kind of making it too hard for them and also giving them kind of the guardrails to build something that's gonna be worth it it um but uh but yeah that's the idea it's it's i think more more than software it's definitely for the hardware vendors but everybody does benefit from it well that definitely makes sense how long has this been in progress for
Starting point is 00:48:19 oh let's see this has been in progress officially since I think March 2023. Oh, it's fairly new. It's pretty new. Yeah, yeah. And so there's, you can actually Google like BPF IETF working group. And we have this this charter where we have all of these documents that we that we're intending to write. Some of them are standards documents. Others are what are called informational documents. So they're basically like, are suggestions for like ABI and whatnot, but they're not actually standards that you need to write. Some of them are standards documents. Others are what are called informational documents. So they're basically like our suggestions for like ABI and whatnot, but they're not actually standards that you need to follow. But it's pretty new. Yeah. And we're, we're pretty close to going to last call for our, the instruction set standard, the ISA standard. And that's the first, that'll be the first document. So we're really excited for that. But, you know you know if that goes if that continues to go well and we find that we need to keep going then this is this is going to be like a long process i don't know if i'm going to stay a chair the whole time i don't know if i have the the energy for that but uh but it'll it'll be
Starting point is 00:49:17 yeah, it'll be a long, long road for sure. So has there been any... obviously you can't say anything that's, you know, off the books, not allowed to say, but has there been any sort of pushback with the approach to standardization, or the idea of standardization, or anything like that? Actually, not really. Well, there have been some people at the IETF that didn't think that BPF was the right fit, because they're more used to standardizing protocols and packet formats and stuff. They're not really used to standardizing JIT runtimes. It's a different approach for them.
Starting point is 00:49:56 So there were some people at the IETF that didn't think it was a good idea, but we're still there. In terms of people in the industry, we haven't really gotten much pushback from anybody on that specific point. And I think that's not super surprising. I mean, even for people that don't really like BPF, it's just too big, and we need to standardize it at this point. I mean, the de facto standard right now is just whatever we do in the Linux kernel, and everybody kind of follows along,
Starting point is 00:50:28 which is great for us, but it's not really conducive to growing a global community around the technology. So I think everybody's okay with it, yeah. So, give me a second, I lost it again... what was I gonna say? Uh, right. So considering the history the IETF has with the things that they've typically been involved in standardizing, why specifically go through working with them? So, yeah, we thought about working with a few different people. We thought about working with OASIS, which is what virtio went through.
Starting point is 00:51:10 We thought about just publishing the standard through the eBPF Foundation, which is a subsidiary of the Linux Foundation; they haven't standardized anything. And I think, amongst the three, it's probably not controversial for me to say that the IETF is definitely the most rigorous and has the most oversight and processes. And so for us, you know, we wanted it to be a good quality standard. And you can certainly do that with OASIS, nothing against OASIS, but we just needed it. You know, none of us had experience with standardizing anything, so we just needed the help.
Starting point is 00:51:39 Like, yeah, you know, we were talking about doing ISO standards and all this stuff, and we just didn't know what was involved at all. And so they were able to come in and kind of give us the standards side of it, and we're giving the technical side. And there are people that are thankfully experts in both that are helping. But more than anything, it was definitely, you know, the prestige of the organization, the oversight they were willing to give us, and kind of the hand-holding they were willing to do. And then, ultimately, just, you know, we thought it was going to be best for the community.
Starting point is 00:52:20 So you would say it's very much been a learning experience for you, then, just trying to understand how these standards... how you would even go about structuring it, really? Yeah, oh yeah, absolutely. And, I mean, a lot of it is just doing what other people do, right? Like, one of the things we were trying to come up with recently was how we should group BPF instructions, into what are called conformance groups. And what that means is, if you want to conform to a part of the standard, you have to implement all of the instructions in this group, or else you're not conformant. And so, do we group atomic instructions? Do we group division, multiplication? What do we do? And we were bikeshedding on this for a really long time, and then
Starting point is 00:53:03 finally I just went to the RISC-V standard. And I was like, we're just going to do what RISC-V does. Okay, that's it. And I think that's actually not an incredibly dumb idea, because, you know, a lot of hardware vendors are using RISC-V, and so it makes sense. RISC-V has conformance groups as well.
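For a rough sense of how that grouping turned out, the conformance groups in the draft that later became the published BPF ISA document (RFC 9669) look approximately like this. The names here are from memory, so treat them as indicative rather than authoritative:

    base32               basic 32-bit ALU, load/store, and jump instructions
    base64               the 64-bit counterparts
    atomic32 / atomic64  atomic memory operations
    divmul32 / divmul64  division and multiplication
    packet               legacy packet-access instructions

A conformant implementation picks whole groups, so a small hardware target could claim base32 without having to implement, say, atomics.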
Starting point is 00:53:19 And so that's sort of what we've been doing. Yeah, there's no point just imagining everything up yourself when nobody here has any experience with doing so. Like, you can see what is already out there, you can take inspiration from that, and just work from there. Exactly. Yeah, exactly. And hope that
Starting point is 00:53:40 they made a good decision, and you're not just, you know, blindly following in their footsteps. Well, hopefully in the case of RISC-V it's at least extremely well thought out. Yeah, absolutely. So another thing that you mentioned in your email was a thing called shared run queue. Oh yeah, so that was what I was talking about... I briefly alluded to that when I was talking about this thing that I sent upstream that does work conservation in the scheduler. And so this is interesting.
Starting point is 00:54:14 So I was saying earlier that you have this vruntime notion per CPU, right? And the good thing about that is you can scale well, because everything is happening at the granularity of one CPU, so you don't really contend very much. But when you're doing load balancing, the problem with that is you have to iterate over all of these CPUs to gather load, and load balancing is really expensive. And so there are heuristics in the kernel where we don't even load balance at all if we think it's going to take too long. And load balancing can decide not to do anything if it doesn't think it's worth it, and all
Starting point is 00:54:49 these things. So shared run queue is a feature where per LLC cache, per L3 domain essentially, we have just a FIFO queue where, when a thread wakes up or when it's enqueued, we put it into this shared run queue. And then any time a core is going to go idle, it can just pull a task from that shared run queue, instead of going through the whole load balancing path and doing the slow thing of iterating over every CPU.
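In rough pseudo-kernel C, the core of that idea looks something like the sketch below. This is an illustration of the concept rather than the actual patch; in particular, shared_runq and the shared_runq_node list hook on task_struct are made-up names:

    #include <linux/list.h>
    #include <linux/spinlock.h>
    #include <linux/sched.h>

    /* One shared FIFO per last-level-cache (L3) domain. */
    struct shared_runq {
            struct list_head list;   /* FIFO of runnable tasks */
            raw_spinlock_t   lock;
    };

    /* Called when a CPU in this LLC domain is about to go idle. */
    static struct task_struct *shared_runq_pick(struct shared_runq *sq)
    {
            struct task_struct *p = NULL;

            raw_spin_lock(&sq->lock);
            if (!list_empty(&sq->list)) {
                    /* shared_runq_node is a hypothetical list hook
                     * on task_struct, used only for this sketch. */
                    p = list_first_entry(&sq->list, struct task_struct,
                                         shared_runq_node);
                    list_del_init(&p->shared_runq_node);
            }
            raw_spin_unlock(&sq->lock);

            return p;   /* NULL: fall back to the normal balancing path */
    }

The win is that an idle core takes one lock on its LLC's queue instead of walking every CPU; the cost is extra migration, which is exactly the cache-locality trade-off he describes next.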
Starting point is 00:55:22 And so that works well for workloads where you need really high work conservation, like HHVM, the JIT web engine that Meta uses. It might not work quite as well for something where you're doing really short bursts of work, where you need to keep your L1 cache locality high and migrating is just not worth the overhead and whatnot. It's kind of stalled, though. There's four versions that have been sent upstream. And I'm happy to send you...
Starting point is 00:55:45 I don't know if I sent you v4. I hope I did. I don't know. But it's stalled a little bit because of the timing; like, EEVDF has sort of changed the performance profile a little bit, and I just don't really have time to go work on it. So some people at Google said that they were interested, so I put the latest version, I think it was v4, up, and I'll get back to it someday
Starting point is 00:56:07 if I have time, or somebody else can pick it up and take it to the finish line. We have two links in here: "Patch v4, sched: implement shared run queues". That would be the one I'd be looking for. That, yeah, that's gotta be it. It's gotta be it. Yeah.
Starting point is 00:56:23 We're not going to have a look at all of it here; I'll save reading a text dump of a mailing list for a video. Yeah, exactly. Um, yeah, so it's kind of in this weird, not-upstream-yet state, but is it in a good state at least? It's in a good... sorry, go ahead. No, no, go on. Yeah, I was gonna say, it is in a good state. It should work. It will do better than EEVDF on a good number of things, I think. But, you know, it could be merged; unless there's been, like, merge conflicts, it could be merged and it should be fine. But, you know, the scheduler is so core in the system that a lot of the time you have to really have kind of a bulletproof case for something to
Starting point is 00:57:13 get added. And it's just not going to work well for every workload. And so I think if it were to ever get merged, it might be that there are some things we could do to improve it. We could maybe have some better heuristics for when we actually don't do this migration and whatnot. But ultimately, people are just going to have to accept that this isn't going to work well for every scenario, and you enable it when you want to use it.
Starting point is 00:57:41 It's not actually enabled by default. In the core scheduler there's something called sched features, where you can dynamically enable and disable things at runtime, and it's disabled by default. But even then, it's just, you know, it hasn't been merged yet.
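For context on that mechanism: scheduler features are declared with a macro in kernel/sched/features.h, each with a compile-time default, and on kernels built with CONFIG_SCHED_DEBUG they can be flipped at runtime through debugfs. A sketch, where the SHARED_RUNQ name is hypothetical since the patch isn't merged:

    /* kernel/sched/features.h style declaration: name plus default state.
     * SHARED_RUNQ is hypothetical; real entries look just like this. */
    SCHED_FEAT(SHARED_RUNQ, false)

At runtime you would then write "SHARED_RUNQ" (or "NO_SHARED_RUNQ" to turn it back off) into /sys/kernel/debug/sched/features.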
Starting point is 00:58:30 It's interesting to hear you talk about kernel problems like that, because even at the kernel level there are the same sorts of problems. Like, okay, this is just not gonna work for everybody, but something needs to get upstream. When I talk about stuff in the Wayland project, for example, there are very similar sorts of debates. They're like, okay, well, here's this very specific case where this doesn't actually address the problem; well, here's this one group that doesn't want to do it. And it sounds like in the kernel the same sort of challenges are still there. Oh, 100%. Yeah, I mean, you know, at the end of the day it's just software, right? The kernel is 100% just software. The maintainership model makes things very interesting. No, for sure. But yeah, you know, like, there was another bug, I don't remember actually what the bug even fixed, but we had submitted a patch to the scheduler, and it took like two years for it to get upstreamed, and it only got upstreamed when we were... I know, and you say that, and once again I'm just thinking of problems that happen in Wayland, where it's taken like five years for protocols to be upstreamed. Oh, fantastic. Yeah. And yeah, so you get it. In this case, we had to literally write a benchmarking tool that showed the problem. I'm sure in Wayland, it's similar.
Starting point is 00:59:10 I'm sure in every big... Well, the problem is just you have a project that has a lot of people. Like, you know, when it's a small application and you just have a BDFL, and what the BDFL says is what happens, things go smoothly. But when you have, you know, 10,000 different cooks in the kitchen, there's going to be challenges there. There's going to be a lot of bikeshedding.
Starting point is 00:59:37 Yeah, there will be. There will be. So out of curiosity, I'm not familiar with, like, the maintainership model: does Wayland have a BDFL, or is it a communal thing? So the different desktops have different voting members, and they all have the ability to NACK different protocols. And if there's a NACK from one of the desktops, typically it's from GNOME, then,
Starting point is 01:00:10 if I'm recalling correctly, it cannot be in the XDG namespace, which is the general namespace, but it can still be in the ext namespace, which is the extension namespace. And then there's the whole matter of: before something can be upstream, there need to be implementations in at least three different desktops and three different projects,
Starting point is 01:00:35 along with there being ACKs from three different projects as well. Like, it's very much, you know, there's a lot of people that can vote on stuff. And when you have desktops that have fundamentally different approaches for how they want to do things, it can get complicated. Like right now, one of the big ones that people are arguing about, and this is going to sound really stupid to you from your perspective, is this argument over whether application windows should be able to set an icon for themselves, along with the children of that window having separate icons. So if you want to have, like, a settings window, and that's going to have a little settings cog,
Starting point is 01:01:17 you can set a little icon there that tells you it's the settings window. And there is at least a 500-message email thread arguing over whether this should be done, the format of the images, how we even do this, what the protocol should look like. It's an absolute mess, and that's just one of them. I could talk about that project's messes all day. I actually think Wayland sounds worse than the Linux kernel. Like, I mean, I don't know the process at all, but we at least have a BDFL, right? We have Linus, and ultimately his word is the word that matters the most. Well, good luck. Yeah. It gives me great content to talk about on the channel.
Starting point is 01:02:08 Yeah. Happy to help. Do you think that they should be able to use their own icons, and their children should be able to define their own icons? The thing is, it's trying to address a problem where the issue was already solved on X.org. So a lot of the issues we have are: something can be done on X.org, but now that we're moving to Wayland, it's like, okay,
Starting point is 01:02:32 now we have an opportunity to change it and maybe do something better. The issue is, sometimes it's okay that wheels are round. You don't need to change the shape of a wheel. I think they should be able to. There's a big discussion about how that should be done, like what the protocol should look like, and that's totally understandable. I don't see any reasonable discussion happening around whether it should be possible; that's just insane to me. Yeah, that's a really interesting way to frame it, though. Basically,
Starting point is 01:03:05 the idea of building a new system is that you want to kind of get rid of some of the cruft from the old one, but then you have to get people to use it. It has a print server in it; you can print your screen to a printer, because it treats the output device very generically, so it doesn't care what it is. Literally anything could be an output device. Oh god, I'm glad I never got into that. I feel like I should, but I guess, yeah, I'll let you guys handle that. Look, I'm sure there's at least... actually, I have said before that I feel like Wayland would go a lot better if there was a BDFL. You would have issues... the issue you have there is whether they align with what a lot of the users want, because it seems like Linus generally steps in when people are just doing something stupid.
Starting point is 01:03:56 At least for the most part. And then you get, like, a big tirade, a big frantic tirade. We all love those, yeah. But, yeah, go on, what were you saying? Uh, well, yeah, I think so. He does step in when people are doing stupid stuff, but he also... he does care about... I don't want to speak for Linus, I mean, maybe, I don't want to misrepresent him, but my impression is that he does care about how Linux is used.
Starting point is 01:04:24 And so if you have a tool that's widely used, like the whole GKI stuff with Android and everything like that, there are times where he'll come in and he'll say: guys, figure out how to get this merged. Enough is enough. There's no point to keeping stuff
Starting point is 01:04:40 out of tree that's used everywhere. Upstream should roughly reflect how this is used in practice. And so I think his voice of reason... he's great because, obviously, he's Linus Torvalds, he's very technically sound. He can read code very, very well and understand every part of the OS. And I think he's a good manager of the project. I think, yeah, he does step in and make calls that are outside the scope of just technical stuff as well.
Starting point is 01:05:10 But, you know, even so, he's only one person, right? So there's hundreds of thousands of lines of code that go into the kernel, and a lot of stuff goes in that he never even looks at. And for us, and I'd be curious how Wayland resolves these, but honestly a lot of the time in the kernel, the way that you fix a problem is you go out and get beer with somebody at a conference, and you have beers until 2 in the morning, and you're like,
Starting point is 01:05:30 oh yeah, okay, we agree, we agree. And then the next day, you know, you send it and they merge it. It's funny you say that, because FOSDEM just happened, and all of a sudden half the issues that people had with the icon thread suddenly, like,
Starting point is 01:05:47 oh, yeah, so I actually talked to all these people, and it's just like, yeah, it's not actually that big of a deal. But there's a lot of people that get very, very heated. And, you know, oftentimes there's not someone putting a stop to people just throwing insults at each other. And once that's happening, things have completely devolved; you're not gonna get any progress made, because the second you insult someone, that's when they're gonna be like: nope, I'm right, don't care what you have to say, you're wrong, who cares. Yeah, yeah, when
Starting point is 01:06:17 it gets personal, it's problematic. And that happens in the kernel a lot as well, obviously, and it sucks. I mean, at the end of the day, all of these things are fairly small communities. I know that FOSDEM is a pretty big conference, but yeah. It is good, and Linus does chime in also for personal matters, where he's like: hey, you're hard to work with, get your act together. And yeah, I guess it sounds like Wayland could benefit from that. But I don't know. I mean, if half of the issues on the thread were solved with beer, I know that beer is a core part of the
Starting point is 01:06:53 FOSDEM itinerary, so that's a good sign. Did you get to make it to FOSDEM? It cost me about three thousand dollars to fly over there. That makes sense. Not really. I will get there eventually; Australia is just a difficult place to get anywhere from, really. Yeah, that makes sense. I might see if I can do, you know, journalist funding to go there at some point. We'll see if anyone's interested in doing that. I don't know what their sort of funding is like for that, but I know some of the other conferences definitely do have funding in that regard. We'll see what happens. Yeah, totally. Yeah, that'd be awesome. Um, the next IETF conference is actually in Brisbane in March, so I don't know how close that is to you, but, uh, you want to come get beer? Let's see... it's about 270 to get there. Oh, okay, you know,
Starting point is 01:07:52 maybe I'll go. Yeah, it's, uh, it's not going to be the most... I mean, FOSDEM is, like, way more interesting. You know, the IETF is a lot of people arguing, bikeshedding about minutiae and protocols. But it's still fun. You get to meet people, and, you know, there's the beer reviews, or whatever you want to call them, that happen there too. Well, you were mentioning how Linus is kind of like this voice of reason. Everyone sort of knows, like, over the years he's very much, I guess, smoothened out, we'll say, the way he interacts with people in the kernel. Because, you know, you
Starting point is 01:08:35 go back to, like, 90s Linus, early 2000s, and we've all seen the early emails that have flown around in those threads. They were just absolutely tearing people apart, just cursing them out, like: what are you even doing in this project, get out. Um, I don't remember when there was, like, Rust support being considered for the kernel, but back when that first got introduced, there was a line of code, and it should not have been there: they were using the standard Rust library, and if something
Starting point is 01:09:11 went wrong, it, like, threw a panic in kernel space. Which is, like, no. And Linus responded like: this cannot be here, what are you doing? But it was a lot more tame than it would have been back then, you know, absolutely just cursing them out, all crazy things. Um, I think, you know, I get why
Starting point is 01:09:36 people didn't like the way Linus used to act; it makes a lot of sense. But I feel like there is some sort of benefit there: someone needs to be separate and have that sort of oversight of what people are doing. You can't just let people blindly do what they're doing, especially on a project as big as the kernel. Yeah, absolutely. And, you know, if there were ever a project where you want people not to make mistakes, it's the kernel, right? Because everybody has to deal with your problem if you do. And yeah, like the panic thing, unfortunately, that's indicative of somebody that doesn't really have experience with kernel work, right? You can't do that. You can't just throw your hands up and, like, panic in the kernel. It's a very different environment.
Starting point is 01:10:21 The kernel should never crash, ever, ever, ever, unless there's an actual bug, in which case, okay, fine, you limp to the finish line and crash. I think, yeah, you know, you need to have real standards, right? Like, one of the nice things about the kernel is that there's no manager; like, you don't have to report to anybody. I mean, there are people that work on the kernel that have managers, so I guess they still have to kind of watch what they say. But many, many people don't work at the same company, and I think Linus having that sort of... like, you just can't BS him, and he won't let bad stuff in.
Starting point is 01:10:57 It's a good culture. It's a good culture to hand down to the maintainers and have them apply it in their own ways. Um, I do think that, yeah, you know, he is very direct in a lot of cases. I think also he's, like, way humbler than I think I would be if I were in his position. He's essentially as influential as Bill Gates or, like, Steve Jobs or whatever, right? And he just kind of chills in Portland and just does code reviews for the kernel. It's kind of crazy.
Starting point is 01:11:31 So I'll give him credit for that. I will say, my personal opinion is that, with kernel work, the community could be way more inviting to people. I think there's definitely a mystique of, like, this is elite stuff and yada, yada, yada. I really believe that at the end of the day,
Starting point is 01:11:49 it's just software, and you kind of just have to understand the area you're working in and then just build stuff like you would anywhere else. Maybe not just like you would, but it's pretty similar. And I think that culture of publicly shaming people kind of also has that effect. Right. And, you know,
Starting point is 01:12:06 the effect ends up being that you filter out people: people that shouldn't be there, by all means, but also people that really would provide value. And at the end of the day, you know, the kernel has been successful. And I think, even if we have filtered out people that would have actually been great members of the community, clearly it hasn't hamstrung us, right? Like, it's still a great kernel. The kernel is just this weird project, right?
Starting point is 01:12:30 Like, you know, just the fact there's a mailing list you have to get involved with, and the Bugzilla, which is kind of official, kind of; there's some maintainers that don't even like that the Bugzilla exists. I've seen giant mailing list discussions arguing about the existence of the Bugzilla. Just the fact the mailing list is there, that in itself is this weird thing that a
Starting point is 01:12:52 lot of people just don't really have experience with. You know, Fedora has a mailing list, for example, and Ubuntu has theirs as well. But, you know, if you're involved in just general user space stuff, typically the way you interact with a project is through, like, a GitHub or GitLab bug tracker. So having to go through this weird thing that you've probably never used, and then there's specific etiquette on how you interact on the mailing list, and all of this stuff... like, I get why it's confusing. And the fact that it's the kernel, right? It's this giant project with... I don't know how many lines of code it has now, probably... definitely millions. Yeah, millions. Someone will
Starting point is 01:13:30 find the exact number by the time we upload this. Um, it's a very big project, and it can be hard. Even on something smaller, like a desktop environment, it's hard to find where your piece fits in, what you can add to it. And I can only imagine that problem is even worse on the kernel. Once you're in there and you know that you can change it, I'm sure it's a lot easier to work out what you can do. But just getting that first step in the door, I could imagine being really, really difficult. It is, yeah, it is. I mean, you know, it's interesting, because the whole mailing list thing, I think, probably more than anything else, is actually the biggest filter to people participating. And I don't know if this is true or not, but I have to imagine that's kind of by design.
Starting point is 01:14:15 But yeah, it's a little bit wonky. Like, you email people patches, they reply to your email with code reviews, and then eventually either they get dropped, or the maintainer takes your patch that you emailed them and merges it to their local repo, and then eventually sends it to Linus, and Linus merges it to his repo, which is the actual upstream kernel. It's a really weird model.
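For anyone who hasn't seen it, that email dance is mostly driven by a handful of standard tools. Roughly something like this, with the patch filename being just an example:

    $ git format-patch -1                              # turn your latest commit into a mailable patch
    $ ./scripts/get_maintainer.pl 0001-example.patch   # ask the kernel tree who should receive it
    $ git send-email --to=... --cc=... 0001-example.patch

On the other end, the maintainer applies what lands in their inbox with tools like git am, which is how an email thread turns back into commits.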
Starting point is 01:14:41 Yeah, but, you know, it works well. Um, it is really hard to get started, for sure. You have to, first of all, learn a lot of low-level stuff. Like, how do I even build the kernel? How do I test it? How do I configure it in the way that I want to test it? Oh, I have to add this Kconfig option to compile 9p so that I can mount a host file system and then run tests.
Starting point is 01:15:05 I mean, it's really complicated.
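As a concrete taste of the kind of Kconfig plumbing he means, mounting a host directory into a test VM over 9p usually comes down to options along these lines (a sketch; check your own tree's help text):

    # Enable the 9p protocol, its virtio transport, and the filesystem client
    CONFIG_NET_9P=y
    CONFIG_NET_9P_VIRTIO=y
    CONFIG_9P_FS=y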
Starting point is 01:15:55 But, you know, that part of it, the mailing list part and the getting-your-rig-set-up kind of thing, honestly, it's probably a couple of weeks. I actually wrote a blog post on how to do the mailing list part, if anybody wants to get involved. To me, the best thing that you can do, if you do want to get involved, is to just do code reviews. Do, like, actual, real, substantive code reviews. You know, if there's an area that you're interested in, like MM or BPF or whatever, just follow the mailing list and kind of try to keep track of what people are doing. Read their cover letters, which describe the feature, try to understand how it works. And then when you start doing real code reviews, and you can maybe submit some bugs and fix a few things like that, then you're kind of in the groove, you know. And in that sense, it's not that dissimilar, I think, from any other project, but it is bigger, for sure. And you have to just find a little piece of it and follow that piece and start to build your repertoire from there. I guess I have to say this as well: if you want to send a patch to the kernel, you can definitely send, like, a documentation fixup, or a grammar-or-whatever fixup. Please don't do that more than once, maybe twice. There are people that send nothing but moving commas around and fixing spelling and stuff. And, like, everybody knows who they are, and everybody is like, eye roll, like, come on,
Starting point is 01:16:42 like, please stop. You know, it's obviously awesome to have your name in the kernel, but I think the community kind of expects people that participate to have a substantive ability to participate and, you know, provide code value at the end of the day. Right, that is understandable. The documentation one is weird,
Starting point is 01:17:05 because... I get wanting it to be a substantive change. And, I don't know, it's weird, right? Because you see an issue with the documentation and you do want it to be fixed, even if it's a fairly minor one. But I can get it from that perspective as well, especially if you are doing it a lot.
Starting point is 01:17:26 Yeah, I mean, look, I personally think it's valuable, for sure, especially if you're looking at a subsystem and you're actually documenting it. Right, right, sure. Yeah, so there's a documentation subdirectory, a subtree in the kernel, where everything is documented. And BPF has a whole bunch of stuff that could be better documented. Please feel free. I will ack
Starting point is 01:17:48 it, it'll land, I promise you that. What I was talking about was more, like, people who will fix a typo, or move punctuation around, or fix grammar. I do think it's valuable, but people know that you're just doing it to get your patches in, you know. And it does have a cost, because you have to have people with very limited bandwidth review these things and whatever. But yeah, I mean, it's hard to describe. Some of the people that do this also will engage in code reviews in kind of a confrontational way and be like, you need to document this. And it's like, you know what?
Starting point is 01:18:27 No, you have no right to demand anything of me at all. You know, so, yeah, I don't want to misrepresent anything. It's fine, it's good work, and, again, I think it's valuable. It's just, you're dealing with people that are kind of cynical by default, and they're not going to assume good intent if you keep doing it, you know. They're just going to assume that you're, like, an attention wannabe or whatever. Sure, sure. Yeah, no, I get it.
Starting point is 01:18:53 A fairly similar thing happens around October with Hacktoberfest, with a lot of GitHub repos. You get a ton of repos just getting these very tiny changes, because, you know, some years they'll give out shirts, for example, if you have a commit made. So it ends up with a lot of projects being like: ah, here's a comma change, here's this, here's that. Actually, a really bad one I saw, and this was insane: there was a YouTube channel. I think they had like 6 million subs. They got like 1.5 million views on the video. They were teaching how to use GitHub, right? Totally understandable.
Starting point is 01:19:33 And they showed how to make a pull request, issues, and all that. They showed how to do it on an actual real repo, on Express.js. They used that repo. And if you go to their repo, there are hundreds of pull requests being made with people just adding their name to the readme. What? Yeah. Oh, that sucks. That's super annoying. Yeah, I mean, that's unfortunate, because... I don't know, I haven't seen the video, but I assume that they made, like, a somewhat substantive pull request, right? No, that's the thing, the video didn't either. No? So in the video, they just added the name of the college they were at. Oh, that's poor judgment.
Starting point is 01:20:16 Yeah. They did say after the fact, don't do this, but the problem is, when you have one and a half million views, you're gonna have a subset of those people who just actually go and do it. Oh man, that would be, like, extremely bad; that would take down the kernel email servers if people did that. Yeah, exactly. And it's sad, because you get it, right? I mean, the Hacktoberfest thing, I guess, to your point, you actually get shirts sometimes, and people will submit absolute nonsense, I'm sure. And I'm sure they even say "this is nonsense, I just want a shirt" in some of the PRs. Yeah, yeah. But yeah, so it's understandable that people want to have their names on stuff, but after a certain point, like, I don't know, I don't think it means that much
Starting point is 01:21:06 for somebody to have their name in the kernel. Like, I don't know, I know people that have never submitted a patch to the kernel that have much deeper systems experience than some of the people that only submit documentation patches. You know, I'm not trying to pick on documentation, I'm just saying, at a certain point, I think for you as a person, the royal you, obviously, you're going to want to participate more meaningfully anyways. So, usually I do this at the start, but I guess we just didn't do it: what's your background in programming like, and how did you actually find yourself doing
Starting point is 01:21:39 this kind of work? So I majored in math in college and I kind of realized like towards the end of my undergrad degree that I wasn't going to be a mathematician, which I'm glad that I figured that out young enough where it wasn't a problem. So I took a web dev course and I got a job as a web developer at Columbia University for my first job. And I did that for a few years. And then I got into grad school and did my master's in grad school. I thought I was going to be more like a graphics machine learning person because I had a math background. But it turns out that I really wasn't supposed to be a mathematician. And so I ended up finding that the operating system stuff was
Starting point is 01:22:22 way more interesting to me. You know, I like asking the question: how does this work, at the end of the day? And that's kind of the only question that really matters in operating systems. That's not true, obviously, but you know what I mean. Um, and so, anyway, I was more interested in that. I took way more OS courses at that point, and I really liked my kernel course in particular. We had to build a full 32-bit x86 kernel, which was a good exercise. Everything other than the bootloader,
Starting point is 01:22:51 which I'm glad they didn't force us to do. And so my first job out of grad school was working at VMware. I was on the core kernel team there for a little while, and then I was on the core hypervisor team, both with extremely amazing engineers on those teams; I really learned a lot from them. But I hadn't really done much open source work. I kind of wanted to. I had done some web developer open source stuff, but nothing to write home about. And then, yeah, I went to Meta. I worked on an internal thing for a little while that ended up getting canceled. And then the Linux kernel team, thankfully, was hiring.
Starting point is 01:23:28 So I switched over to there, and that's kind of how I ended up in this situation. But yeah, so, TLDR: web dev, to grad school, to OS, to kernel work in industry. That's kind of how it played out. Oh, so the kernel work started when you were already there? When I was at Meta, when I, like, entered Meta? Yeah. I'd been doing kernel work for quite a while at that point. It was more related to virtualization, because I was working at VMware. So the problems were like: oh, we have VMs migrating between NUMA nodes,
Starting point is 01:24:00 let's migrate memory with them, whatever, stuff like that. But it's all the same at the end of the day, really. There was an article that came out recently on LWN about how NUMA text replication is something that people are trying to do. Just a very brief overview of that: when you're talking about a big computer, if you've ever heard the term multi-socket, that's when you have multiple NUMA nodes. And essentially what that means is there are different places where you have pockets of RAM that are closer to certain sets of cores. And so you want to read from the memory that's closer to your cores.
Starting point is 01:24:36 And one of the things that people want to do right now in the kernel is replicate the kernel code and read-only data to all of these NUMA nodes. And that's something that we did way long ago on the hypervisor side at my previous company. So, you know, people play catch-up. It's a very different problem in Linux, because you have things like live patching, so it's hard to do this sort of thing synchronously. But yeah, you know, the skills translate for sure.
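As an aside, if you want to see that topology on your own machine, the numactl tool (commonly packaged on most distros) can print it:

    $ numactl --hardware   # lists each node's CPUs and memory, plus a node distance matrix

The node distance matrix it reports is the point of the whole discussion: access to a remote node's RAM is shown as more "distant", meaning slower, than access to the local node's.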
Starting point is 01:24:57 So you've had a fairly interesting career, I guess it's fair to say, then? It feels like it. Um, yeah, it's certainly been a wild ride. I mean, I definitely didn't expect to be working on the Linux kernel team at Meta, but, you know, it's been an awesome experience. Yeah. Kernel work, it's wild. I'm glad I'm here, but we'll see. I don't know if I'm going to want to do kernel work for my whole career either, because it's a pretty specialized space.
Starting point is 01:25:36 And I think you are somewhat limited in what you can do at the end of the day as well. I think a lot of people just don't realize how much of the kernel is developed by people at companies like Meta. Obviously people know about the Red Hat stuff that's there, but, like, you know, Amazon does a lot of work in the kernel, NVIDIA does a lot of work in the kernel, AMD... did I just say Amazon? Maybe I did... Intel does a lot of work on the kernel. Like, yes, there are a lot of people who are volunteers who are doing kernel
Starting point is 01:26:05 work, but a lot of the work is also being done there, and the kernel would not be able to be in the state it's in today if it wasn't for the support it gets from these companies as well. Oh, absolutely. Yeah, absolutely. Um, yeah, I mean, for Meta in particular, a lot of the bigger maintainers are on the team. Like the block multi-queue... the block maintainer, Jens Axboe, and io_uring, he's on the team. The cgroups maintainer, and actually now the other sched_ext guy, he's on the team. And, you know, Meta... I can't really speak for other companies, but for Meta, at least, they are given a lot of time to be maintainers. One could even argue that a certain large percentage of their
Starting point is 01:26:50 salaries are basically just donations to the Linux kernel community, and there's, like, no expectation that they do anything. A lot of their work, there's no expectation that it has to even have any benefit to Meta whatsoever. So yeah, you know, I know that there are people that have different opinions, and I understand it, that's fine. But absolutely, there's no way the kernel would be anywhere near where it is today if you didn't have companies throwing tons of engineering resources at it. And, you know, I think it's a good thing, personally. Engineers aren't cheap, so it's good to put them in an environment
Starting point is 01:27:25 where they can contribute back to an open source kernel. Well, we can kind of imagine where it would be, because there is sort of a good example of it, and that's something like the Hurd kernel, which was very exciting back in the late 80s and early 90s. And then even in the early... I don't know how much you've read of the early Linux mailing lists yourself, or if this is just me being crazy, just reading things that don't matter, uh,
Starting point is 01:27:52 but even after Linux came out, and there was clearly a lot of excitement around it, even after Debian came out, people were like: yeah, okay, this is going to be temporary until the Hurd kernel is ready. And then the Hurd kernel, I think they just got 64-bit support in, like, the last year. Oh my gosh. I think it's great, right?
Starting point is 01:28:17 Like, it's a cool project. It's fun. But at the end of the day, yeah, you have to just invest. These kernels, these open source projects, need companies to sponsor people to work on them. There was an LWN article recently that talked about how, after the whole... what was it, the Log4j thing? I forget what the name of the library was, where there was a horrible
Starting point is 01:28:39 zero-day vulnerability: should companies be paying maintainers to work on these core libraries, or should the government? Yeah. So, the government, companies... I personally think companies should. I think that, you know, if you have a company that needs this tool, then you should pay somebody to work on it. I don't know. I mean, I don't want to be reductive and simplify it, but I think, at the end of the day, the really good, big projects usually do have company sponsorship,
Starting point is 01:29:11 for better or for worse. And, I don't know, I mean, a lot of really great ones don't, too. I don't want to be reductive. But it's just, like, if you were building a project, you would want everybody to be using it and everybody to be contributing, right? And that's kind of how it goes for Linux. And the nice thing is, like, you know, Linus, Greg Kroah-Hartman, all these folks work
Starting point is 01:29:33 for the Linux Foundation. They have absolutely zero ties to any of the other companies beyond their relationship with the engineers who maintain the subsystems. And so there is still a very deep element of impartiality. Like, Linus is not going to accept something from anybody at Meta if he thinks it's dumb or if he doesn't think it belongs. And you don't have to look very far back to see him blowing up at people that have a lot of influence on the community that work at Meta. So I think it's a healthy balance. But, you know, I see everybody's side; I see both sides of the argument you mentioned there.
Starting point is 01:30:19 There's always these little things, these great projects that don't have any funding, that are just, you know, run by one guy or whatever. You have web dev experience, so I'm sure you've seen very critical libraries that are used by every company out there, that one dude just maintains by himself, that are like 10 layers down the dependency stack, that everybody needs but nobody even knows exist. Yeah. Yeah, I don't know, man. You're right, that's a problem. And, like, even if you were to have the government pay for people to work on that, which would also be great, and they deserve to be compensated, what do you do if nobody even knows that it's actually sitting that low? Right, right. Yeah, I totally agree with you, it's not as simple as... I don't know. And, you know, Log4j was a recent one, but there's so many more, for sure.
Starting point is 01:31:15 But, you know, the other part of it is, like, a lot of these core libraries, even if the person that's maintaining it has all the time in the world, it's their judgment as to what gets merged, too, right? And, like, if you're a maintainer for OpenSSL and you accept something, you're like, oh, okay. And then all of a sudden you have some stack overflow, or something can read arbitrary memory off the NIC, and then that's not great. So to me, there's a certain element of, like, house-of-cards-ness to the tech industry that's kind of never going to go away. But yeah, it would be nice for the people that do that to at least be compensated, for sure. I'm sure you've seen the xkcd... uh, what number was it? It was like the teeny little block holding up the... yeah, the huge... yeah, yeah, that's a good one. Where is it? I looked at it just the other day and now I'm not gonna be able to find it, am I?
Starting point is 01:32:00 Uh, there was a... oh, I think I got it. Yes, it's 2347. Okay, thank you. Yeah, okay, let's see, 2347... is that it? Is that how the website works? Yeah. xkcd, yeah. Yeah, okay. So: all modern digital infrastructure, all these complicated little applications, and a project some random person in Nebraska has been thanklessly maintaining since 2003. It's holding everything up.
Starting point is 01:32:38 Yeah, and it's absolutely true. I mean, at least with the kernel, the maintainership is documented in this MAINTAINERS file. And, like, there are drivers where everyone's like, who's maintaining that? You know, there's a lot of code, but it's well documented. But yeah, I mean, the kernel is a kernel, right?
Starting point is 01:32:57 Like, it's not user space, where you have everything that's actually used on the system, that's building up your whole ecosystem. And so, yeah, it's unfortunate for sure. I did do a video on... there was this, I don't remember what the exact product was, but there was this weird Intel hardware that came out in 2008. They had a driver for it in the kernel.
Starting point is 01:33:19 Some Intel devs did work on it. No one is sure this hardware actually exists. So they got the patches in there before it released publicly, and they must have canned the project. So someone was like: does anyone actually have this? Can we just get rid of this?
Starting point is 01:33:35 Nobody was maintaining it. Nobody knew what it did or why it was still here. And it was just random. Obviously, yeah, the kernel is well maintained, but there are going to be those parts where,
Starting point is 01:34:05 you know, nobody's touching it. Or, like, a little while back there was this culling of a bunch of random old Wi-Fi hardware, like 802.11a hardware and stuff like that, where it's just like: is anybody actually still using this? Do we need this here?
Starting point is 01:34:28 Can we get rid of it? Is anyone sure it works? Because, like, it's hardware that nobody even knows if a kernel maintainer has. Nobody can test it, nobody's sure if it's still working. And they were saying, well, yeah, if somebody has the hardware, please speak up and we'll keep it around if it's still working. But, you know, if nobody's using it at this point, there's no reason to keep it around. Like, um, a couple years back there was a discussion about dropping... I'm surprised the kernel was still supporting it... support for the Intel 486. And they're like, is anybody actually running a modern
Starting point is 01:35:09 kernel on a 486? You know, "probably" is the sad answer to that question, though. And yeah, it's tough, because the process for getting rid of it is exactly what you just said: you kind of ask timidly, like, is anybody using this, hoping nobody says anything, and then you remove it. And sometimes people are like: you can't remove that. Like, we don't know.
Starting point is 01:35:30 And then sometimes you get some rando that says, hey, this broke my build, and you have to leave it in, right? Like, the kind of golden rule of Linux is you can't break user space. You can't break old devices or anything like that. And yeah, I mean, the Intel story is pretty funny. I wonder if that was the... oh, what was that feature called,
Starting point is 01:35:51 where they had, like, the secure enclaves? SGX, that might've been it. I think that got canceled. But yeah, that's funny. That's such an Intel thing to happen, because their whole business model at this point, practically, is just having accelerators do stuff faster than CPUs. And yeah, that story checks out for sure.
Starting point is 01:36:29 With the Wi-Fi one, I believe one of the drivers that was on the block for culling was the PS3 Wi-Fi driver. And that's one where people did speak up, like: yeah, no, I'm still running a modern kernel on a PS3. Like, sure, why not? Okay, go ahead. I mean, the nice thing about the driver model with Linux, I think, is the policy that if you haven't upstreamed your driver, then you get no backwards compatibility guarantees at all. And so you don't really actually have any ABI requirements for the drivers in the kernel, which is really nice, because in kernels where you do have that,
Starting point is 01:36:53 it can be extremely painful. But then the flip side of that, of course, is that if you have an upstream driver, then you do have these guarantees, and that's why you're like: okay, I want to change something that's really dumb to do something that's way less dumb, but, okay, the PS3 needs it. So it has its drawbacks. I mean, yeah, you'd have to hope that at a certain point... I don't know, I was gonna say we'll come up with a policy where you deprecate devices after some number of decades, but, the 486, like, I don't know. So it's going to be a problem for the foreseeable future, for sure. Well, deprecation is a weird one, right? Because you can deprecate stuff, but then there's the issue of: can you actually remove it? A recent example I saw was
Starting point is 01:37:16 with grep. Most systems ship fgrep and egrep. These have technically been deprecated for the past 20 years, but nobody knows they're deprecated, and specifically they're only deprecated in the GNU project. So people just keep using them. There are distros that are still shipping them today, and there's this argument like, okay, it's deprecated, but,
Starting point is 01:37:39 well, can we remove it now? But, like, there's all these scripts that use it. And it's one of these things where... when I was younger, I thought deprecation was a lot easier of a problem. But then you realize this was the same problem people were worried about with Y2K, when there are people that are running your software on 30-year-old installations and have not changed anything in that long, and it's still in deployment. It's hard to make changes, and that's if they're even using the modern stuff. Yeah, totally. I mean, it's funny, 'cause I feel like I had the same exact
Starting point is 01:38:19 scenario, like, the same exact kind of arc, where I was like: you just deprecate it for a long time, and then if people complain, just tell them to deal with it. But... well, I don't remember at all what the actual issue was, but do you remember, like, in the last year or so, GitHub had that huge outage? Because they deprecated some... I think it was some kind of encryption algorithm they were using. I really don't recall exactly what it was. Sounds familiar, yeah. But they were like: deprecated, deprecated,
Starting point is 01:38:49 like, change, change, change, like, email, email, email.
Starting point is 01:38:52 And then they flip it off, and, like, 90% of the world can't connect to GitHub anymore. And, like, of course they reverted, right? And there's...
Starting point is 01:39:00 I forget what the rule is, but, just like with Moore's law or whatever, there's some law where once something is sufficiently large, you can't undo it. And I think the kernel really suffers from that, because it's the core of the operating system. But yeah, it's a really, really difficult problem.
Starting point is 01:39:20 And the only person who can really force the issue is Linus, at the end of the day. And if you have a user that's going to email the list and say, I'm using this... I mean, yeah, it's a pretty fat chance that Linus is going to override them unless there's a really good reason to. Yeah, like, it's, you know, causing some issue in something newer that more people are using, and things of that nature, I would assume. Yeah.
Starting point is 01:39:44 Yeah, like: this is preventing us from, I don't know, using 64-bit memory, something like that. There isn't anything like that, but, yeah. Or what you said, too. I mean, it's just really, really rare. So yeah, and actually, that's another interesting point, if we bring the discussion back to sched_ext.
Starting point is 01:40:03 So another one of the big challenges that modern maintainers have in the kernel is dealing with something called UAPI constraints. And UAPI is the term for the part of the kernel that's essentially the user-space-visible interface. So if you have header files that you can link against and make system calls with, or some ABI, some structure that has a certain byte layout, that has fields that mean a specific thing, you can never change that. I mean, you can change it if Linus lets you, but it's really, really, really, really hard to change. Like,
Starting point is 01:40:35 you basically have to assume that you can't change it ever. So maintainers are all very understandably guarded about new stuff being added to UAPI; it's a very, very high bar. And even if something is a great feature, they might just feel like: you know what, no, I don't want to maintain this, it's going to take too much time. So one of the nice things about BPF, especially in the modern version of BPF, is that, because these schedulers, or whatever BPF program you load, are kernel programs, there are no UAPI constraints at all. And the program can talk to user space over the maps, the data structures I was talking about. There is a user space component, but the maps themselves are UAPI. So the structure of how you share this array with user space,
Starting point is 01:41:18 all of that's UAPI. But the actual communication itself isn't, right? You use the array however you want for whatever program, and the communication channel is completely outside the scope of UAPI.
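To make the maps idea concrete, this is roughly what declaring one looks like in libbpf-flavored C; the map name and types here are placeholders, not anything from the episode:

    #include <linux/types.h>
    #include <linux/bpf.h>
    #include <bpf/bpf_helpers.h>

    /* The map type and its access rules are stable kernel interface;
     * what the BPF program and its user space half agree to store in
     * it is their own protocol, invisible to UAPI. */
    struct {
            __uint(type, BPF_MAP_TYPE_ARRAY);
            __uint(max_entries, 1);
            __type(key, __u32);
            __type(value, __u64);
    } stats SEC(".maps");

User space then opens the same map through the bpf() syscall (usually via libbpf) and reads or writes entries; the kernel only standardizes the container, not the meaning of the bytes inside it.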
Starting point is 01:41:46 So BPF, one of the things that makes me excited about it is, I think it's kind of the way forward for the kernel to be extended without having to tie the hands of all these maintainers that don't want to have to add UAPI, but still be extended, and still have something that actually performs better anyways, because it's in the kernel. So yeah, that's just another kind of nice thing about sched_ext. There's one last thing I want to touch on, but I will be back in just a moment. Sure. Uh, I didn't actually stop the recording, I'll just cut that bit out. Yeah, that's... I figured, that's fine. I did stop the recording with the last person I
Starting point is 01:43:45 had on, and then I forgot to upload the part that I cut out, so I had to upload that separately. We're not doing that this time, okay, because people are going to yell at me again for not uploading the ending. Um, something we touched on at the start and didn't really expand much upon past that: you also have the Twitch channel, and you're going to start using that to do stuff. Yeah, thanks for bringing that up. It's something I started fairly recently. I thought that people might be interested in watching
Starting point is 01:44:20 watching me work on the kernel in real time and answering questions that folks have. And yeah, it's engineering, so it's not always interesting to just watch. But I think it's an element of the tech industry that, ironically, because it's so core, doesn't really get as much attention as it should.
Starting point is 01:44:41 So it's just, yeah, it's a way for me to chat with people, show people what I'm doing, kind of show them how the kernel works, do some deep dives on various subsystems like the scheduler, BPF, RCU, stuff like that. And, you know, just an open forum for people to ask questions and participate however they want to. I think having something like this is really cool, because it does give people an insight into the way the kernel actually works, the way it's developed.
Starting point is 01:45:14 Like, yes, it's C code, but it's just C code. It's not this crazy thing that you can't modify. You can see the code right there. If you understand C, once you start understanding what specific variables
Starting point is 01:45:30 frodo what where functions are located things like that like you can like piece it out and start making sense of it you you absolutely can and i think you know you i think you were right when you said earlier like this this huge hill to climb when you first start is that you feel like there's this literal mountain of code that you can't even see. It goes into the clouds, right? And that's true. But what you can do if you have somebody that's there to explain it to you, they can explain conceptually what big pieces of this thing are doing. And then everything else is an abstraction, right? Like, oh, I want to add a BPF feature.
Starting point is 01:46:07 Okay, I'll explain BPF, I'll do a deep dive into the background of it, and then I'll tell you what this specific type of object does. And then you'll realize, oh, there's a lot of different types of these things that we could add that would be really useful, and that wouldn't even really require a lot of expertise to add. And I think, you know, the community in the kernel, they've been around for a while,
Starting point is 01:46:32 right like it's it's a lot of like experienced people that maybe they want to retire soon maybe not you know but it's it is definitely an older demographic and i think it's i think it's important for the linux kernels community to start to kind of grow into the younger generation as well. And I think, unfortunately, one of the big drawbacks is that there isn't really this, like, this, this sort of, this sort of vision into it, right? You kind of have to just jump into the super deep end and figure stuff out, but it doesn't have to be that way. I'm kind of trying to make it maybe not that way a little bit. And yeah, you know, I'm not sure, I'm not sure if it'll work out or not, but for now it's been really fun.
Starting point is 01:47:09 The community's growing, and yeah, it's a fun place to hang out. I've tried to do programming streams before, but I get way too distracted by people talking. I can't. You know, I feel like I have to be working on something that's really easy to reason about. I can't be debugging a scheduler thing, although we actually did debug a kernel bug in real time once, which was a fun stream. But yeah, I'm the same way. I mean, that's the fun of it for me:
Starting point is 01:47:42 I'll talk and do stuff, and then people ask questions, and we'll kind of go on a tangent for a while, and then people ask questions about the questions. But usually when I do a stream at this point, I'll have some kind of plan for it. The other day, I showed people how a stack overflow works and gave an example of how to write one. And so it wasn't really kernel work.
Starting point is 01:48:04 It was kind of just systems-oriented, you know, but that was useful. It was organized enough that I felt like, even if people were asking questions, the whole big picture in your head of the system you're trying to reason about doesn't get toppled over, which obviously does happen if you're working on something that's too complicated.
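For readers who haven't seen the kind of demo being alluded to here, this is a minimal sketch of a stack overflow in C. It is not the code from the stream, just the standard illustration: a fixed-size buffer on the stack plus an unbounded copy into it.

```c
// overflow.c -- deliberately broken; build with hardening disabled to
// observe the smash, e.g.: gcc -fno-stack-protector -o overflow overflow.c
#include <stdio.h>
#include <string.h>

static void greet(const char *name)
{
	char buf[8];		/* fixed-size buffer on the stack */

	strcpy(buf, name);	/* no bounds check: a long name writes past
				 * the end of buf, clobbering the saved frame
				 * pointer and eventually the return address */
	printf("hello, %s\n", buf);
}

int main(int argc, char **argv)
{
	greet(argc > 1 ? argv[1] : "this string is much longer than eight bytes");
	return 0;
}
```

Run it with a short argument and it behaves; run it with a long one and, depending on the compiler and hardening flags, you get corrupted output, a stack-smashing abort, or a crash, which is exactly the kind of behavior that lends itself to being walked through live.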
Starting point is 01:48:32 Yeah, well, even for me, it's not just that something's too complicated. When I want to do any sort of programming work, I need to be locked in. I don't want any distractions. I don't even particularly like listening to music when I program. I honestly want to have earplugs in. That's awesome. I should try that, man. Yeah, I get it.
Starting point is 01:48:48 I totally get it. Yeah, my wife and I had to set up a system where I have an on-air neon sign hanging outside my office door, so she can tell whether I'm on air or not. And yeah, it can be really tough. So yeah, exactly to your point,
Starting point is 01:49:05 I usually work in areas that I feel very comfortable in, and it's usually more of an educational thing than breaking ground on some big new thing. Now, there are a couple of other people that do these very in-depth streams,
Starting point is 01:49:21 like the person who works on Asahi Linux, the work to get the M1, M2, the Apple Silicon Macs working. She streams all of that stuff. We're talking about Hector Martin, right? Yeah, Hector, and then we also have Asahi Lina doing that work. Oh, I haven't seen any of those streams. Okay, cool. Yeah, so there are people doing this kind of work, and there are some people that are really good at it.
Starting point is 01:49:49 And, you know, if you can find a way that makes it work, a way where you feel like it's actually useful for people to see it, I honestly think it's worth just experimenting with and seeing what happens. That's the plan. Yeah, that's the plan. We'll see how it goes.
Starting point is 01:50:08 I feel like I have more work than... There's an infinite amount of content, is the nice part. It's millions of lines of code, and it's always changing and growing. So I think it's fun. Yeah, we'll see. Somebody like Hector Martin, I mean, he's so knowledgeable about stuff that
Starting point is 01:50:33 stream necessarily right maybe but um yeah mine i think it's if i had to guess it's probably going to be more educational and um i might i might do like youtube and make make educational videos as well um that's the that's the the rough plan but you know we'll see we'll see how it works out so if you want to check that out where can they go to find you they can go to twitch.tv slash byte lab that's byte b-y-t-e underscore lab um that's probably the best place to start for now and you can do twitter twitter.com slash bite lab as well awesome yep um i think we've touched on pretty much uh okay gypsy died uh lovely this happens sometimes.
Starting point is 01:51:33 Hey, I think I lost you for a second. Yeah, no, this happened the last time I used Jitsi as well. I think it's because the call was going... Yeah, it just hit the two-hour mark. That's why. Oh, interesting. That's what happens at two hours: apparently it kicks me from the call, and now there's two of me, which is fun. I was going to say,
Starting point is 01:51:53 I guess we should probably end it off now, unless there's anything else you want to say. We've sort of touched on everything I wanted to talk about. I think that was it. Yeah, thank you so much for inviting me on the podcast. I had a really good time, lots of really interesting, deep questions, so thank you very much. And yeah, I think everybody will enjoy watching it. I think some of it went a bit over my head. I tried to keep up as much as I could, but you know, this is a complex area to deal with, for sure. It is, yeah. And if it went over your head, it's just because there's no learning this stuff without staring at it for a long time.
Starting point is 01:52:37 But yeah, if anybody has any questions, I'm happy to clarify in the comments, or come ask me on the stream and I can clarify as well. Awesome. I guess we already mentioned the Twitch, but is there anything else you want to shout out, anything you want to direct people to? Yeah, you know, for now, just Twitch and Twitter. Like I said, I'm going to start doing YouTube.
Starting point is 01:53:00 That's the plan, at least. But for now, that's kind of where all the content is going. You can follow me on Instagram; I haven't posted anything yet. And there's also a Discord channel as well that doesn't have an easy-to-pronounce link, so I don't really know how to tell people to join.
Starting point is 01:53:17 Is that linked on your Twitch or your Twitter, something like that? It is, yeah. That's a good callout. It is on the Twitch, so just go there and people can find the link to join. Awesome. Oh, the other me didn't leave yet. Okay... oh, there it is. Okay, it left on your side. Okay, now we're good. Cool. Jitsi will be Jitsi. Is that all you wanted to mention? If it is, I'll do my outro. That is all I wanted to mention, yeah. Thanks again for your time. Awesome. Okay, so if you
Starting point is 01:53:53 want to see more of my stuff, the main channel is Brodie Robertson. I do Linux videos there six days a week. I have no idea what will be out by the time this comes out, because this is getting recorded a bit ahead of schedule; I've got like three episodes backlogged, so we're out in March sometime. If you want to see my gaming stuff, I do gaming streams over on Twitch at BrodieOnGames. I'm probably close to finishing both games by now, so just check what's over there, you'll see what's over there. I have a react channel: if I watch things on the stream, I upload them there. Do not expect good content.
Starting point is 01:54:29 Do not expect well-researched content. Do not expect anything that's worth watching. But if you would like to see me ramble about nonsense, which is what I normally do, it's just less scripted nonsense, check that out. Brodie Robertson Reacts, that's the channel. And if you're listening to the audio version of this,
Starting point is 01:54:54 you can find the video version on YouTube at Tech Over Tea. If you're watching the video, you can find the audio on any podcast platform. Search Tech Over Tea; there is an RSS feed, you will find it. Stick it in your favorite app and you'll be good to go. Give the final word, what do you want to say? Keep hacking, keep it low-level, keep it real, and yeah, hope to see everybody in the future at some point. Awesome. See you guys later.
