LINUX Unplugged - 605: Goodbye World
Episode Date: March 9, 2025
We are digging into a superpower inside your Linux kernel. How eBPF works, and how anyone can take advantage of it.
Sponsored By:
- Tailscale: Tailscale is programmable networking software that is private and secure by default - get it free on up to 100 devices!
- 1Password Extended Access Management: 1Password Extended Access Management is a device trust solution for companies with Okta, and they ensure that if a device isn't trusted and secure, it can't log into your cloud apps.
- River: River is the most trusted place in the U.S. for individuals and businesses to buy, sell, send, and receive Bitcoin.
Support LINUX Unplugged
Links:
- 💥 Gets Sats Quick and Easy with Strike
- 📻 LINUX Unplugged on Fountain.FM
- eBPF Perf Tools 2019 | SCaLE 17x
- SCALE2019_eBPF_Perf_Tools.pdf
- Oracle Releases DTrace 2.0.0-1.14 For Linux Systems
- Gentoo Linux Touts DTrace 2.0 Support
- bpftrace (DTrace 2.0) for Linux 2018
- Full-system dynamic tracing on Linux using eBPF and bpftrace
- A thorough introduction to eBPF
- BCC - Tools for BPF-based Linux IO analysis, networking, monitoring, and more — The BPF Compiler Collection: BCC or BPF Compiler Collection is a set of tools leveraging eBPF for kernel tracing, Linux IO analysis, networking, monitoring, and more.
- BCC Tutorial
- xdp-tools: Utilities and example programs for use with XDP
- ply: a dynamic tracer for Linux
- Liam's Multi-Monitor Setup
- Pick: netop — Network Top — Help you monitor network traffic with bpf
- Pick: bpftune — bpftune aims to provide lightweight, always-on auto-tuning of system behaviour.
Transcript
Enhanced BPF is another hotness of Linux for the last couple of years.
And when the patches were first added to Linux,
the lead developer, Alexei Starovoitov, said,
this allows you to do crazy things.
This is normally the words you don't tell Linus
when you want him to accept patches into the kernel,
but fortunately the patches were accepted.
Enhanced BPF puts a virtual machine in the kernel
that we can program from user space.
Hello friends and welcome back to your weekly Linux talk show. My name is Chris. My name is
Wes. And my name is Brent. Well, hello, gentlemen.
Today, we're digging into a superpower
that is inside all of our Linux kernels.
We're gonna talk about how eBPF works
and how anyone can take advantage of it.
Then we're gonna round out the show
with some great feedback, some picks, and a lot more.
It is a special out of time episode.
As you listen to this right now,
we are at Planet Nix and scale.
So this is an episode in between.
You're going to get all our Planet Nix
and scale coverage coming soon,
but we wanted to take a moment in between episodes
and do something kind of fun
and really dig in and get technical.
But first I want to say a big good morning
to our friends at Tailscale,
tailscale.com slash unplugged.
They are the easiest way to connect your devices and services to each other, wherever they are. It is modern networking. It's a flat mesh network that is protected by WireGuard.
That's right.
And it is so fast so quick to get going and it gives you superpowers
Not only do you get a flat mesh network across complex networks, so maybe you
have multiple data centers or VPSs, you've got mobile devices or you've got
double carrier grade NAT, it'll smooth all of that out.
That's fantastic.
But then there's also a whole suite of tools that make it really convenient to
use sort of like airdrop for your entire tail net, including like your Android
device and your Linux devices.
You can send files around.
They will manage your SSH keys through your tail net for you.
So you can just log into all your individual devices. You don't have to manually copy keys
around like an animal. And they also offer more advanced features so you can set up ACLs
and really manage the system and lock down only certain things to certain people. And
when you try it now, when you go to tailscale.com slash unplugged, you get it for free up to
100 devices and three users, no credit card required.
I mean, you can really cook with 100 devices
and then maybe you'll discover it's great to bring to work, too.
Thousands of companies like Instacart, Hugging Face,
Duolingo, Jupiter Broadcasting, and many others
use Tailscale, and we love it.
Try it for yourself.
Go get yourself a little Tailscale right now.
You're gonna love the way it tastes, and you're going to love how easy it is to get going.
If you've got five minutes, you'll probably get it running on three devices.
I have no inbound ports on any of my firewalls.
Tailscale dot com slash unplugged.
Well, like I mentioned, we are at Planet Nix and SCALE right now.
But we did think it was sort of a perfect out of time episode, because we really
first got excited about eBPF at SCALE back in 2019.
Yeah.
A million years ago.
And it was a great presentation that really just brought home to us how
powerful this was going to be.
Yeah, you heard from the man himself, Brendan Gregg,
observability, performance, tracing, guru.
eBPF was still kind of new then.
He's kind of well known.
You've maybe seen his famous video online using D-Trace
to show how hard drives don't like it when you yell at them.
Yep. So, you know, he has deep insight into this area, and even in 2019 and earlier
was already getting excited about eBPF.
So it's kind of neat to look back now
as like a whole giant marketplace
of eBPF based observability tools now exist.
And the name is sort of a misnomer, right?
Because it sounds like a packet filter.
And so you think, well, how, okay, what,
is this a firewall guys?
No, no, no, that's why we wanted to play that intro clip.
It is so much more.
It is really a VM inside the kernel
that can run simple code that you can create
and craft that is protected.
And so since this is a prerecord,
we're not gonna have your boost this week,
but we do wanna know if you like this type of deep dive
into this particular topic.
So boost those in and we'll bank them from when we come back.
But let's talk about the Extended Berkeley Packet Filter.
Yeah, so it does do some networking stuff still to this day, but it did, as you say,
start out as a packet filter introduced in 1992 to efficiently filter network packets
in BSD operating systems.
pfSense and other firewall products like OPNsense,
they've been using BPF as part of their core product.
That's one of the reasons I was an early pfSense user.
And before too long, within the same decade,
it made its way over to Linux in the form of tcpdump.
And already you're seeing this thing, right,
where you can kind of use user space
to help better observe what's going on on your system.
And so if you've ever written sort of an expression
to filter things or look at packets using tcpdump,
well, you're using a language that then gets compiled down
to BPF bytecode and executed.
That bytecode right there is kind of the magic, right?
Because it turns out that this thing is essentially
capable of running this bytecode.
So it's not just a packet filter.
Yeah, and that's like how the implementation works.
As it started out, it was a very simple virtual machine,
and don't think virtual machine like QEMU necessarily
or like a full, simulating a full computer.
The point is it's like a very limited, restricted bytecode
that can only do certain things relevant to, at first,
filtering packets, and that lets you make sure that like,
it's not gonna do anything crazy,
it can't go into infinite loops,
all kinds of other nice things, and optimize it. And then be able to load it in and have, you
know, you run the program, it supplies the packet and anything else that you need as
input to it. And then the machine executes it. Ultimately, that's how you tell like,
do I accept the packet or do I drop the packet? But at first it was, you know, a very limited, I think it had like two registers to use,
super limited thing,
but eBPF, extended BPF, was introduced in Linux 3.18.
This was like 2014.
So BPF had been around for a while,
there'd been various developments,
but eBPF really kicked things off
in the Linux side of things.
BPF hadn't caught up with the times in some ways,
it was still 32-bit.
It had some of those register limitations.
So they upgraded to 64-bit registers,
added more instructions.
They added the verifier to the kernel,
which is a big part of it that lets you analyze BPF programs
to make sure they're safe, no infinite loops.
They don't do invalid memory access.
There's a checker process.
Right, because it's like you're loading something
from user space into the kernel.
That's a big security concern.
So you want to make sure that you have,
and just beyond security, operationally too, right?
You don't want it to be able to crash the kernel.
So we actually started to see this stuff land,
you know, progressively in Linux 3.18 and beyond.
So this is actually, like you said, 2014.
This has been landing for a while.
Yeah, definitely.
And then, so that was kind of like the raw stuff, and then 2018 and beyond, people started adding more tools,
like BCC, the BPF Compiler Collection,
bpftrace, which we'll talk about too. The company Cilium and their products have a bunch of eBPF stuff for
Kubernetes offerings.
Plus we got better compiler support.
There's something called CO-RE, or compile once,
run everywhere.
So compilers have better support to be able to make,
you know, you can compile your BPF program
and not have to worry about as much necessarily,
depending on what you're doing with it,
about how compatible it'll be with the kernel,
where you compiled it versus where you're running it.
So the idea of being like,
it should be compatible
across multiple versions of Linux,
as long as they have this correct implementation.
Right.
Okay.
This has been so successful that
newer versions of D-Trace on Linux,
they're basically just some extra user space stuff
that uses eBPF and other kernel primitives under the hood.
Really?
Yeah.
Huh.
And eBPF was also ported to Windows.
Yeah, I heard about this.
They're getting a lot of good stuff over there.
One important thing with the extended part is they were also able to make the instruction
set be more sympathetic to modern hardware, and they implemented just-in-time compilation
as well.
So that meant eBPF can be really fast.
They've also made it so that there's relatively stable APIs
we've kind of been talking about,
so that as kernel internals change,
you can use eBPF to hook into internals.
If you do that, there's no guarantees, right?
That's kind of on the tin.
But there are some stable interfaces
that you can use, which is nice.
Hmm.
So the other important point to know is there's kind of various things it can do.
There's XDP, which is Express Data Path, which intercepts packets basically at like the earliest
point that you can.
You have limitations on what you can do, but it can be super low level and performant.
And so you see this sometimes, maybe in responding to DDoS attacks where you're getting
flooded with traffic. Maybe you can identify
various ways that you can program in here.
So you can essentially, ideally,
catch it before it begins to overwhelm the system.
Because you're catching it much further
in the driver stack.
Yeah, that's the idea.
And it doesn't do as much as the very general
and powerful, but full-featured Linux kernel
networking stack. You can kind of just be like,
well no, if it looks like this at all, cut it off. Yeah, if I'm getting DDoSed,
I don't want the network stack trying to figure out what to do with all this traffic, because that's what's gonna take me down.
Yeah, one way to say it is limited context, but maximum performance. I
like that.
Okay, so then you can also do various types of kprobes. I'm sorry? Yeah, kprobes.
And this is dynamic instrumentation, that's where the probing comes in: you can hook almost any kernel function. Mmm.
But there's no clear definition of what you're gonna get.
You have to go look at the function you're hooking into, it's all gonna be dependent on that.
But any kernel internals, any kernel function, almost. I'm sure there are some limitations, but that's a lot of them.
Yeah, I mean that could be anything from, you know, keyboard input to network traffic to disk I/O.
I mean, there's all kinds of things there. So that's where a lot of the power comes
from. Yeah. But that may or may not be stable. There's no guarantee about it being stable across
kernel versions, it can change at any time. People can update the signatures. That's one of the things
that's been happening in the Rust discussion in the kernel. Many developers expect to be able to make a change like that,
and because they have one big code base,
they can update it everywhere
and be able to do a refactoring like that.
And, but the other one, trace points.
This is an important one.
Trace points, predefined stable instrumentation.
Low overhead, but only available
where explicitly added by kernel developers.
So, OK, so a trace point would be like a way you put, the spot you hook in and
start getting metrics or information out of.
Exactly. But the developer of the subsystem has to explicitly support that.
Yeah, you have to add that in, versus totally dynamic with a kprobe.
OK. But the upside is you get structured data specific to each trace point.
Right. So you basically just get to work.
Tells you what it is. Yeah, yeah, okay.
And they're maintained across kernel versions,
so you can kind of rely on them for more long-term use.
I have a feeling this might be relevant later.
Okay, good to know.
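To make that kprobe-versus-tracepoint distinction concrete, here are two stock bpftrace one-liners of the kind the docs ship with. Treat them as illustrative sketches rather than anything we ran on the show: the first attaches a kprobe directly to the kernel's vfs_read function and counts calls per process, the second uses the stable syscalls:sys_enter_openat tracepoint and prints which process opened which file.

    # kprobe: dynamic, hooks a kernel function directly; no stability guarantee across kernels
    sudo bpftrace -e 'kprobe:vfs_read { @reads[comm] = count(); }'

    # tracepoint: predefined, stable instrumentation with structured arguments
    sudo bpftrace -e 'tracepoint:syscalls:sys_enter_openat { printf("%s %s\n", comm, str(args->filename)); }'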
Yeah, I think that's probably like the quick,
high-level version of what eBPF is
and kind of what it can do.
And it is so, so, so simple,
but yet so powerful,
is what I love about it.
And we have a couple of examples in the show,
and I think some hands-on stuff that people could take away.
And maybe we should start with BCC tools themselves.
Yeah, because we've both had a chance to play with those.
Yeah.
I was, you know, I was joking around with Wes,
and be like, it'd be kind of great
to know where in the system,
when I open up this directory that's a fuse mounted directory,
like where is the actual delay happening?
And what part of the system is sitting around waiting?
And through this process of trying to get to that,
I came across tools that let me look at my disk I.O. analysis
or let me look at the network traffic
and really kind of using these different tools, putting them together.
You can actually start to get a really good picture of where the delay was happening in the system.
And, you know, for me, it was like, this is going to be amazing.
I had discovered, by running it on my desktop when I was experimenting with this stuff, that, oh yeah, I've still got this Errands app that we talked about on the show that I'm not currently using, like, today.
But I guess when I close it, it doesn't actually close. It doesn't leave an icon.
I had no idea it was running, but using these
tools I started seeing this application that was hitting my disk every so often.
I'm like, I don't recognize this, and I discovered I actually had that running the entire time. And it was kind of useful.
Yeah, it kind of skips through a lot of the boundaries and other limitations
that can sometimes pop up when you're trying to look at your system.
So it can be surprisingly insightful.
They do have a nice tutorial which we'll link.
They also, I like that they have a set of things to run first.
Like before you go to these tools,
make sure you check all the regular system monitoring tools first
because we'll see as a theme,
like your H-tops and B-tops and all kinds of things,
kind of get a broad look.
And you could see some specifics.
Whereas you can make broad eBPF programs.
But by default, they're going to be a lot more specific when
you're looking at one thing like disk latency or something
specific to the file system or networking.
Yeah, yeah, that's a good point.
All right, so we could talk about
some of the commands we tried.
I know that filetop and tcptop we played around with.
I did not get a chance to play around with ext4slower
and xfsslower, but this is an interesting way
you can use it too.
Uh-huh, yeah, it'll tell you when there's things happening
in your file system that are causing latency
more than a programmable amount
you can pass on the command line.
Or they've got zfsdist, which traces ZFS reads,
writes, opens, and fsyncs, and then summarizes the latency
of all that in a histogram for you.
That's really cool.
Yeah, right?
And then two really useful basic ones are
execsnoop and opensnoop. Yeah, opensnoop. Isn't that kind of like, there's like a Mac
app that monitors traffic? Is that what it was, opensnoop, or something else? Yeah, no.
So opensnoop, you're thinking of, there's a Linux one, I think, right? OpenSnitch and
Little Snitch. Snitch, snitch, that's it, yes. opensnoop tracks file opens.
Okay.
So you can run this and it hooks into the kernel via BPF
and then anything that's running on your system
that opens files, it'll just print out
in your console right in front of you.
Well, I did play with this one.
I did play with this OpenSnoop one, you're right, yeah.
That's really cool because I remember I launched,
you could launch particular things
and then just watch all the crap
that happens on your system.
Right, and especially, you know, you
can filter it, you can have it look at specific processes, or you could, you know,
grep output or whatever, so you can filter on a busy system. But I think it's
especially useful on a system you think should be, you know, relatively quiet,
just as a way to see what is actually happening in the background,
because, for short-lived processes that might be
hard to see in a summary type program
but that are still doing things,
it'll still print in opensnoop.
Then execsnoop does a similar thing,
but for process execs.
So again, for things where it's like,
it's hard to see in a general tool,
but you really want to see it at a nitty gritty level,
like what is happening and being exec'd on this box,
ExecSnoop is handy for that.
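If you want to try those two yourself, the invocations are about as simple as it gets. One assumption worth flagging: on Ubuntu and Debian the BCC tools ship in the bpfcc-tools package with a -bpfcc suffix on each command; other distros may install them as plain opensnoop and execsnoop, often under /usr/share/bcc/tools.

    sudo opensnoop-bpfcc           # print every file open() on the system as it happens
    sudo opensnoop-bpfcc -p 1234   # only show opens from PID 1234
    sudo execsnoop-bpfcc           # print every exec(), including short-lived processes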
I have a sense execsnoop has come up
actually in real life for you as a handy tool.
Do you wanna share a war story with us?
I do, so a few years ago, I was adminning a VPS box,
I wasn't totally in charge of it,
but had some administration over,
and things were seemingly broken. I noticed some messed
up statistics first, right? The metrics were a little odd and I went on and I
started just kind of doing regular updates and poking around at the system
and I noticed that I wasn't able to actually update the initramfs, right? I
was doing all the, it was an Ubuntu box, I had everything going but there was
some like dynamic library linking problem that was happening
after I started digging into the output and tried rebuilding it way too many times.
And I'm like, why?
Why can't I get my initramfs updated?
Uh-huh.
The kernel's here.
Everything else in the update's fine.
But you can't boot in that new kernel if you can't get that updated.
So this was maybe closer to when I really started playing with these tools
a little bit more seriously and having them installed on more boxes. And so they were
already available. And I used opensnoop, because I wanted to see exactly what was being opened and
touched by the update-initramfs process. Right. And then that led me to look at some weird file paths. And I also started, then I did execsnoop,
and that showed similar file paths running on the system.
Oh, uh-oh.
And then I was able to figure out that there was, in fact,
a crypto miner running on the box.
Oh no.
But it had loaded a custom kernel module
to hide itself from tools like ps and top and htop.
That's pretty slick.
But not from execsnoop.
Clever.
So that's why it was breaking when you were trying to update the initramfs.
Yeah, something it did to the system.
I never quite figured out exactly what it had changed, but it had messed with some of
the files on the system in ways that meant that the linking was no longer cleanly happening.
So that's a good way to stack those two tools.
OpenSnoop and ExecSnoop are really a couple of nice tools
you can stack together.
And there's a whole bunch of the,
this is part of BCC, the BPF compiler collection,
which includes a whole bunch of other stuff,
but these are the tools that they ship by default,
which leverages the framework they've built
to implement these via eBPF.
And so there's all kinds of stuff, file
system-specific stuff.
They've got things for looking at network connections, TCP
connections.
They've got stuff for monitoring databases.
It's broad.
I don't know if it's necessarily best practice, but
you could say that with these tools, you could be fairly
confident that you had cleaned up the crypto miner and the
infection, because you can actually watch,
at a much more intimate level, what's happening on the system. Definitely useful.
Yeah, I'm not saying it's a best practice, you know, probably just wipe the box.
Yeah.
But it is sort of nice, if for some reason you don't have that option,
you can use these tools to kind of confirm that stuff isn't coming back after a reboot, right? And, the more you know,
depending, but you can see exactly where they're hooking
and modify the, because a lot of this is implemented via a combo of C because that's the part that
gets compiled and like, it's like a limited subset of C that does the BPF stuff that gets
converted into a loadable thing for the kernel. But there's a bunch of Python utilities around
it to wrap it. So you could fork that, copy it, modify it if you needed even more, or try to observe
specific things once you had identified a particular problem or threat.
Now that's really going deep.
1password.com slash unplugged.
That's the number 1, password, dot com slash unplugged.
All lowercase. And you're going to want to go there, because this is something that, if I had it when I still worked in IT,
I think it would have sustained me for many, many more years.
The reality is your end users don't.
And I mean, without exception, work on only company owned devices, applications and services.
Maybe you get lucky and they mostly do, but I don't find that to be the reality today.
And so the next question becomes,
how do you actually keep your company's data safe
when it's sitting on all of these unmanaged apps and devices?
Well, that's where 1Password has an answer.
It's Extended Access Management.
1Password Extended Access Management
helps you secure every sign-on for every app on every device,
because it solves the problems that your traditional IAMs and MDMs just can't touch, and
it's the first security solution that brings unmanaged devices and
applications and identities, even vendor ones, under your control.
It ensures that every credential is strong and protected. Every device is known and healthy,
and every app is visible.
This is some powerful stuff,
and it's available for companies that use Okta or Microsoft Entra,
and it's in beta for Google Workspace customers too.
1Password changed the game for password management,
and now they're taking everything they've learned there
and expanding it to the login level and the application level.
You know what a difference it makes when people have proper password management.
Now let's have proper login management and authorization.
1Password also has regular third-party audits and the industry's largest bug bounty.
They exceed the standards set by others.
They really do.
So go secure every app, every device, and every identity. Even the unmanaged ones.
You just go to 1password.com slash unplugged.
All lowercase.
That's 1password.com slash unplugged.
Now Wes, it sounds like BCC is an abstraction over eBPF.
What's going on under the hood here?
Yeah, so there are tools that you can just run like we've been talking about, but how
do those tools come to be?
Well, that's where some of those abstractions and a framework comes in that allows basically
embedding C code directly within Python scripts.
And then BCC's tools also sort of handle making sure you've got the whole tool chain available.
So it uses LLVM and Clang to compile things, it handles verification of stuff, it handles loading
it into the kernel for you and attaching it to the right hooks, basically through like
Python method calls, instead of you having to run the right commands in a shell.
So you kind of get like a unified approach where you can write kernel level programs
without necessarily having to deal with like all the other stuff.
You know, I was thinking, gosh, this really seems
like going way beyond what my skill set
would be able to manage, but then I thought,
I mean, it's Python code.
I bet an LLM would get me 90% of the way there these days.
And then I could probably just finish it off.
And there's a fair amount of existing tools
that you can either feed into an LLM
and ask about or modify or try and hack on yourself.
Yeah.
And so here's a simple example that we can play with.
It uses XDP.
Okay.
And so there's a simple C program
that has a function called XDP drop all.
Okay.
And that's gonna attach to the XDP hook.
And so we get like a little data summary of the packet info
which we're not gonna care about.
All we're gonna do is return XDP drop which is a magic value that tells the kernel, hey, just drop this.
Just drop all.
So it's going to drop everything. And then in the Python, that's it C-wise. That's all
we do. And then in the Python world, there's a little setup to make, hey, I want a new
BPF thing or whatever. And then we tell it to load in our little blob of C. And then
we attach it to the interface we want. That's two lines of Python.
And then it prints a nice little message and starts working.
And what it should do is drop all network traffic?
Yeah, so we specified a specific interface.
So the...
Yes, you say on this interface, just drop everything.
And that's because we attached it to that specific interface.
And this is like a quick kill switch.
So what we're going to do as a test here is I've set up a ping
with an audible sound so we can see when we're getting a
result. So we can see we're pinging the box right now.
And then at some point, Wes is going to kill it.
He's actually going to run that kill script,
and it's going to drop all traffic, and we'll hear it drop
off whenever you're ready.
Mr. Payne.
Three two one.
There it goes.
Yeah, it takes almost no time at all.
It happens almost immediately.
Now, and of course, in my regular SSH session,
I can't stop it, but I do have a sneaky console here,
so we should be able to take it.
Wes always has a sneaky console.
All right, I'm leaving the ping going,
so if it resumes, we should hear it.
All right, and hit Control-C now.
Okay. Oh, there it goes. Look at that.
Yeah. So like that wasn't that much work. This is just an Ubuntu 24.04 box. So like, you know, all the stuff you needed was in the repos already.
I'm thinking, you know, it's not a totally practical example, but it's a quick example of the power you have there,
and you're executing that from user space.
Yeah, you do need, you know, root permissions
to be able to load it into the kernel.
But then you're executing things inside the kernel
that are just immediately cutting off traffic
to that interface.
Yes, and I didn't have to build a per-kernel module
with the right headers, and have to worry
about messing up my implementation in a way that's gonna crash the box. You don't even have to
create like an iptables rule that rejects traffic, because that would
actually be much further down in the processing stack if you're using
something like iptables. Yeah. Yeah, that's cool. That's fun. So in reality
you'd want right you would do some sort of filtering where you would look at the
data structure you get from XDP to figure out like, oh, is this one I want to block
or let it keep going through processing, but yeah.
This is your basic, it's not hello world, it's goodbye world.
Yeah, goodbye world, exactly.
But the main point to show that was that this BCC framework is one way
if you want to start developing custom tools.
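For anyone who wants to see roughly what that goodbye-world program looks like on disk, here's a minimal sketch using BCC's Python bindings. It's a reconstruction from our description rather than the exact script we ran, and it assumes the python3-bpfcc bindings are installed and the interface is named eth0; adjust that, run as root, and be aware it really will drop everything on that interface.

    #!/usr/bin/env python3
    # Minimal "goodbye world": attach an XDP program that drops every packet
    # on one interface, then detach it again on Ctrl-C.
    import time
    from bcc import BPF

    program = r"""
    #include <uapi/linux/bpf.h>

    int xdp_drop_all(struct xdp_md *ctx) {
        // Ignore the packet contents entirely and tell the kernel to drop it.
        return XDP_DROP;
    }
    """

    device = "eth0"  # assumption: change to the interface you actually want to cut off

    b = BPF(text=program)
    fn = b.load_func("xdp_drop_all", BPF.XDP)
    b.attach_xdp(device, fn, 0)

    print(f"Dropping all traffic on {device}; press Ctrl-C to restore it.")
    try:
        while True:
            time.sleep(1)
    except KeyboardInterrupt:
        b.remove_xdp(device, 0)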
But you can get even more ad hoc than that.
All right.
Because there's bpftrace.
Yeah, I wanna talk about this.
And this is basically sort of like the closest thing
to D-Trace for Linux.
Okay.
If you don't wanna use the Oracle tool.
Okay.
Which I believe Gentoo now has.
Oh, all right.
So BPF trace is sort of, what I know of D-Trace
is like this ultimate tool when you're debugging
or trying to figure out where your system has gone sideways.
Again, like I was talking about,
like why is this one fuse directory taking so long to open?
And so I imagine this is a similar type of...
Yeah, one way to think about it is you basically get
like a nice targeted little scripting language
to write tracing programs
that use the trace points in the kernel.
Wow, okay, all right.
And there are some other options.
There's a program called ply that I haven't tried but looks also nice, but bpftrace is
quite popular. So we talked about, right, there's different probe types, there's XDP, there's the kprobes, and the BCC tools have tplist, which also just lists all the trace points.
It also shows, you can do other user space stuff too, but we're not going to talk about that
today.
So there is, so tplist, I guess that makes it a lot easier.
If tplist is going to give you all of the kernel trace points, so you know what you have
to work with, that's really useful.
Right.
And it tells you the shape of them too.
You get the sort of data structure.
Okay.
Let me pull it up here. He's got it there on the
machine. Yeah, so here's sort of like, here's some stuff about block dirty
buffer and it tells you you get a dev device, you get a sector and a size and
so you know those are the things you can work with and then it has a name and you
just tell it that you want to trace that thing. So this came up because you
know we talked a little bit
about BCC having file system specific tools.
Like, oh, I want to watch for slow operations
from Btrfs, say, right?
Well, they haven't yet.
I suppose I should step up or someone should step up
and add these, but there isn't yet a BcacheFS version
of these tools, right?
There we go, that'd be cool.
Obviously, probably what you want to do
is just fork the existing Btrfs one and
modify it and make it work right for BcacheFS.
Sure.
But, if you want to be ad hoc about it, bpftrace can handle this too. So, I did tplist
and then grepped that for things that said bcachefs.
So you got all of the kernel interfaces regarding, trace points I should say, trace points
regarding bcachefs.
So here's just a list of some of what the trace points look like. I see, okay.
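For reference, the tplist step being described looks something like this; the -bpfcc suffix is again the Ubuntu/Debian packaging, and block:block_dirty_buffer is just the example tracepoint from a moment ago.

    sudo tplist-bpfcc | grep bcachefs                 # list every tracepoint, keep the bcachefs ones
    sudo tplist-bpfcc -v 'block:block_dirty_buffer'   # -v shows the fields (the "shape") the tracepoint hands you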
So now, okay, so what I'm seeing here on Wes's screen is like an output,
bcachefs colon, and I can see data update, rebalance, extent. There's a lot of different options here that are essentially just, what is the file system up to? And so these would all be things
that Kent or the team have put in explicitly so that there's a way to
easily and with low overhead watch these things. So since these trace points already exist, you can use
BCC to write your own kind of monitoring. Yeah, or bpftrace would be it. Yeah. Thank you.
There's a lot of terms. There's a lot of BPF terms. So here's one that stood out:
copygc_wait. And bcachefs is a copy-on-write file system, and it does this bucket-based allocation.
And so it has what's called a copying garbage collector.
So as you're copying files, it'll handle moving things around so that it can then get good bucket allocation and defragment
things on the fly, that kind of thing.
So copygc_wait is a trace point that tells you when your, when bcachefs is waiting for
copygc to complete.
So it can be a reason that your file system is being slow and not responding, especially
if you're moving or copying files around.
So you can see on your screen just a little tiny script, sudo bpftrace dash
e, and then you pass it a little string, and we tell it we want to use the
trace point bcachefs copygc_wait, and then we have a little script here that
one of the inputs is the device and then it has a little stuff that extracts the
major and minor number from that you don't have to do that and then all it
does is print what device we're waiting on
and how much we're waiting.
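If you'd like to see the shape of that one-liner, here's a rough reconstruction. The field names, wait_amount and until, are assumptions about what the bcachefs:copygc_wait tracepoint exposes; check them first with sudo tplist-bpfcc -v 'bcachefs:copygc_wait' on your own kernel, because they may differ.

    sudo bpftrace -e '
    tracepoint:bcachefs:copygc_wait
    {
        // dev is a kernel dev_t; split it into major:minor the same way the kernel does
        printf("%d:%d copygc waiting: amount=%llu until=%llu\n",
               args->dev >> 20, args->dev & 0xfffff,
               args->wait_amount, args->until);
    }'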
Yeah.
And so it's, you know, I don't know, 10 lines of code.
And you get a nice kind of structured output.
It's easy to, human readable.
It tells you the total flushes, the total buffers flushed.
You get, you know, or the total output,
whatever the stat is you're watching.
And then so on my system, I'm just doing it on,
you know, one root bcachefs disk.
So I knew which disk it would be.
I was just kind of interested to see.
Like I went around and I DD'd some big files
and copied them around and I could see like,
oh yeah, right, the operation completes.
And then immediately the print tells me that,
oh, we waited that much.
And the practical use here is, you know, in a RAID array,
it might be useful to discover that one of your disks
is significantly
underperforming the others.
It could be indicative of a larger problem, it could be indicative of why you're having
performance issues.
So like, it turns out that like this, there's actually probably a fair amount of situations
where single trace points just on their own might be something you want to look at, right?
So other ones for bcachefs: write buffer flush, that's an important event, or journal writes.
You might be wanting to know
stuff about how the journal works.
But you can also use the scripting language
and the fact that eBPF supports basic data structures
like maps and histograms to make
more complicated combined programs.
So I had an LLM, I fed it the output of that tplist stuff
that told it the available trace points.
All the traces you had, yeah.
And what the structures looked like, so it could
write the right code.
That's clever. And
bpftrace makes it easy to have stuff that runs in an interval, right?
So you can set up all the traces, and then every five seconds
it can print out a little summary of what it's done.
And so each time it traces, it can update a variable, and then it can calculate latencies or deltas.
So this is a quick one that it made that looks at that write buffer flush trace point,
and it samples it over every five seconds, and then it tells you how many flushes happened,
how many buffers were flushed, how many were skipped, how many were fast-path
flushes, and then the total amount of data that was flushed. So just like a quick way to get an accurate little,
like, every-five-second printout.
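Here's a sketch of what that five-second summary can look like in bpftrace. The field names nr, skipped, fast, and size are assumptions mapped from the description above (buffers flushed, skipped, fast-path, bytes); verify them with tplist -v against bcachefs:write_buffer_flush on your kernel before trusting the numbers.

    sudo bpftrace -e '
    tracepoint:bcachefs:write_buffer_flush
    {
        @flushes = count();            // how many flush events fired
        @buffers = sum(args->nr);      // buffers flushed
        @skipped = sum(args->skipped); // buffers skipped
        @fast    = sum(args->fast);    // fast-path flushes
        @bytes   = sum(args->size);    // total data flushed
    }

    interval:s:5
    {
        time("---- %H:%M:%S ----\n");
        print(@flushes); print(@buffers); print(@skipped); print(@fast); print(@bytes);
        clear(@flushes); clear(@buffers); clear(@skipped); clear(@fast); clear(@bytes);
    }'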
And what's great is like as far as I know,
this is really putting no measurable strain on the system.
It's one of the more performant ways to do it, yeah.
So you can really get deep insights
without some of the overhead you sometimes get
by that kind of monitoring.
So then to kind of further stress just how far
this whole having an AI help me out would go,
I had an idea. I did ultimately tone it back a bit, but basically like what if I wanted to monitor
some of the mesh network traffic that was going on? There's a lot of options for that,
but can this do this too? And so it built me a little script that you give it an interface name,
like tailscale0, say, right? And then it'll do a similar thing where every five seconds it'll just look at that interface, and it'll
count sent packets, sent bytes, received packets, received bytes, and it'll do
a send latency histogram for you on the sent packets. Yeah, so you get this, you
actually, I mean it's not a GUI, but it's a bar graph on the
command line. Yeah, right. And so the combination of the built-ins of eBPF and then the built-ins into
bpftrace, which has some of this stuff to do nice histogram display and stuff from their print command.
It's, you know, this is a little bit longer, some of it is kind of because there's a print statement for each thing
we're printing, it's like five different things we're measuring, right? So it takes up some space on the screen, but it's a single page of code here. So it's not crazy to start trying to understand
it.
Right. I've never even seen some of this stuff, but I could like the trace points that all
make sense. They're all, it's just real plain English. Really. It's really easy. And then
you have the print and the time interval.
And so under the hood, it is tracing something called net_dev_start_xmit.
So then it filters, it gets data, it filters on interface name from that data, and then
it has a little counter it's keeping, so it increments the counter.
And then it has a thread ID that you can use, and so it gets the current timestamp in nanoseconds
and stores that in a map based on its thread ID.
And then it also does a trace point for sent packets with net_dev_xmit, and then it
grabs the start time if it exists for its thread ID, and then it can compute how long
it took to send from that, and then that's where it can use the histogram stuff to print
out a nice little thing now that it's tracking latency.
And then it has another trace point it uses, netif_receive_skb,
which tracks incoming packets.
And then it just has a little bit of code here
to kind of tie it together every five seconds
and do the printout.
And then it clears all the counters
so that it can do the next cycle.
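Since we can't paste the whole screen into a podcast, here's a condensed sketch of the kind of script being described, assuming the interface is tailscale0 and the standard net:* tracepoints. It's a reconstruction, not the exact script the LLM produced, so treat the field access as something to verify with tplist -v on your own kernel.

    sudo bpftrace -e '
    tracepoint:net:net_dev_start_xmit
    /str(args->name) == "tailscale0"/
    {
        @tx_start[tid] = nsecs;              // remember when this thread started a send
    }

    tracepoint:net:net_dev_xmit
    /str(args->name) == "tailscale0"/
    {
        @tx_pkts = count();
        @tx_bytes = sum(args->len);
        if (@tx_start[tid]) {
            @send_latency_ns = hist(nsecs - @tx_start[tid]);   // send latency histogram
            delete(@tx_start[tid]);
        }
    }

    tracepoint:net:netif_receive_skb
    /str(args->name) == "tailscale0"/
    {
        @rx_pkts = count();
        @rx_bytes = sum(args->len);
    }

    interval:s:5
    {
        time("---- %H:%M:%S ----\n");
        print(@tx_pkts); print(@tx_bytes);
        print(@rx_pkts); print(@rx_bytes);
        print(@send_latency_ns);
        clear(@tx_pkts); clear(@tx_bytes);
        clear(@rx_pkts); clear(@rx_bytes);
        clear(@send_latency_ns);
    }'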
That's a neat little magic, like, super power,
pocket of power that's in the Linux kernel there for this.
Now it does mean, right, like you can't use it if you don't have trace points you care
about, so you have to start understanding some kernel internals and like what trace
points might matter and what they mean.
But if you have a specific problem or a mystery on your system that you're trying to look
into and you're willing to try to chase down hypotheses using these tools, I think you
could get pretty far.
Yeah, that's for sure.
And then, you know, there are actual complete products out there that are dipping into this
stuff.
I was reading online that there's actually several Kubernetes products that are tapping
into eBPF to do things under the hood.
Also Falco, which is a real-time security monitoring application you can run on Linux.
Falco uses eBPF to monitor system calls and network events directly in the kernel,
enabling rapid detection of anomalous behavior with low latency.
This kernel-level approach is faster and less resource intensive than the user space monitoring
tools.
There's also user space tracing you can do.
So if you set it up right, you can trace like JVM events or Ruby things or Python stuff. And so you have products now too that will use things
like OpenTelemetry or other tracing standards. And you can have a trace then that can have
both the kernel side via the eBPF stuff and the user space stuff, without necessarily having
to do as much custom observability implementation in the code base, because they can kind of do it more dynamically.
And you might be able to then see stuff, right, where like,
oh, there was a problem on the kernel layer,
and then that's why we started seeing increased latency
in the application layer.
And, but you don't know, you know, you can use that.
There's a lot of open source products or open core things,
and there's just a lot of sort of primitives
and one layer above primitives that are probably already on your kernel.
I wonder if it's possible people out there listening have already been using eBPF for
a while and this is all old news to them.
Boost it and tell us, like have you used it and what applications and what utilities has
it been for you?
I mean we've gone through a few examples here but like Wes's story with the VPS and my story with tracking down
like a background desktop application,
there's just these also very practical
day-to-day uses for this stuff.
That's nothing more than just using
some of the pre-existing tools
that you just run and tap into this,
and you don't have to write any kind of scripting code.
Yeah, I mean, it's been around for long enough now
that it's well-packaged in most distros.
Can you use eBPF to figure out why I have so many tabs open?
Well maybe we can add a trace point to Firefox to keep track at least.
jupiterbroadcasting.com slash river. River is the most trusted spot in the U.S. for individuals
and businesses to buy, send, receive Bitcoin.
And they make it easy in three simple steps.
jupiterbroadcasting.com slash river.
Hey, I just got set up with them and it was in fact very easy.
I think their best low key feature actually has to do with cash, just as an aside.
So Bitcoin is dipping as we record and if, oh man, this is, this was the moment.
So they have a 3.8% interest
cash savings account, and they pay the interest out in sats. So you put your
cash in there, it's FDIC insured, 3.8% interest, and then when the
price dips on Bitcoin, you can use that balance to smash buy. And so you
can essentially DCA with that 3.8% interest, and then you can smash buy when the price dips.
It's just brilliant.
And River is a really great company.
I had a call with their community management person
and was really impressed with what they said.
And if you're in Canada
and you're looking for another trusted source to get Sats,
bitcoinwell.com slash Jupiter, there you go.
That's her tip.
Well since this week we are hoarding our boost for next week's episode we decided to do a
little dive into the old mailbag. Zach sent us a little note here. I'm behind on listening
to episodes, but the NixOS episode from last month, Linux Unplugged 598, was really interesting.
I wanted to throw out some thoughts responding to a comment that was sent in about wanting a NixOS-like system,
but being able to use the old traditional system admin tools.
For context, I've tried out NixOS, but quickly got to the point where I would need to dive in and figure out those flakes.
At that point, it just became one more thing on that list to do.
Since then, I've gotten into bootable containers.
I've specifically been using the Universal Blue ecosystem with bootc,
but have wanted to try out Vanilla OS and Common Arch that also use OCI images for delivery and customization.
Listening to that NixOS episode,
I felt like many of those talking points
that were given in favor of NixOS
were also very in line with the benefits
that I've seen in using OCI images
to build out customized OS images.
I don't use any of the headliner distros
like Aurora or Bluefin, although I have tried a few,
and overall they were quite well put together.
But I've taken their more low level image builds
and layered my own customizations on top of them.
It's been a fantastic way to manage everything
as when I need a new server or desktop,
all I have to do is install my image
and everything's ready to go.
I will add the caveat that there may be some learning
if you haven't already been
familiar with building OCI containers, but I live in a world where professionally and
for hobbies, that's what I'm doing.
So that wasn't a huge burden for me to learn.
Thanks for the show.
Really have been enjoying it and wanted to send this little message in just in case someone
else might find some use.
Well, cool. Thank you for sharing. We love success stories like this.
I wonder if you have any of it on GitHub or anything too, if people are curious.
Good question. I really feel like this nails something that we've been feeling
and chatting a lot about behind the scenes, which is the really, really brilliant thing
that the Universal Blue folks have tapped into:
this existing knowledge set around how OCI images work and how you can layer
things. They really tapped into the whole DevOps workflow that people live every
day, and now that skill set that you've learned, you can use it
to customize your Linux desktop. I mean that is just really powerful. It's a
different approach than Nix, right?
Whereas Nix, you're building it from the ground up.
You would use Nix to maybe generate those OCI images
or something like that,
but both, like he says,
are essentially accomplishing similar goals.
Yeah, and a lot of the immutability side of things
and the image-based approach, right?
Like starting to think of your system as a cohesive whole
that you deliver in those coherent units
rather than ad hoc imperative updates.
Now, admittedly, I don't use them as much,
but Brent, when I think of like an image-based system,
I don't necessarily think of flexibility.
Yeah, I was kind of wondering about this.
I've had a few people tell me,
oh yeah, images is totally the way to go.
Of course, we've been playing with Nix and NixOS
at the same time, and the thing I keep tripping on
with images is you build them once
and then you use them many times.
But the way, Chris, I think you and I have been using
at least NixOS on the desktop is your Nix config
is like constantly evolving.
Therefore, the next time you go to install it somewhere else,
it's always that new updated copy.
And I would imagine one of the downsides
with building an image is,
well, you have that intermediate step.
You have to go ahead and build your image
every time you want that newest, freshest version.
Am I understanding that correctly?
Is that a proper downside here?
I know some of the silver blue types, right?
They're still using a lot of RPMOS tree,
which has ways to apply things live.
So it might depend a little bit on how exactly
what you're putting in those images,
or maybe you have like a dynamic switch image thing,
but yeah, I do wonder about that.
And then you also gotta remember,
it depends on what you're doing.
You know, it might just be maybe a lot of your applications
are flat packs and you're installing those
and updating those separately anyways.
Or you're using, you know, various containers or other things for your dynamic environments.
I'll take Liam's email here. He says, good day gents. In the most recent LUP, you mentioned
not having received much feedback about multi-monitor setup. So he sent us a picture of his. I've
got a laptop screen at 1080 at 144 hertz. That's my primary display, you know, for things like
Skype. Then I have two externals, a 32-inch 2K.
I do think that's the sweet spot, he says, at 60 hertz.
And then my rightmost monitor is in portrait mode.
That's how I, that's where I can see more lines of code.
All of this works without issue on XFCE.
In case you do navigate to the image,
yeah, that's a treadmill desk.
I'm on my second one since the start of the pandemic.
This is a nice setup.
Listener since SciByte, Lunduke,
and last days of early TechSNAP.
Member since 2022, annual member since December.
That's awesome.
Thank you very much, Liam.
And I like your setup.
I am, as you boys know,
big fan of the vertical monitor for our show docs
or, you know, like reviewing a config file
or just having a terminal up there.
journald.
Yep.
The vertical monitor is a productivity hack if you can afford the luxury of...
Because it's not great, but one of the other things I will use the vertical monitor for is I stack two chats.
Like if I have two different work chats or something, I'll stack them on the vertical monitor.
Works really great.
Right, yeah. It doesn't fit every activity,
so it's sort of a less generally useful screen at times.
Yeah, yeah, it is more limited,
so it's more of a luxury.
I do have to admit that.
I have a bit of a question here for us.
Did we ever divulge what our current setups are?
Chris, I know you've gone into details,
but I don't think Wes and I have actually shared that.
Wes, I bet you kind of move around depending on,
because you mostly use laptops.
I do a lot of laptop.
I've got a dual monitor set up in the sort of main officey bit
and then I have like just a single,
where I do like laptop with one screen
in my living room sort of desk.
You got a monitor at the living room, that's nice.
That's for the like casual work where I want to do stuff
but with the TV on.
Yeah.
Okay, Brent, what's your setup?
Well, I don't actually do dual monitor anymore.
I am now doing the tri-monitor thing.
Chris, I know you're like a quad monitor guy,
but I did steal your little tip.
You've been trying to get me to use this for years.
I have same as Liam here on the right hand side, a vertical monitor.
It's just an old monitor I've had forever. It doesn't matter though, because it's just
chats or like currently I have our episode recorder there. I have the window on which
we're connected together. And then I also have the JB chat, even though we're not doing
it live for some reason, it just makes me feel better and it's perfect for that and so I've got three monitors running
usually for me off a laptop, and got this big massive, well, I say massive,
although Liam beat me on this one, but it's this 27-inch, like, 4K Dell monitor
right as my main monitor in front of me, and that's been a
pretty sweet setup. I'm on to him. Yes, he asked this question because he wanted
Exactly, and I'm here. Yeah, I love it
You know, I've been I know I mentioned this recently on the show
But I have I definitely have always been a multi-monitor person for many many many years now, but I have been enjoying
the single extra extra ridiculously widescreen, it's actually a curved wide screen at home.
And I use GNOME there as well.
And I really like, I mean, you can lay out
three full windows side by side on the screen.
You just really, you get a lot of room.
And when you start getting that kind of room,
you don't really need a second screen as much.
And I will say it is a much easier way
to do desktop Linux.
I do think that'll be in one of my future monitor purchases for sure.
If you go with like an Intel or an AMD GPU right now, maybe one day Nvidia, but Intel and AMD and a single monitor,
I promise you your life will be easier than if you try to be a quad monitor guy.
It's not an easy path. But what if I want to chain display port together? Yeah.
Okay. Yeah. Oh, man. And like, I don't know what my deal is,
because I've only been using computers
for like 30 plus years,
but like I still have major strugs
trying to make sure that the correct monitor
is the default, like, BIOS screen monitor.
Oh gosh.
And I don't know, maybe this isn't how it works,
but hand to the mixer if it doesn't change.
I get it just set up right,
and then like a couple of days go by,
and the next time I turn the machine on,
like the different monitor's lighting up.
And I've gone through and I've thought I did
the proper ordering, you know, in the plugs,
and it changes, I swear that happens.
I don't know, maybe I'm making it up.
You need to like use UUIDs for your monitor setup.
Yeah. Well, it's at the BIOS level. Like, you know, props to plasma. By the time I get to plasma,
it's always fine. And I've been actually having really good success with locking my screens,
they go to sleep, they wake back up, everything's fine, all the orientations are correct.
And when it boots, it always nails it. So Plasma has been just doing great
with my multi-monitor setup.
But my BIOS, right?
Or like when the system's booting,
the console where you're just seeing the output,
like that is a moving target for some reason.
And I don't think that's technically how this works,
but maybe it's me moving monitors around, I don't know.
You could take the next year off and just rewrite your bios.
Oh, I thought maybe I'd do a study
where I just empirically write every move I make down,
okay, this one's in this plug, boot.
Yeah, you're right, that's step one.
I think maybe you should study the cameras
that you have in the studio,
because every time I show up,
I tend to shuffle all your cables around for your monitors.
That is true, That is true.
Okay, well, the pick this week has to be something that gets you up and running with BPF right away, right? Like, we don't want to sit here and talk about eBPF this whole episode and not give you a
great tool to check out. And this week, it's Network Top. It helps you monitor traffic from
your network using BPF.
Yeah, we should be clear, this one is classic BPF,
Yeah, we should be clear, this one is classic BPF,
but it's still handy, and that means it is broadly useful, too.
Yeah.
It's built in Rust.
Oh, it is?
Yeah, so it's super performant, it's easy to use,
and it means you get to use, like, the tcpdump-style syntax
in a nice little TUI.
And so I sent you an example.
What do you think of it?
Well, first of all, it's very readable, right?
Because it is, like you said,
you're getting charts in a terminal UI.
And I mean, it's immediately understandable.
I'm trying to figure out what you're doing here
because I see ginormous, like you have almost no traffic,
and then it jumps up to almost
26 point three megabytes a second and it hovers there for about
40 seconds and then it drops down to absolutely nothing
Yeah, so you see how there's, okay, there's an input section at the top. Mm-hmm. And then under that there's rules.
Yeah, you'll see it says all on the left, and then to the right it says host, and then a specific IP address.
So you're looking at all traffic from just this host?
Yes.
And so those are two separate rules.
So basically it's looking at one interface
and just watching it totally by default,
and then you can add BPF rules
that it'll add as additional things
that it'll monitor at the same time,
and then you can use the arrow keys
to toggle between the views for different rules.
Okay.
And so in this case, there was no traffic because I sshed to another machine here at
the studio. We hadn't really been talking, right? I exchanged a few packets for that,
but that was all. And then I did a netcat back to my laptop, just reading from random.
And so that was the big traffic spike. And then I killed it just to watch it all drop
off. But you know, you can do like specific ports that you want to watch, you can do specific IPs,
host names, TCP states, like whatever you can do with tcpdump, basically you can put those rules
in here. And you're getting a command line, a terminal user interface. So it's really easy to
understand. Like these rules Wes is talking about are clearly delineated in the UI, it's really simple syntax. So
maybe there's something you're trying to hunt for, you find it here, and then you
can go do the tcpdump to capture the traffic and do some more analysis. And it
is MIT licensed, so it is free to use. And then there is a bonus pick: bpftune. Yeah,
well you kind of hinted at this a bit and I happen to
know you turned it on. I did. For at least one of your systems. I have it running on my home
workstation. Despite the fact that this is indeed a GPL-2.0 WITH Linux-syscall-note
licensed Oracle open source project. Yeah, bpftune aims to provide
lightweight, always-on auto-tuning of system behavior.
The key benefits it provides is by using BPF observability
features, it continuously monitors and adjusts system
behavior.
Because we can observe the system behavior
at a fine grain, we can then tune at a finer grain too.
So like individual socket policies,
individual device policies.
Yeah, right?
So think of all those things, like, sysctl can tweak.
It's using BPF hooks to watch your system
and then automatically go tweak those.
Yeah, now I haven't really noticed a difference
but I've only had it on for a couple of days
but the idea is brilliant.
It's just so brilliant, like, here's the system,
I'm monitoring, okay, now I'll just go make adjustments here
to make it essentially maybe like ease pressure
on the system or something like that
I don't know if anybody has experience with this, because I just started, but it seems like a fantastic idea.
bpftune. Chris, love a link to these in the show notes. And
in Nix it's like just one, it's like, you know, services, enable bpftune, and then you're good.
There's more you can do, but it's really simple.
So I was like, all right, I'm just gonna turn this on.
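For anyone who wants to follow along, the NixOS side really is about one line, assuming your nixpkgs channel carries the bpftune module; the exact option name below is a best-recollection sketch, so check it against the NixOS options search before copying it.

    # configuration.nix (sketch)
    services.bpftune.enable = true;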
And I just love the concept of the system auto-tuning itself.
Yeah, and they kind of make the case here, too,
you know, if you're doing the cattle-not-pets approach
and like, you know, how many of your systems
ever really even have a human person
who might be on it who could tune it.
Okay, maybe you can hand-tune the database,
but you're not gonna hand-tune all your dynamic web workers.
This could be great for my Odroid.
But, you know, if some of them are longer lived,
this thing can make sure they're running all right.
I should put this on my Odroid.
I really should, little Odroid's doing a lot of work,
and I never really check in on it.
Just sitting there being a little yeoman about it.
Well, we will have links to this and everything else.
There's a lot of links this week.
Linuxunplugged.com slash six oh five.
And remember, we want to know if you enjoy these deep dives.
We can get into the weeds here a bit too much,
but if this is the kind of stuff you like, let us know.
Because it's honestly, for us as creators,
it's a little scary
to do topics like this.
I know that sounds stupid, but it is.
It's just a little scary.
So we always like to know what your thoughts are.
We'll be back at our regular live time,
Tuesdays as in Sunday, Sunday at 12 p.m. Pacific,
3 p.m. Eastern.
See you next week.
Same bat time, same bat station.
Now, if you want more show,
remember our members get the full bootleg,
which is clocking in at like an hour and 20
minutes or something right now.
And this is a short one.
This is a short one for them.
And of course, you can get details of that at LinuxUnplugged.com.
Thank you so much for joining us on this week's episode of the Unplugged program.
We just really appreciate your time for listening.
If you want to share it with somebody, we always like that too.
Word of mouth is the best advertising for a podcast.
We always appreciate that. Thank you so much for being here and we'll see you
right back here next Tuesday, as in Sunday, which isn't Tuesday at all. Thanks for watching!