LINUX Unplugged - 605: Goodbye World
Episode Date: March 9, 2025
We are digging into a superpower inside your Linux kernel. How eBPF works, and how anyone can take advantage of it.
Sponsored By:
- Tailscale: Tailscale is programmable networking software that is private and secure by default - get it free on up to 100 devices!
- 1Password Extended Access Management: 1Password Extended Access Management is a device trust solution for companies with Okta, and they ensure that if a device isn't trusted and secure, it can't log into your cloud apps.
- River: River is the most trusted place in the U.S. for individuals and businesses to buy, sell, send, and receive Bitcoin.
Support LINUX Unplugged
Links:
- 💥 Gets Sats Quick and Easy with Strike
- 📻 LINUX Unplugged on Fountain.FM
- eBPF Perf Tools 2019 | SCaLE 17x
- SCALE2019_eBPF_Perf_Tools.pdf
- Oracle Releases DTrace 2.0.0-1.14 For Linux Systems
- Gentoo Linux Touts DTrace 2.0 Support
- bpftrace (DTrace 2.0) for Linux 2018
- Full-system dynamic tracing on Linux using eBPF and bpftrace
- A thorough introduction to eBPF
- BCC - Tools for BPF-based Linux IO analysis, networking, monitoring, and more — The BPF Compiler Collection: BCC or BPF Compiler Collection is a set of tools leveraging eBPF for kernel tracing, Linux IO analysis, networking, monitoring, and more.
- BCC Tutorial
- xdp-tools: Utilities and example programs for use with XDP
- ply: a dynamic tracer for Linux
- Liam's Multi-Monitor Setup
- Pick: netop — Network Top — Help you monitor network traffic with bpf
- Pick: bpftune — bpftune aims to provide lightweight, always-on auto-tuning of system behaviour.
Transcript
Enhanced BPF is another hotness of Linux for the last couple of years.
And when the patches were first added to Linux,
the lead developer, Alexei Starovoitov, said,
this allows you to do crazy things.
This is normally the words you don't tell Linus
when you want him to accept patches into the kernel,
but fortunately the patches were accepted.
Enhanced BPF puts a virtual machine in the kernel
that we can program from user space.
Hello friends and welcome back to your weekly Linux talk show. My name is Chris. My name is
Wes. And my name is Brent. Well, hello, gentlemen.
Today, we're digging into a superpower
that is inside all of our Linux kernels.
We're gonna talk about how eBPF works
and how anyone can take advantage of it.
Then we're gonna round out the show
with some great feedback, some picks, and a lot more.
It is a special out of time episode.
As you listen to this right now,
we are at Planet Nix and scale.
So this is an episode in between.
You're going to get all our Planet Nix
and scale coverage coming soon,
but we wanted to take a moment in between episodes
and do something kind of fun
and really dig in and get technical.
But first I want to say a big good morning
to our friends at Tailscale,
tailscale.com slash unplugged.
They are the easiest way to connect your devices and services to each other, wherever they are. It is modern networking. It's a flat mesh network that is protected by WireGuard.
That's right.
And it is so fast so quick to get going and it gives you superpowers
Not only do you get a flat mesh network across complex networks, so maybe you
have multiple data centers or VPSs, you've got mobile devices or you've got
double carrier grade NAT, it'll smooth all of that out.
That's fantastic.
But then there's also a whole suite of tools that make it really convenient to
use sort of like airdrop for your entire tail net, including like your Android
device and your Linux devices.
You can send files around.
They will manage your SSH keys through your tail net for you.
So you can just log into all your individual devices. You don't have to manually copy keys
around like an animal. And they also offer more advanced features so you can set up ACLs
and really manage the system and lock down only certain things to certain people. And
when you try it now, when you go to tailscale.com slash unplugged, you get it for free up to
100 devices and three users, no credit card required.
I mean, you can really cook with 100 devices
and then maybe you'll discover it's great to bring to work, too.
Thousands of companies like Instacart, Hugging Face,
Duolingo, Jupiter Broadcasting, and many others
use Tailscale, and we love it.
Try it for yourself.
Go get yourself a little Tailscale right now.
You're gonna love the way it tastes, and you're going to love how easy it is to get going.
If you've got five minutes, you'll probably get it running on three devices.
I have no inbound ports on any of my firewalls.
Tailscale dot com slash unplugged.
Well, like I mentioned, we are at Planet Nix and SCALE right now.
But we did think it was sort of a perfect out of time episode, because we really
first got excited about eBPF at SCALE back in 2019.
Yeah.
A million years ago.
And it was a great presentation that really just brought home to us how
powerful this was going to be.
Yeah, you heard from the man himself, Brendan Gregg,
observability, performance, tracing, guru.
eBPF was still kind of new then.
He's kind of well known.
You've maybe seen his famous video online using D-Trace
to show how hard drives don't like it when you yell at them.
Yep. So, you know, he has deep insight into this area, and even in 2019 and earlier
was already getting excited about eBPF.
So it's kind of neat to look back now
as like a whole giant marketplace
of eBPF based observability tools now exist.
And the name is sort of a misnomer, right?
Because it sounds like a packet filter.
And so you think, well, how, okay, what,
is this a firewall guys?
No, no, no, that's why we wanted to play that intro clip.
It is so much more.
It is really a VM inside the kernel
that can run simple code that you can create
and craft that is protected.
And so since this is a prerecord,
we're not gonna have your boost this week,
but we do wanna know if you like this type of deep dive
into this particular topic.
So boost those in and we'll bank them from when we come back.
But let's talk about the Extended Berkeley Packet Filter.
Yeah, so it does do some networking stuff still to this day, but it did, as you say,
start out as a packet filter introduced in 1992 to efficiently filter network packets
in BSD operating systems.
pfSense and other firewall products like OPNsense,
they've been using BPF as part of their core product.
That's one of the reasons I was an early pfSense user.
And before too long, within the same decade,
it made its way over to Linux in the form of tcpdump.
And already you're seeing this thing, right,
where you can kind of use user space
to help better observe what's going on on your system.
And so if you've ever written sort of an expression
to filter things or look at packets using tcpdump,
well, you're using a language that then gets compiled down
to BPF bytecode and executed.
That bytecode right there is kind of the magic, right?
Because it turns out that this thing is essentially
capable of running this bytecode.
So it's not just a packet filter.
Yeah, and that's like how the implementation works.
As it started out, it was a very simple virtual machine,
and don't think virtual machine like QEMU necessarily
or like a full, simulating a full computer.
The point is it's like a very limited, restricted bytecode
that can only do certain things relevant to, at first,
filtering packets, and that lets you make sure that like,
it's not gonna do anything crazy,
it can't go into infinite loops,
all kinds of other nice things, and optimize it. And then be able to load it in and have, you
know, you run the program, it supplies the packet and anything else that you need as
input to it. And then the machine executes it. Ultimately, that's how you tell like,
do I accept the packet or do I drop the packet? But at first it was, you know, a very limited, I think it had like two registers to use,
super limited thing,
but eBPF, extended BPF, was introduced in Linux 3.18.
This was like 2014.
So BPF had been around for a while,
there'd been various developments,
but eBPF really kicked things off
in the Linux side of things.
BPF hadn't caught up with the times in some ways,
it was still 32-bit.
It had some of those register limitations.
So they upgraded to 64-bit registers,
added more instructions.
They added the verifier to the kernel,
which is a big part of it that lets you analyze BPF programs
to make sure they're safe, no infinite loops.
They don't do invalid memory access.
There's a checker process.
Right, because it's like you're loading something
from user space into the kernel.
That's a big security concern.
So you want to make sure that you have,
and just beyond security, operationally too, right?
You don't want it to be able to crash the kernel.
So we actually started to see this stuff land,
you know, progressively in Linux 3.18 and beyond.
So this is actually, like you said, 2014.
This has been landing for a while.
Yeah, definitely.
And then, so that was kind of like the raw stuff, and then 2018 and beyond, people started adding more tools,
like BCC, the BPF Compiler Collection,
bpftrace, which we'll talk about too. The company Cilium and their products have a bunch of eBPF stuff for
Kubernetes offerings.
Plus we got better compiler support.
There's something called CO-RE, or compile once,
run everywhere.
So compilers have better support to be able to make,
you know, you can compile your BPF program
and not have to worry about as much necessarily,
depending on what you're doing with it,
about how compatible it'll be with the kernel,
where you compiled it versus where you're running it.
So the idea of being like,
it should be compatible
across multiple versions of Linux,
as long as they have this correct implementation.
Right.
Okay.
This has been so successful that
newer versions of D-Trace on Linux,
they're basically just some extra user space stuff
that uses eBPF and other kernel primitives under the hood.
Really?
Yeah.
Huh.
And eBPF was also ported to Windows.
Yeah, I heard about this.
They're getting a lot of good stuff over there.
One important thing with the extended part is they were also able to make the instruction
set be more sympathetic to modern hardware, and they implemented just-in-time compilation
as well.
So that meant eBPF can be really fast.
They've also made it so that there's relatively stable APIs
we've kind of been talking about,
so that as kernel internals change,
you can use eBPF to hook into internals.
If you do that, there's no guarantees, right?
That's kind of on the tin.
But there are some stable interfaces
that you can use, which is nice.
Hmm.
So the other important point to know is there's kind of various things it can do.
There's XDP, which is Express Data Path, which intercepts packets basically at like the earliest
point that you can.
You have limitations on what you can do, but it can be super low level and performant.
And so you see this sometimes, maybe in responding to DDoS attacks where you're getting
flooded with traffic. Maybe you can identify
various ways that you can program in here.
So you can essentially, ideally,
catch it before it begins to overwhelm the system.
Because you're catching it much further
in the driver stack.
Yeah, that's the idea.
And it doesn't do as much as the very general
and powerful, but full-featured Linux kernel
networking stack. You can kind of just be like,
well no, if it looks like this at all, cut it off. Yeah, if I'm getting DDoSed,
I don't want the network stack trying to figure out what to do with all this traffic, because that's what's gonna take me down.
Yeah, one way to say it is limited context, but maximum performance. I
like that.
Okay, so then you can also do various types of kprobes. I'm sorry? Yeah, kprobes.
And this is dynamic instrumentation, that's where the probing comes in: you can hook almost any kernel function. Mmm.
But there's no clear definition of what you're gonna get.
You have to go look at the function you're hooking into, it's all gonna be dependent on that.
But any kernel internals, any kernel function, almost. I'm sure there are some limitations, but that's a lot of them.
Yeah, I mean that could be anything from, you know, keyboard input to network traffic to disk I/O.
I mean, there's all kinds of things there. So that's where a lot of the power comes
from. Yeah. But that may or may not be stable. There's no guarantee about it being stable across
kernel versions, it can change at any time. People can update the signatures. That's one of the things
that's been happening in the Rust discussion in the kernel. Many developers expect to be able to make a change like that,
and because they have one big code base,
they can update it everywhere
and be able to do a refactoring like that.
And, but the other one, trace points.
This is an important one.
Trace points, predefined stable instrumentation.
Low overhead, but only available
where explicitly added by kernel developers.
So, OK, so a trace point would be like a way you put, the spot you hook in and
start getting metrics or information out of.
Exactly. But the developer of the subsystem has to explicitly support that.
Yeah, you have to add that in, versus totally dynamic with a kprobe.
OK. But the upside is you get structured data specific to each trace point.
Right. So you basically just get to work.
Tells you what it is. Yeah, yeah, okay.
And they're maintained across kernel versions,
so you can kind of rely on them for more long-term use.
I have a feeling this might be relevant later.
Okay, good to know.
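To make that kprobe-versus-tracepoint distinction concrete, here are two stock bpftrace one-liners of the kind the docs ship with. Treat them as illustrative sketches rather than anything we ran on the show: the first attaches a kprobe directly to the kernel's vfs_read function and counts calls per process, the second uses the stable syscalls:sys_enter_openat tracepoint and prints which process opened which file.

    # kprobe: dynamic, hooks a kernel function directly; no stability guarantee across kernels
    sudo bpftrace -e 'kprobe:vfs_read { @reads[comm] = count(); }'

    # tracepoint: predefined, stable instrumentation with structured arguments
    sudo bpftrace -e 'tracepoint:syscalls:sys_enter_openat { printf("%s %s\n", comm, str(args->filename)); }'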
Yeah, I think that's probably like the quick,
high-level version of what eBPF is
and kind of what it can do.
And it is so, so, so simple,
but yet so powerful,
is what I love about it.
And we have a couple of examples in the show,
and I think some hands-on stuff that people could take away.
And maybe we should start with BCC tools themselves.
Yeah, because we've both had a chance to play with those.
Yeah.
I was, you know, I was joking around with Wes,
and be like, it'd be kind of great
to know where in the system,
when I open up this directory that's a fuse mounted directory,
like where is the actual delay happening?
And what part of the system is sitting around waiting?
And through this process of trying to get to that,
I came across tools that let me look at my disk I.O. analysis
or let me look at the network traffic
and really kind of using these different tools, putting them together.
You can actually start to get a really good picture of where the delay was happening in the system.
And, you know, for me, it was like, this is going to be amazing.
I had discovered, by running it on my desktop when I was experimenting with this stuff, that, oh yeah, I've still got this Errands app that we talked about on the show that I'm not currently using, like, today.
But I guess when I close it, it doesn't actually close. It doesn't leave an icon.
I had no idea it was running, but using these
tools I started seeing this application that was hitting my disk every so often.
I'm like, I don't recognize this, and I discovered I actually had that running the entire time. And it was kind of useful.
Yeah, it kind of skips through a lot of the boundaries and other limitations
that can sometimes pop up when you're trying to look at your system.
So it can be surprisingly insightful.
They do have a nice tutorial which we'll link.
They also, I like that they have a set of things to run first.
Like before you go to these tools,
make sure you check all the regular system monitoring tools first
because we'll see as a theme,
like your H-tops and B-tops and all kinds of things,
kind of get a broad look.
And you could see some specifics.
Whereas you can make broad eBPF programs.
But by default, they're going to be a lot more specific when
you're looking at one thing like disk latency or something
specific to the file system or networking.
Yeah, yeah, that's a good point.
All right, so we could talk about
some of the commands we tried.
I know that filetop and tcptop we played around with.
I did not get a chance to play around with ext4slower
and xfsslower, but this is an interesting way
you can use it too.
Uh-huh, yeah, it'll tell you when there's things happening
in your file system that are causing latency
more than a programmable amount
you can pass on the command line.
Or they've got zfsdist, which traces ZFS reads,
writes, opens, and fsyncs, and then summarizes the latency
of all that in a histogram for you.
That's really cool.
Yeah, right?
And then two really useful basic ones are
execsnoop and opensnoop. Yeah, opensnoop. Isn't that kind of like, there's like a Mac
app that monitors traffic? Is that what it was, opensnoop, or something else? Yeah, no.
So opensnoop, you're thinking of, there's a Linux one, I think, right? OpenSnitch and
Little Snitch. Snitch, snitch, that's it, yes. opensnoop tracks file opens.
Okay.
So you can run this and it hooks into the kernel via BPF
and then anything that's running on your system
that opens files, it'll just print out
in your console right in front of you.
Well, I did play with this one.
I did play with this OpenSnoop one, you're right, yeah.
That's really cool because I remember I launched,
you could launch particular things
and then just watch all the crap
that happens on your system.
Right, and especially, you know, you
can filter it, you can have it look at specific processes, or you could, you know,
grep output or whatever, so you can filter on a busy system. But I think it's
especially useful on a system you think should be, you know, relatively quiet,
just as a way to see what is actually happening in the background,
because, for short-lived processes that might be
hard to see in a summary type program
but that are still doing things,
it'll still print in opensnoop.
Then execsnoop does a similar thing,
but for process execs.
So again, for things where it's like,
it's hard to see in a general tool,
but you really want to see it at a nitty gritty level,
like what is happening and being exec'd on this box,
ExecSnoop is handy for that.
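If you want to try those two yourself, the invocations are about as simple as it gets. One assumption worth flagging: on Ubuntu and Debian the BCC tools ship in the bpfcc-tools package with a -bpfcc suffix on each command; other distros may install them as plain opensnoop and execsnoop, often under /usr/share/bcc/tools.

    sudo opensnoop-bpfcc           # print every file open() on the system as it happens
    sudo opensnoop-bpfcc -p 1234   # only show opens from PID 1234
    sudo execsnoop-bpfcc           # print every exec(), including short-lived processes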
I have a sense execsnoop has come up
actually in real life for you as a handy tool.
Do you wanna share a war story with us?
I do, so a few years ago, I was adminning a VPS box,
I wasn't totally in charge of it,
but had some administration over,
and things were seemingly broken. I noticed some messed
up statistics first, right? The metrics were a little odd and I went on and I
started just kind of doing regular updates and poking around at the system
and I noticed that I wasn't able to actually update the initramfs, right? I
was doing all the, it was an Ubuntu box, I had everything going but there was
some like dynamic library linking problem that was happening
after I started digging into the output and tried rebuilding it way too many times.
And I'm like, why?
Why can't I get my initramfs updated?
Uh-huh.
The kernel's here.
Everything else in the update's fine.
But you can't boot in that new kernel if you can't get that updated.
So this was maybe closer to when I really started playing with these tools
a little bit more seriously and having them installed on more boxes. And so they were
already available. And I used opensnoop, because I wanted to see exactly what was being opened and
touched by the update-initramfs process. Right. And then that led me to look at some weird file paths. And I also started, then I did execsnoop,
and that showed similar file paths running on the system.
Oh, uh-oh.
And then I was able to figure out that there was, in fact,
a crypto miner running on the box.
Oh no.
But it had loaded a custom kernel module
to hide itself from tools like ps and top and htop.
That's pretty slick.
But not from execsnoop.
Clever.
So that's why it was breaking when you were trying to update the initramfs.
Yeah, something it did to the system.
I never quite figured out exactly what it had changed, but it had messed with some of
the files on the system in ways that meant that the linking was no longer cleanly happening.
So that's a good way to stack those two tools.
OpenSnoop and ExecSnoop are really a couple of nice tools
you can stack together.
And there's a whole bunch of the,
this is part of BCC, the BPF compiler collection,
which includes a whole bunch of other stuff,
but these are the tools that they ship by default,
which leverages the framework they've built
to implement these via eBPF.
And so there's all kinds of stuff, file
system-specific stuff.
They've got things for looking at network connections, TCP
connections.
They've got stuff for monitoring databases.
It's broad.
I don't know if it's necessarily best practice, but
you could say that with these tools, you could be fairly
confident that you had cleaned up the crypto miner and the
infection, because you can actually watch,
at a much more intimate level, what's happening on the system. Definitely useful.
Yeah, I'm not saying it's a best practice, you know, probably just wipe the box.
Yeah.
But it is sort of nice, if for some reason you don't have that option,
you can use these tools to kind of confirm that stuff isn't coming back after a reboot, right? And, the more you know,
depending, but you can see exactly where they're hooking
and modify the, because a lot of this is implemented via a combo of C because that's the part that
gets compiled and like, it's like a limited subset of C that does the BPF stuff that gets
converted into a loadable thing for the kernel. But there's a bunch of Python utilities around
it to wrap it. So you could fork that, copy it, modify it if you needed even more, or try to observe
specific things once you had identified a particular problem or threat.
Now that's really going deep.
1password.com slash unplugged.
That's the number 1, password, dot com slash unplugged.
All lowercase. And you're going to want to go there, because this is something that, if I had it when I still worked in IT,
I think it would have sustained me for many, many more years.
The reality is your end users don't.
And I mean, without exception, work on only company owned devices, applications and services.
Maybe you get lucky and they mostly do, but I don't find that to be the reality today.
And so the next question becomes,
how do you actually keep your company's data safe
when it's sitting on all of these unmanaged apps and devices?
Well, that's where 1Password has an answer.
It's Extended Access Management.
1Password Extended Access Management
helps you secure every sign-on for every app on every device,
because it solves the problems that your traditional IAMs and MDMs just can't touch, and
it's the first security solution that brings unmanaged devices and
applications and identities, even vendor ones, under your control.
It ensures that every credential is strong and protected. Every device is known and healthy,
and every app is visible.
This is some powerful stuff,
and it's available for companies that use Okta or Microsoft Entra,
and it's in beta for Google Workspace customers too.
1Password changed the game for password management,
and now they're taking everything they've learned there
and expanding it to the login level and the application level.
You know what a difference it makes when people have proper password management.
Now let's have proper login management and authorization.
1Password also has regular third-party audits and the industry's largest bug bounty.
They exceed the standards set by others.
They really do.
So go secure every app, every device, and every identity. Even the unmanaged ones.
You just go to 1password.com slash unplugged.
All lowercase.
That's 1password.com slash unplugged.
Now Wes, it sounds like BCC is an abstraction over eBPF.
What's going on under the hood here?
Yeah, so there are tools that you can just run like we've been talking about, but how
do those tools come to be?
Well, that's where some of those abstractions and a framework comes in that allows basically
embedding C code directly within Python scripts.
And then BCC's tools also sort of handle making sure you've got the whole tool chain available.
So it uses LLVM and Clang to compile things, it handles verification of stuff, it handles loading
it into the kernel for you and attaching it to the right hooks, basically through like
Python method calls, instead of you having to run the right commands in a shell.
So you kind of get like a unified approach where you can write kernel level programs
without necessarily having to deal with like all the other stuff.
You know, I was thinking, gosh, this really seems
like going way beyond what my skill set
would be able to manage, but then I thought,
I mean, it's Python code.
I bet an LLM would get me 90% of the way there these days.
And then I could probably just finish it off.
And there's a fair amount of existing tools
that you can either feed into an LLM
and ask about or modify or try and hack on yourself.
Yeah.
And so here's a simple example that we can play with.
It uses XDP.
Okay.
And so there's a simple C program
that has a function called XDP drop all.
Okay.
And that's gonna attach to the XDP hook.
And so we get like a little data summary of the packet info
which we're not gonna care about.
All we're gonna do is return XDP drop which is a magic value that tells the kernel, hey, just drop this.
Just drop all.
So it's going to drop everything. And then in the Python, that's it C-wise. That's all
we do. And then in the Python world, there's a little setup to make, hey, I want a new
BPF thing or whatever. And then we tell it to load in our little blob of C. And then
we attach it to the interface we want. That's two lines of Python.
And then it prints a nice little message and starts working.
And what it should do is drop all network traffic?
Yeah, so we specified a specific interface.
So the...
Yes, you say on this interface, just drop everything.
And that's because we attached it to that specific interface.
And this is like a quick kill switch.
So what we're going to do as a test here is I've set up a ping
with an audible sound so we can see when we're getting a
result. So we can see we're pinging the box right now.
And then at some point, Wes is going to kill it.
He's actually going to run that kill script,
and it's going to drop all traffic, and we'll hear it drop
off whenever you're ready.
Mr. Payne.
Three two one.
There it goes.
Yeah, it takes almost no time at all.
It happens almost immediately.
Now, and of course, in my regular SSH session,
I can't stop it, but I do have a sneaky console here,
so we should be able to take it.
Wes always has a sneaky console.
All right, I'm leaving the ping going,
so if it resumes, we should hear it.
All right, and hit Control-C now.
Okay. Oh, there it goes. Look at that.
Yeah. So like that wasn't that much work. This is just an Ubuntu 24.04 box. So like, you know, all the stuff you needed was in the repos already.
I'm thinking, you know, it's not a totally practical example, but it's a quick example of the power you have there,
and you're executing that from user space.
Yeah, you do need, you know, root permissions
to be able to load it into the kernel.
But then you're executing things inside the kernel
that are just immediately cutting off traffic
to that interface.
Yes, and I didn't have to build a per-kernel module
with the right headers, and have to worry
about messing up my implementation in a way that's gonna crash the box. You don't even have to
create like an iptables rule that rejects traffic, because that would
actually be much further down in the processing stack if you're using
something like iptables. Yeah. Yeah, that's cool. That's fun. So in reality
you'd want right you would do some sort of filtering where you would look at the
data structure you get from XDP to figure out like, oh, is this one I want to block
or let it keep going through processing, but yeah.
This is your basic, it's not hello world, it's goodbye world.
Yeah, goodbye world, exactly.
But the main point to show that was that this BCC framework is one way
if you want to start developing custom tools.
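For anyone who wants to see roughly what that goodbye-world program looks like on disk, here's a minimal sketch using BCC's Python bindings. It's a reconstruction from our description rather than the exact script we ran, and it assumes the python3-bpfcc bindings are installed and the interface is named eth0; adjust that, run as root, and be aware it really will drop everything on that interface.

    #!/usr/bin/env python3
    # Minimal "goodbye world": attach an XDP program that drops every packet
    # on one interface, then detach it again on Ctrl-C.
    import time
    from bcc import BPF

    program = r"""
    #include <uapi/linux/bpf.h>

    int xdp_drop_all(struct xdp_md *ctx) {
        // Ignore the packet contents entirely and tell the kernel to drop it.
        return XDP_DROP;
    }
    """

    device = "eth0"  # assumption: change to the interface you actually want to cut off

    b = BPF(text=program)
    fn = b.load_func("xdp_drop_all", BPF.XDP)
    b.attach_xdp(device, fn, 0)

    print(f"Dropping all traffic on {device}; press Ctrl-C to restore it.")
    try:
        while True:
            time.sleep(1)
    except KeyboardInterrupt:
        b.remove_xdp(device, 0)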
But you can get even more ad hoc than that.
All right.
Because there's bpftrace.
Yeah, I wanna talk about this.
And this is basically sort of like the closest thing
to D-Trace for Linux.
Okay.
If you don't wanna use the Oracle tool.
Okay.
Which I believe Gentoo now has.
Oh, all right.
So BPF trace is sort of, what I know of D-Trace
is like this ultimate tool when you're debugging
or trying to figure out where your system has gone sideways.
Again, like I was talking about,
like why is this one fuse directory taking so long to open?
And so I imagine this is a similar type of...
Yeah, one way to think about it is you basically get
like a nice targeted little scripting language
to write tracing programs
that use the trace points in the kernel.
Wow, okay, all right.
And there are some other options.
There's a program called ply that I haven't tried but looks also nice, but bpftrace is
quite popular. So we talked about, right, there's different probe types, there's XDP, there's the kprobes, and the BCC tools have tplist, which also just lists all the trace points.
It also shows, you can do other user space stuff too, but we're not going to talk about that
today.
So there is, so tplist, I guess that makes it a lot easier.
If tplist is going to give you all of the kernel trace points, so you know what you have
to work with, that's really useful.
Right.
And it tells you the shape of them too.
You get the sort of data structure.
Okay.
Let me pull it up here. He's got it there on the
machine. Yeah, so here's sort of like, here's some stuff about block dirty
buffer and it tells you you get a dev device, you get a sector and a size and
so you know those are the things you can work with and then it has a name and you
just tell it that you want to trace that thing. So this came up because you
know we talked a little bit
about BCC having file system specific tools.
Like, oh, I want to watch for slow operations
from Btrfs, say, right?
Well, they haven't yet.
I suppose I should step up or someone should step up
and add these, but there isn't yet a BcacheFS version
of these tools, right?
There we go, that'd be cool.
Obviously, probably what you want to do
is just fork the existing Btrfs one and
modify it and make it work right for BcacheFS.
Sure.
But, if you want to be ad hoc about it, bpftrace can handle this too. So, I did tplist
and then grepped that for things that said bcachefs.
So you got all of the kernel interfaces regarding, trace points I should say, trace points
regarding bcachefs.
So here's just a list of some of what the trace points look like. I see, okay.
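For reference, the tplist step being described looks something like this; the -bpfcc suffix is again the Ubuntu/Debian packaging, and block:block_dirty_buffer is just the example tracepoint from a moment ago.

    sudo tplist-bpfcc | grep bcachefs                 # list every tracepoint, keep the bcachefs ones
    sudo tplist-bpfcc -v 'block:block_dirty_buffer'   # -v shows the fields (the "shape") the tracepoint hands you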
So now, okay, so what I'm seeing here on Wes's screen is like an output,
bcachefs colon, and I can see data update, rebalance, extent. There's a lot of different options here that are essentially just, what is the file system up to? And so these would all be things
that Kent or the team have put in explicitly so that there's a way to
easily and with low overhead watch these things. So since these trace points already exist, you can use
BCC to write your own kind of monitoring. Yeah, or bpftrace would be it. Yeah. Thank you.
There's a lot of terms. There's a lot of BPF terms. So here's one that stood out:
copygc_wait. And bcachefs is a copy-on-write file system, and it does this bucket-based allocation.
And so it has what's called a copying garbage collector.
So as you're copying files, it'll handle moving things around so that it can then get good bucket allocation and defragment
things on the fly, that kind of thing.
So copygc_wait is a trace point that tells you when your, when bcachefs is waiting for
copygc to complete.
So it can be a reason that your file system is being slow and not responding, especially
if you're moving or copying files around.
So you can see on your screen just a little tiny script, sudo bpftrace dash
e, and then you pass it a little string, and we tell it we want to use the
trace point bcachefs copygc_wait, and then we have a little script here that
one of the inputs is the device and then it has a little stuff that extracts the
major and minor number from that you don't have to do that and then all it
does is print what device we're waiting on
and how much we're waiting.
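If you'd like to see the shape of that one-liner, here's a rough reconstruction. The field names, wait_amount and until, are assumptions about what the bcachefs:copygc_wait tracepoint exposes; check them first with sudo tplist-bpfcc -v 'bcachefs:copygc_wait' on your own kernel, because they may differ.

    sudo bpftrace -e '
    tracepoint:bcachefs:copygc_wait
    {
        // dev is a kernel dev_t; split it into major:minor the same way the kernel does
        printf("%d:%d copygc waiting: amount=%llu until=%llu\n",
               args->dev >> 20, args->dev & 0xfffff,
               args->wait_amount, args->until);
    }'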
Yeah.
And so it's, you know, I don't know, 10 lines of code.
And you get a nice kind of structured output.
It's easy to, human readable.
It tells you the total flushes, the total buffers flushed.
You get, you know, or the total output,
whatever the stat is you're watching.
And then so on my system, I'm just doing it on,
you know, one root bcachefs disk.
So I knew which disk it would be.
I was just kind of interested to see.
Like I went around and I DD'd some big files
and copied them around and I could see like,
oh yeah, right, the operation completes.
And then immediately the print tells me that,
oh, we waited that much.
And the practical use here is, you know, in a RAID array,
it might be useful to discover that one of your disks
is significantly
underperforming the others.
It could be indicative of a larger problem, it could be indicative of why you're having
performance issues.
So like, it turns out that like this, there's actually probably a fair amount of situations
where single trace points just on their own might be something you want to look at, right?
So other ones for bcachefs: write buffer flush, that's an important event, or journal writes.
You might be wanting to know
stuff about how the journal works.
But you can also use the scripting language
and the fact that eBPF supports basic data structures
like maps and histograms to make
more complicated combined programs.
So I had an LLM, I fed it the output of that tplist stuff
that told it the available trace points.
All the traces you had, yeah.
And what the structures looked like, so it could
write the right code.
That's clever. And
bpftrace makes it easy to have stuff that runs in an interval, right?
So you can set up all the traces, and then every five seconds
it can print out a little summary of what it's done.
And so each time it traces, it can update a variable, and then it can calculate latencies or deltas.
So this is a quick one that it made that looks at that write buffer flush trace point,
and it samples it over every five seconds, and then it tells you how many flushes happened,
how many buffers were flushed, how many were skipped, how many were fast-path
flushes, and then the total amount of data that was flushed. So just like a quick way to get an accurate little,
like, every-five-second printout.
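Here's a sketch of what that five-second summary can look like in bpftrace. The field names nr, skipped, fast, and size are assumptions mapped from the description above (buffers flushed, skipped, fast-path, bytes); verify them with tplist -v against bcachefs:write_buffer_flush on your kernel before trusting the numbers.

    sudo bpftrace -e '
    tracepoint:bcachefs:write_buffer_flush
    {
        @flushes = count();            // how many flush events fired
        @buffers = sum(args->nr);      // buffers flushed
        @skipped = sum(args->skipped); // buffers skipped
        @fast    = sum(args->fast);    // fast-path flushes
        @bytes   = sum(args->size);    // total data flushed
    }

    interval:s:5
    {
        time("---- %H:%M:%S ----\n");
        print(@flushes); print(@buffers); print(@skipped); print(@fast); print(@bytes);
        clear(@flushes); clear(@buffers); clear(@skipped); clear(@fast); clear(@bytes);
    }'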
And what's great is like as far as I know,
this is really putting no measurable strain on the system.
It's one of the more performant ways to do it, yeah.
So you can really get deep insights
without some of the overhead you sometimes get
by that kind of monitoring.
So then to kind of further stress just how far
this whole having an AI help me out would go,
I had an idea. I did ultimately tone it back a bit, but basically like what if I wanted to monitor
some of the mesh network traffic that was going on? There's a lot of options for that,
but can this do this too? And so it built me a little script that you give it an interface name,
like tailscale0, say, right? And then it'll do a similar thing where every five seconds it'll just look at that interface, and it'll
count sent packets, sent bytes, received packets, received bytes, and it'll do
a send latency histogram for you on the sent packets. Yeah, so you get this, you
actually, I mean it's not a GUI, but it's a bar graph on the
command line. Yeah, right. And so the combination of the built-ins of eBPF and then the built-ins into
bpftrace, which has some of this stuff to do nice histogram display and stuff from their print command.
It's, you know, this is a little bit longer, some of it is kind of because there's a print statement for each thing
we're printing, it's like five different things we're measuring, right? So it takes up some space on the screen, but it's a single page of code here. So it's not crazy to start trying to understand
it.
Right. I've never even seen some of this stuff, but I could like the trace points that all
make sense. They're all, it's just real plain English. Really. It's really easy. And then
you have the print and the time interval.
And so under the hood, it is tracing something called net_dev_start_xmit.
So then it filters, it gets data, it filters on interface name from that data, and then
it has a little counter it's keeping, so it increments the counter.
And then it has a thread ID that you can use, and so it gets the current timestamp in nanoseconds
and stores that in a map based on its thread ID.
And then it also does a trace point for sent packets with net_dev_xmit, and then it
grabs the start time if it exists for its thread ID, and then it can compute how long
it took to send from that, and then that's where it can use the histogram stuff to print
out a nice little thing now that it's tracking latency.
And then it has another trace point it uses, netif_receive_skb,
which tracks incoming packets.
And then it just has a little bit of code here
to kind of tie it together every five seconds
and do the printout.
And then it clears all the counters
so that it can do the next cycle.
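Since we can't paste the whole screen into a podcast, here's a condensed sketch of the kind of script being described, assuming the interface is tailscale0 and the standard net:* tracepoints. It's a reconstruction, not the exact script the LLM produced, so treat the field access as something to verify with tplist -v on your own kernel.

    sudo bpftrace -e '
    tracepoint:net:net_dev_start_xmit
    /str(args->name) == "tailscale0"/
    {
        @tx_start[tid] = nsecs;              // remember when this thread started a send
    }

    tracepoint:net:net_dev_xmit
    /str(args->name) == "tailscale0"/
    {
        @tx_pkts = count();
        @tx_bytes = sum(args->len);
        if (@tx_start[tid]) {
            @send_latency_ns = hist(nsecs - @tx_start[tid]);   // send latency histogram
            delete(@tx_start[tid]);
        }
    }

    tracepoint:net:netif_receive_skb
    /str(args->name) == "tailscale0"/
    {
        @rx_pkts = count();
        @rx_bytes = sum(args->len);
    }

    interval:s:5
    {
        time("---- %H:%M:%S ----\n");
        print(@tx_pkts); print(@tx_bytes);
        print(@rx_pkts); print(@rx_bytes);
        print(@send_latency_ns);
        clear(@tx_pkts); clear(@tx_bytes);
        clear(@rx_pkts); clear(@rx_bytes);
        clear(@send_latency_ns);
    }'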
That's a neat little magic, like, super power,
pocket of power that's in the Linux kernel there for this.
Now it does mean, right, like you can't use it if you don't have trace points you care
about, so you have to start understanding some kernel internals and like what trace
points might matter and what they mean.
But if you have a specific problem or a mystery on your system that you're trying to look
into and you're willing to try to chase down hypotheses using these tools, I think you
could get pretty far.
Yeah, that's for sure.
And then, you know, there are actual complete products out there that are dipping into this
stuff.
I was reading online that there's actually several Kubernetes products that are tapping
into eBPF to do things under the hood.
Also Falco, which is a real-time security monitoring application you can run on Linux.
Falco uses eBPF to monitor system calls and network events directly in the kernel,
enabling rapid detection of anomalous behavior with low latency.
This kernel-level approach is faster and less resource intensive than the user space monitoring
tools.
There's also user space tracing you can do.
So if you set it up right, you can trace like JVM events or Ruby things or Python stuff. And so you have products now too that will use things
like OpenTelemetry or other tracing standards. And you can have a trace then that can have
both the kernel side via the eBPF stuff and the user space stuff, without necessarily having
to do as much custom observability implementation in the code base, because they can kind of do it more dynamically.
And you might be able to then see stuff, right, where like,
oh, there was a problem on the kernel layer,
and then that's why we started seeing increased latency
in the application layer.
And, but you don't know, you know, you can use that.
There's a lot of open source products or open core things,
and there's just a lot of sort of primitives
and one layer above primitives that are probably already on your kernel.
I wonder if it's possible people out there listening have already been using eBPF for
a while and this is all old news to them.
Boost it and tell us, like have you used it and what applications and what utilities has
it been for you?
I mean we've gone through a few examples here but like Wes's story with the VPS and my story with tracking down
like a background desktop application,
there's just these also very practical
day-to-day uses for this stuff.
That's nothing more than just using
some of the pre-existing tools
that you just run and tap into this,
and you don't have to write any kind of scripting code.
Yeah, I mean, it's been around for long enough now
that it's well-packaged in most distros.
Can you use eBPF to figure out why I have so many tabs open?
Well maybe we can add a trace point to Firefox to keep track at least.
jupiterbroadcasting.com slash river. River is the most trusted spot in the U.S. for individuals
and businesses to buy, send, receive Bitcoin.
And they make it easy in three simple steps.
jupiterbroadcasting.com slash river.
Hey, I just got set up with them and it was in fact very easy.
I think their best low key feature actually has to do with cash, just as an aside.
So Bitcoin is dipping as we record and if, oh man, this is, this was the moment.
So they have a 3.8% interest
cash savings account, and they pay the interest out in sats. So you put your
cash in there, it's FDIC insured, 3.8% interest, and then when the
price dips on Bitcoin, you can use that balance to smash buy. And so you
can essentially DCA with that 3.8% interest, and then you can smash buy when the price dips.
It's just brilliant.
And River is a really great company.
I had a call with their community management person
and was really impressed with what they said.
And if you're in Canada
and you're looking for another trusted source to get Sats,
bitcoinwell.com slash Jupiter, there you go.
That's her tip.
Well since this week we are hoarding our boost for next week's episode we decided to do a
little dive into the old mailbag. Zach sent us a little note here. I'm behind on listening
to episodes, but the NixOS episode from last month, Linux Unplugged 598, was really interesting.
I wanted to throw out some thoughts responding to a comment that was sent in about wanting a NixOS-like system,
but being able to use the old traditional system admin tools.
For context, I've tried out NixOS, but quickly got to the point where I would need to dive in and figure out those flakes.
At that point, it just became one more thing on that list to do.
Since then, I've gotten into bootable containers.
I've specifically been using the Universal Blue ecosystem with bootc,
but have wanted to try out Vanilla OS and Common Arch that also use OCI images for delivery and customization.
Listening to that NixOS episode,
I felt like many of those talking points
that were given in favor of NixOS
were also very in line with the benefits
that I've seen in using OCI images
to build out customized OS images.
I don't use any of the headliner distros
like Aurora or Bluefin, although I have tried a few,
and overall they were quite well put together.
But I've taken their more low level image builds
and layered my own customizations on top of them.
It's been a fantastic way to manage everything
as when I need a new server or desktop,
all I have to do is install my image
and everything's ready to go.
I will add the caveat that there may be some learning
if you haven't already been
familiar with building OCI containers, but I live in a world where professionally and
for hobbies, that's what I'm doing.
So that wasn't a huge burden for me to learn.
Thanks for the show.
Really have been enjoying it and wanted to send this little message in just in case someone
else might find some use.
Well, cool. Thank you for sharing. We love success stories like this.
I wonder if you have any of it on GitHub or anything too, if people are curious.
Good question. I really feel like this nails something that we've been feeling
and chatting a lot about behind the scenes, which is the really, really brilliant thing
that the Universal Blue folks have tapped into:
this existing knowledge set around how OCI images work and how you can layer
things. They really tapped into the whole DevOps workflow that people live every
day, and now that skill set that you've learned, you can use it
to customize your Linux desktop. I mean that is just really powerful. It's a
different approach than Nix, right?
Whereas Nix, you're building it from the ground up.
You would use Nix to maybe generate those OCI images
or something like that,
but both, like he says,
are essentially accomplishing similar goals.
Yeah, and a lot of the immutability side of things
and the image-based approach, right?
Like starting to think of your system as a cohesive whole
that you deliver in those coherent units
rather than ad hoc imperative updates.
Now, admittedly, I don't use them as much,
but Brent, when I think of like an image-based system,
I don't necessarily think of flexibility.
Yeah, I was kind of wondering about this.
I've had a few people tell me,
oh yeah, images is totally the way to go.
Of course, we've been playing with Nix and NixOS
at the same time, and the thing I keep tripping on
with images is you build them once
and then you use them many times.
But the way, Chris, I think you and I have been using
at least NixOS on the desktop is your Nix config
is like constantly evolving.
Therefore, the next time you go to install it somewhere else,
it's always that new updated copy.
And I would imagine one of the downsides
with building an image is,
well, you have that intermediate step.
You have to go ahead and build your image
every time you want that newest, freshest version.
Am I understanding that correctly?
Is that a proper downside here?
I know some of the silver blue types, right?
They're still using a lot of RPMOS tree,
which has ways to apply things live.
So it might depend a little bit on how exactly
what you're putting in those images,
or maybe you have like a dynamic switch image thing,
but yeah, I do wonder about that.
And then you also gotta remember,
it depends on what you're doing.
You know, it might just be maybe a lot of your applications
are flat packs and you're installing those
and updating those separately anyways.
Or you're using, you know, various containers or other things for your dynamic environments.
I'll take Liam's email here. He says, good day gents. In the most recent LUP, you mentioned
not having received much feedback about multi-monitor setup. So he sent us a picture of his. I've
got a laptop screen at 1080 at 144 hertz. That's my primary display, you know, for things like
Skype. Then I have two externals, a 32-inch 2K.
I do think that's the sweet spot, he says, at 60 hertz.
And then my rightmost monitor is in portrait mode.
That's how I, that's where I can see more lines of code.
All of this works without issue on XFCE.
In case you do navigate to the image,
yeah, that's a treadmill desk.
I'm on my second one since the start of the pandemic.
This is a nice setup.
Listener since SciByte, Lunduke,
and last days of early TechSNAP.
Member since 2022, annual member since December.
That's awesome.
Thank you very much, Liam.
And I like your setup.
I am, as you boys know,
big fan of the vertical monitor for our show docs
or, you know, like reviewing a config file
or just having a terminal up there.
journald.
Yep.
The vertical monitor is a productivity hack if you can afford the luxury of...
Because it's not great, but one of the other things I will use the vertical monitor for is I stack two chats.
Like if I have two different work chats or something, I'll stack them on the vertical monitor.
Works really great.
Right, yeah. It doesn't fit every activity,
so it's sort of a less generally useful screen at times.
Yeah, yeah, it is more limited,
so it's more of a luxury.
I do have to admit that.
I have a bit of a question here for us.
Did we ever divulge what our current setups are?
Chris, I know you've gone into details,
but I don't think Wes and I have actually shared that.
Wes, I bet you kind of move around depending on,
because you mostly use laptops.
I do a lot of laptop.
I've got a dual monitor set up in the sort of main officey bit
and then I have like just a single,
where I do like laptop with one screen
in my living room sort of desk.
You got a monitor at the living room, that's nice.
That's for the like casual work where I want to do stuff
but with the TV on.
Yeah.
Okay, Brent, what's your setup?
Well, I don't actually do dual monitor anymore.
I am now doing the tri-monitor thing.
Chris, I know you're like a quad monitor guy,
but I did steal your little tip.
You've been trying to get me to use this for years.
I have same as Liam here on the right hand side, a vertical monitor.
It's just an old monitor I've had forever. It doesn't matter though, because it's just
chats or like currently I have our episode recorder there. I have the window on which
we're connected together. And then I also have the JB chat, even though we're not doing
it live for some reason, it just makes me feel better and it's perfect for that and so I've got three monitors running
usually for me off a laptop, and got this big massive, well, I say massive,
although Liam beat me on this one, but it's this 27-inch, like, 4K Dell monitor
right as my main monitor in front of me, and that's been a
pretty sweet setup. I'm on to him. Yes, he asked this question because he wanted
Exactly, and I'm here. Yeah, I love it
You know, I've been I know I mentioned this recently on the show
But I have I definitely have always been a multi-monitor person for many many many years now, but I have been enjoying
the single extra extra ridiculously widescreen, it's actually a curved wide screen at home.
And I use GNOME there as well.
And I really like, I mean, you can lay out
three full windows side by side on the screen.
You just really, you get a lot of room.
And when you start getting that kind of room,
you don't really need a second screen as much.
And I will say it is a much easier way
to do desktop Linux.
I do think that'll be in one of my future monitor purchases for sure.
If you go with like an Intel or an AMD GPU right now, maybe one day Nvidia, but Intel and AMD and a single monitor,
I promise you your life will be easier than if you try to be a quad monitor guy.
It's not an easy path. But what if I want to chain display port together? Yeah.
Okay. Yeah. Oh, man. And like, I don't know what my deal is,
because I've only been using computers
for like 30 plus years,
but like I still have major strugs
trying to make sure that the correct monitor
is the default, like, BIOS screen monitor.
Oh gosh.
And I don't know, maybe this isn't how it works,
but hand to the mixer if it doesn't change.
I get it just set up right,
and then like a couple of days go by,
and the next time I turn the machine on,
like the different monitor's lighting up.
And I've gone through and I've thought I did
the proper ordering, you know, in the plugs,
and it changes, I swear that happens.
I don't know, maybe I'm making it up.
You need to like use UUIDs for your monitor setup.
Yeah. Well, it's at the BIOS level. Like, you know, props to plasma. By the time I get to plasma,
it's always fine. And I've been actually having really good success with locking my screens,
they go to sleep, they wake back up, everything's fine, all the orientations are correct.
And when it boots, it always nails it. So Plasma has been just doing great
with my multi-monitor setup.
But my BIOS, right?
Or like when the system's booting,
the console where you're just seeing the output,
like that is a moving target for some reason.
And I don't think that's technically how this works,
but maybe it's me moving monitors around, I don't know.
You could take the next year off and just rewrite your bios.
Oh, I thought maybe I'd do a study
where I just empirically write every move I make down,
okay, this one's in this plug, boot.
Yeah, you're right, that's step one.
I think maybe you should study the cameras
that you have in the studio,
because every time I show up,
I tend to shuffle all your cables around for your monitors.
That is true, That is true.
Okay, well, the pick this week has to be something that gets you up and running with BPF right away, right? Like, we don't want to sit here and talk about eBPF this whole episode and not give you a
great tool to check out. And this week, it's Network Top. It helps you monitor traffic from
your network using BPF.
Yeah, we should be clear, this one is classic BPF,
Yeah, we should be clear, this one is classic BPF,
but it's still handy, and that means it is broadly useful, too.
Yeah.
It's built in Rust.
Oh, it is?
Yeah, so it's super performant, it's easy to use,
and it means you get to use, like, the tcpdump-style syntax
in a nice little TUI.
And so I sent you an example.
What do you think of it?
Well, first of all, it's very readable, right?
Because it is, like you said,
you're getting charts in a terminal UI.
And I mean, it's immediately understandable.
I'm trying to figure out what you're doing here
because I see ginormous, like you have almost no traffic,
and then it jumps up to almost
26 point three megabytes a second and it hovers there for about
40 seconds and then it drops down to absolutely nothing
Yeah, so you see how there's, okay, there's an input section at the top. Mm-hmm. And then under that there's rules.
Yeah, you'll see it says all on the left, and then to the right it says host, and then a specific IP address.
So you're looking at all traffic from just this host?
Yes.
And so those are two separate rules.
So basically it's looking at one interface
and just watching it totally by default,
and then you can add BPF rules
that it'll add as additional things
that it'll monitor at the same time,
and then you can use the arrow keys
to toggle between the views for different rules.
Okay.
And so in this case, there was no traffic because I sshed to another machine here at
the studio. We hadn't really been talking, right? I exchanged a few packets for that,
but that was all. And then I did a netcat back to my laptop, just reading from random.
And so that was the big traffic spike. And then I killed it just to watch it all drop
off. But you know, you can do like specific ports that you want to watch, you can do specific IPs,
host names, TCP states, like whatever you can do with tcpdump, basically you can put those rules
in here. And you're getting a command line, a terminal user interface. So it's really easy to
understand. Like these rules Wes is talking about are clearly delineated in the UI, it's really simple syntax. So
maybe there's something you're trying to hunt for, you find it here, and then you
can go do the tcpdump to capture the traffic and do some more analysis. And it
is MIT licensed, so it is free to use. And then there is a bonus pick: bpftune. Yeah,
well you kind of hinted at this a bit and I happen to
know you turned it on. I did. For at least one of your systems. I have it running on my home
workstation. Despite the fact that this is indeed a GPL-2.0 WITH Linux-syscall-note
licensed Oracle open source project. Yeah, bpftune aims to provide
lightweight, always-on auto-tuning of system behavior.
The key benefits it provides is by using BPF observability
features, it continuously monitors and adjusts system
behavior.
Because we can observe the system behavior
at a fine grain, we can then tune at a finer grain too.
So like individual socket policies,
individual device policies.
Yeah, right?
So think of all those things, like, sysctl can tweak.
It's using BPF hooks to watch your system
and then automatically go tweak those.
Yeah, now I haven't really noticed a difference
but I've only had it on for a couple of days
but the idea is brilliant.
It's just so brilliant, like, here's the system,
I'm monitoring, okay, now I'll just go make adjustments here
to make it essentially maybe like ease pressure
on the system or something like that
I don't know if anybody has experience with this, because I just started, but it seems like a fantastic idea.
bpftune. Chris, love a link to these in the show notes. And
in Nix it's like just one, it's like, you know, services, enable bpftune, and then you're good.
There's more you can do, but it's really simple.
So I was like, all right, I'm just gonna turn this on.
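For anyone who wants to follow along, the NixOS side really is about one line, assuming your nixpkgs channel carries the bpftune module; the exact option name below is a best-recollection sketch, so check it against the NixOS options search before copying it.

    # configuration.nix (sketch)
    services.bpftune.enable = true;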
And I just love the concept of the system auto-tuning itself.
Yeah, and they kind of make the case here, too,
you know, if you're doing the cattle-not-pets approach
and like, you know, how many of your systems
ever really even have a human person
who might be on it who could tune it.
Okay, maybe you can hand-tune the database,
but you're not gonna hand-tune all your dynamic web workers.
This could be great for my Odroid.
But, you know, if some of them are longer lived,
this thing can make sure they're running all right.
I should put this on my Odroid.
I really should, little Odroid's doing a lot of work,
and I never really check in on it.
Just sitting there being a little yeoman about it.
Well, we will have links to this and everything else.
There's a lot of links this week.
Linuxunplugged.com slash six oh five.
And remember, we want to know if you enjoy these deep dives.
We can get into the weeds here a bit too much,
but if this is the kind of stuff you like, let us know.
Because it's honestly, for us as creators,
it's a little scary
to do topics like this.
I know that sounds stupid, but it is.
It's just a little scary.
So we always like to know what your thoughts are.
We'll be back at our regular live time,
Tuesdays as in Sunday, Sunday at 12 p.m. Pacific,
3 p.m. Eastern.
See you next week.
Same bat time, same bat station.
Now, if you want more show,
remember our members get the full bootleg,
which is clocking in at like an hour and 20
minutes or something right now.
And this is a short one.
This is a short one for them.
And of course, you can get details of that at LinuxUnplugged.com.
Thank you so much for joining us on this week's episode of the Unplugged program.
We just really appreciate your time for listening.
If you want to share it with somebody, we always like that too.
Word of mouth is the best advertising for a podcast.
We always appreciate that. Thank you so much for being here and we'll see you
right back here next Tuesday, as in Sunday, which isn't Tuesday at all. Thanks for watching!