The Changelog: Software Development, Open Source - The power of eBPF (Interview)
Episode Date: August 14, 2022
eBPF is a revolutionary kernel technology that has lit the cloud native world on fire. If you're going to have one person explain the excitement, that person would be Liz Rice. Liz is the Chief Open Source Officer at Isovalent, creators of the open source Cilium project and pioneers of eBPF tech. On this episode Liz tells Jerod all about the power of eBPF, where it came from, what kind of new applications it's enabling, and who is building the next generation of networking, security, and observability tools with it.
Transcript
This week on The Changelog, Jerod Santo is talking to Liz Rice about eBPF.
EBPF is a revolutionary kernel technology that has lit the cloud-native world on fire.
If you're going to have one person explain the excitement, that person would be Liz Rice.
On this episode, Liz tells Jerod all about the power of eBPF,
where it came from, what kind of new applications it's enabling, and who's building the next
generation of networking, security, and observability tools with it. Big thanks to our
friends and our partners at Fastly and Fly.io. Our pods are fast to download globally because
Fastly is fast globally. Learn more at Fastly.com. And Fly helps us deploy our app service closer to our users.
It's like a CDN, but for our entire application stack.
Try free at Fly.io.
This episode is brought to you by Influx Data, the makers of InfluxDB.
Increasingly, time series data is
all around us. It's in the cloud as applications and services scale out. It's in IoT as more and
more devices come online. Sensor data is time series data, and that's exactly where InfluxDB
comes into play. InfluxDB is the open source time series data platform that allows developers to
build and to integrate applications with time as a foundational component. InfluxDB is made for developers to build real-time
applications quickly and at scale, and they keep improving their platform to build those
applications with less time and less code. Recently, they launched their Edge data replication
feature. This new capability is built into the 2.2 open source version. It allows developers to replicate data from local instances into InfluxDB Cloud, enables
users to aggregate and store data for long-term management and analysis, and to
satisfy regulations. It brings the horsepower closer to the sensor and
gives developers and solution builders the ability to leverage their own
Elastic Compute Resources deployed at the edge. Edge data replication lets you decide strategically what data moves from edge to cloud,
how the data should be enriched and formatted.
Add to this, InfluxDB has ongoing efforts to unify APIs across all its database offerings.
They now provide a path to build once and deploy time-series applications anywhere.
Learn more about InfluxDB and this new feature at influxdata.com slash changelog.
Again, influxdata.com slash changelog. All right, today we're joined by Liz Rice, who is the Chief Open Source Officer with
the eBPF pioneers, Isovalent.
Welcome, Liz.
Thanks for having me, Jared. Nice to be here.
Nice to have you for sure. So we've been wanting to talk eBPF for a while,
and now we have you here. So perfect fit. I've heard a lot about eBPF, mostly from Ship It.
Gerhard Lazu has had you on the show, the folks from Parca. A lot of people are excited about eBPF. In fact, in his post, KubeCon EU
roundup, Gerhard said almost half of the people that he talked to are either working on it,
using it, or actively integrating with eBPF. So like lots of buzz, lots of interest. And you've
been working with this technology and talking about it for a couple of years now. Do you want
to catch people up? First of all, what is eBPF? And then we'll go from there.
Yeah, sure. So the letters eBPF stand for Extended Berkeley Packet Filter. And I usually just tell
people to forget that straight away because it's not terribly helpful. It tells us something about the history, but it doesn't tell us about what eBPF is today. What it allows us to do is to run programs within the kernel of the operating
system. We can dynamically load these eBPF programs into the kernel, and we can use that
to change the way the kernel behaves. And originally it was the Linux kernel. There is
now a Windows eBPF implementation happening. So I tend to just think about it from a Linux point
of view, but it is broader than that. And it means we can customize the kernel. We can change the way
that kernel features behave. We can use it to observe what's happening in the kernel.
And the really interesting thing or why it's so powerful is if you're an application programmer,
you probably don't think very much about what's happening in the kernel because you use programming
language abstractions that kind of hide that low level from you on a day-to-day basis. So every time you,
I don't know, open a file or write something to the screen, you've got some function that looks
like open or write or something like that. Underneath the covers, every time you interface
with hardware in any way, the kernel has to be involved. So every time you do any network access,
open any files, access memory, all of these things involve the kernel. And with eBPF,
we can insert programs into the kernel's behavior and we can use that to perhaps observe what you're
doing. Every time you open a file, we could see that happening. We could
see which processes are opening different files. Every time a network packet arrives, we can
manipulate that network packet. We can do all sorts of really powerful things to both
observe what's happening in the kernel and even change what's happening. And that kind of changing what's happening allows us
to build security tooling and it allows us to build network functionality as well. So those
kind of three areas, networking, security and observability are the, I would say, three areas
where eBPF is being used most commonly today. But it's super powerful because of that insight
across everything that's happening on the machine.
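(To make that concrete, here is a minimal sketch in C, not from the episode: an eBPF program that fires every time any process enters the openat() system call and writes a line to the kernel trace pipe. It is the kind of "observe every file open" program Liz describes. It would be built roughly with clang's BPF target, e.g. clang -O2 -g -target bpf -c hello_open.bpf.c, and loaded with a standard loader such as bpftool or libbpf.)

    /* hello_open.bpf.c -- minimal illustrative sketch, not from the episode.
     * Logs a line for every openat() syscall entry on the machine. */
    #include <linux/bpf.h>
    #include <bpf/bpf_helpers.h>

    SEC("tracepoint/syscalls/sys_enter_openat")
    int trace_openat(void *ctx)
    {
        char msg[] = "openat called\n";

        /* Output shows up in /sys/kernel/debug/tracing/trace_pipe */
        bpf_trace_printk(msg, sizeof(msg));
        return 0;
    }

    char LICENSE[] SEC("license") = "GPL";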
So I'm thinking about Docker
because this has been called a revolution.
I think Docker was a revolution in its time.
And I remember when Docker first came out
and Solomon Hykes and the dotCloud team
that popularized that technology.
They're like, these containers have been
in the operating system for a while,
but they just weren't accessible.
Nobody knew about them.
They're hard to use.
And Docker really made that simple.
Is eBPF this long-lived feature of the kernel
that all of a sudden we realized was there
and could do things?
Or is it a brand new thing
that's been built into the kernel recently?
So it's a bit of both.
It's been evolving for
years. I mean, I've mentioned the packet filtering element that's been around since I think it's the
90s. And what we call the extended parts, the kind of modern features that we can now use with
eBPF have been added in over the last few years, really. And the reason why it's
all suddenly taking off kind of also relates to why eBPF is really powerful. So when you
use an operating system, you know, whatever Linux distribution you might be using, it's probably
using a version of the kernel that's four or five years old.
The distributions don't take the latest release of the kernel.
They wait for a while to make sure that it's stable and it's been sort of field hardened.
So when eBPF functionality and features have been added in over the course of several years, we have to sort of go back to a kernel that's maybe
four years old to see what people are really using in production today.
And those versions of the kernel are now new enough to have sufficient kind of eBPF capabilities
that we can do really, really useful things.
There's still innovation happening in the
kernel. There are still new things being added to eBPF, but those kind of core building blocks
are now available in pretty much every production Linux distribution. And that is why over the last,
let's say, 18 months, we've seen this huge explosion in interest because it's not just
niche kind of features for people running
cutting-edge kernels; it can be used by everybody. But the reason why I said it also kind of speaks to the power of eBPF is that now that we have eBPF, we don't necessarily have to wait for a new version of the kernel to change its behavior, because we can use eBPF to do it. Which is kind of mind-bending, but pretty cool.
So one of the, I think, really nice examples of how eBPF can be used
is for dynamically mitigating kernel security vulnerabilities.
So a really nice example of this is something called packet of death. So maybe there's a kernel vulnerability that is susceptible to some particularly formed
network packet.
For example, maybe there's supposed to be a length field.
And perhaps if you don't set that length field or you set it incorrectly, there's a bug in the kernel that
doesn't know how to handle it. There have been some instances of this in the past. It's not
just theoretical. And if the kernel receives a packet that's been formed to perhaps set that
length field incorrectly, the kernel doesn't know how to handle it, the kernel crashes, and that vulnerability is exploited.
And in the traditional world, you would need to install a kernel patch and reboot your machine
to no longer have that vulnerability. But with eBPF, you can load an eBPF program dynamically
that recognizes,
ah, it's that kind of network packet that we know is a bad idea.
We need to just throw that packet away.
And you've mitigated that vulnerability without having to actually update the kernel.
You're just running that eBPF program.
You can load that eBPF program into all of your machines dynamically.
You don't have to affect any of your running applications.
It's really, really nice and really powerful.
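(As a heavily simplified sketch of the mitigation Liz describes, here is what a "drop the packet of death" program could look like in C, attached at the XDP hook. The vulnerable protocol is invented for illustration: assume a hypothetical UDP service on port 7777 whose first payload byte is a length field that must never be zero. This is the shape of such a mitigation, not a real CVE fix; it would be attached to an interface with something like ip link set dev eth0 xdp obj xdp_drop_bad_len.bpf.o sec xdp.)

    /* xdp_drop_bad_len.bpf.c -- illustrative sketch only. Drops packets for a
     * made-up UDP service (port 7777) whose first payload byte, a length
     * field, is zero, standing in for the "packet of death" case. */
    #include <linux/bpf.h>
    #include <linux/if_ether.h>
    #include <linux/in.h>
    #include <linux/ip.h>
    #include <linux/udp.h>
    #include <bpf/bpf_helpers.h>
    #include <bpf/bpf_endian.h>

    SEC("xdp")
    int drop_packet_of_death(struct xdp_md *ctx)
    {
        void *data     = (void *)(long)ctx->data;
        void *data_end = (void *)(long)ctx->data_end;

        /* The bounds checks below are exactly what the verifier insists on
         * before it lets us dereference packet pointers. */
        struct ethhdr *eth = data;
        if ((void *)(eth + 1) > data_end || eth->h_proto != bpf_htons(ETH_P_IP))
            return XDP_PASS;

        struct iphdr *ip = (void *)(eth + 1);
        if ((void *)(ip + 1) > data_end || ip->protocol != IPPROTO_UDP)
            return XDP_PASS;

        /* Assume no IP options, to keep the sketch simple. */
        struct udphdr *udp = (void *)(ip + 1);
        if ((void *)(udp + 1) > data_end || udp->dest != bpf_htons(7777))
            return XDP_PASS;

        unsigned char *len_field = (void *)(udp + 1);
        if ((void *)(len_field + 1) > data_end)
            return XDP_PASS;

        if (*len_field == 0)      /* the malformed case the kernel can't handle */
            return XDP_DROP;      /* throw the packet away before the stack sees it */

        return XDP_PASS;
    }

    char LICENSE[] SEC("license") = "GPL";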
That's cool.
You got me thinking about old kernels because, well, back when I first graduated from college, back in the early aughts, I ran a network of Linux machines, you know, mail servers and spam filtering, all sorts of stuff.
And it was back in the days
when we treated our servers as pets
and not cattle, you know, that analogy.
So I had them all named and stuff.
You know, I use a MASH theme.
I'm not sure if you're familiar
with the show MASH.
So there was Hawkeye and Trapper
and Hot Lips Houlihan and Radar.
That was kind of actually the fun part.
It was like when we used to call ourselves sysadmins.
That was cool and all, and I would always patch them
and keep them upgraded and everything,
but the kernel itself, I would always let it get outdated,
not because I wanted to, but simply because it required a reboot,
and I wasn't about to reboot my production server.
You were talking about how now this has been in there for a while,
but people are getting to where their kernels are upgraded enough
that they have the features.
And I'm wondering if in the days of cattle,
of Elastic Compute and Kubernetes and stuff,
is the reason why people still run old kernels,
is it still that same old we don't want to reboot?
You'd think that you would just offload the capacity, reboot the thing, upgrade, and then launch a new node, or whatever you're going to do. Or is it more about, I mean, I understand like, well, you want to stay a couple
versions behind because like, this is your kernel, you don't want to be on the latest,
but they're generally stable. What are your thoughts on that? Is it still the old,
we don't want to reboot thing? Or do you think it's about security or stability?
I hope it's not that we don't want to reboot thing because...
I hope so too, because that
was a long time ago. And I used to feel that way. Yeah. That whole principle of, you know,
cattle, not pets is exactly that, that you're supposed to be able to, you know, destroy your
machines and recreate new ones and do it all programmatically so that the state of those new
machines is exactly what you intended
it to be. And there's no sort of human intervention that means you missed something
while you were bringing it up. And I think it's very good practice in this day and age too,
to make sure that you can destroy servers and replace them automatically. There's that
really great, I guess, phrase or saying about how, you know, unless you've restored from a backup, you don't know that you've got a backup.
And I think the same is true for unless you've tried destroying a server, you don't know what your recovery process is going to be.
So I think it's accepted good practice these days that you should be bringing new machines and updated machines into the deployment.
But that said, they're still going to be using, you know, they might be using the latest version
of a Linux distribution like Ubuntu or RHEL or CentOS or whatever.
And the distro itself, like Debian, for instance, stays very conservative on their packages.
Exactly.
And they will use a kernel version that is, yeah, you know, a few years, I would say old.
Yeah.
Just to make sure that it's stable.
Curious about your perspective on this, relatedly. So from your perch, you know, with the CNCF and where you are with your work, and being so involved in the cloud native community, there's this whole switch to this new style of operations.
It's where the excitement is,
it's where a lot of the money is,
it's where the landscape is,
and you can get lost in the landscape, right?
Like, which service do I pick and all this?
The world moves much slower than that.
As changelog person Jerod,
I see all the new shiny, the interesting.
We talk about leading edge technologies.
The rest of the world moves much slower.
And I'm curious, like from your perspective,
are the people still doing it the old school pets way?
Are there still a lot of those organizations
and enterprises?
Or has it kind of been to where like maybe like 80%
have moved over to a more modern
infrastructure? What's your perspective on that? Yeah, I suppose my perspective is colored by the
fact that I'm so involved in the cloud native world that I probably see those people who have
moved over. I certainly, you know, over the years that I've been involved in CNCF and this kind of cloud native world, we've definitely gone from, you know, a few years ago, oh amazing, we can find an end user to talk about a thing, to, well, there are loads of people who are using, you know, feature X, project Y. You know, it's hard to find a sort of big brand name that doesn't have, you know, some kind of modern cloud-based deployment these days, I think.
Well, that's good news.
Certainly.
I'm sure a lot of those people do also have legacy deployments as well.
And a lot of what I'm currently seeing, you know, I'm involved in the Cilium project.
Cilium is a networking solution.
I would say kind of mostly for Kubernetes,
but a lot of the challenges we see now are to do with allowing people to coordinate
between their lovely, shiny, new Kubernetes workloads
and their legacy workloads that are running on,
you know, a BGP network in a data center somewhere. So there's definitely, people haven't thrown away all those data centers yet.
Right, there's kind of like a migration path, but you have to straddle for probably years, because you're not just going to throw everything out and start fresh. That doesn't make any business sense.
It's probably a bit like, so when I very first got into computing professionally, when I was doing my first job, we were doing things that emulated punch cards, because people didn't, you know, the world had invented a lot of things that were a lot more modern than punch cards, but it just took people a very long time to migrate away from those really old systems.
Yeah, well, I'm nostalgic, so I still pine for the days when we could name our servers. You know, I like a good naming scheme. I love to check the uptime on a server and be like, this server has been up for two and a half years.
That always felt good.
That's why I would never upgrade my kernels.
But I understand.
Things push forward.
You can't do it that way forever. And there's definitely way more reasons to do it the new way.
Think of all those security vulnerabilities
that are potentially in that old code.
All right, you convinced me. This episode is brought to you by our friends at Fly.
Fly lets you deploy full-stack apps and databases closer to users,
and they make it too easy.
No ops are required.
And I'm here with Chris McCord,
the creator of Phoenix Framework for Elixir,
and staff engineer at Fly.
Chris, I know you've been working hard for many years
to remove the complexity of running full-stack apps in production. So now that you're at Fly solving these problems at scale,
what's the challenge you're facing? One of the challenges we've had at Fly is getting people to
really understand the benefits of running close to a user. Because I think as developers,
we internalize the CDN, people get it. They're like, oh, yeah, you want to put your JavaScript
close to a user and your CSS. But then for some reason, we have this mental block when it comes to our applications.
And I don't know why that is.
And getting people past that block is really important
because a lot of us are privileged that we live in North America
and we deploy 50 milliseconds a hop away.
So things go fast.
Like when GitHub, maybe they're deploying regionally now,
but for the first 12 years of their existence,
GitHub worked great if you lived in North America.
If you lived in Europe or anywhere else in the world, you had to hop over the ocean and it was actually a pretty slow experience.
So one of the things with Fly is it runs your app code close to users.
So it's the same mental model of like, hey, it's really important to put our images and our CSS close to users.
But like, what if your app could run there as well?
API requests could be super fast.
What if your data was replicated there? Database requests could be super fast. So I think the challenge for Fly is to get people
to understand that the CDN model maps exactly to your application code. And it's even more
important for your app to be running close to a user because it's not just requesting a file.
It's like your data, and saving data to it, that all needs to
live close to the user for the same reason that your JavaScript assets should be close to a user. Very cool. Thank you, Chris. So if you understand why you CDN your CSS
and your JavaScript, then you understand why you should do the same for your full stack app code.
And Fly makes it too easy to launch most apps in about three minutes.
Try it free today at fly.io. Again, fly.io. So I agree that this feature of being able to kind of like hot upgrade or patch, I guess,
your kernel without upgrading your kernel via eBPF, modify the way it works, protect yourself from that security
vulnerability today without major downtime or upgrades. I mean, that does seem like an amazingly
revolutionary feature. Is there anything about that, though, that's scary? It's like, hey,
go ahead and change the way that things work from user space? Like, doesn't that seem a little bit like you could also shoot yourself in the foot?
Yeah, people often, you know, have that concern when they first hear about eBPF. Here's this incredibly powerful platform that can change the way your servers are operating, and security is certainly a huge concern. So a couple of things to be aware of.
First of all, when you load these eBPF programs into the kernel, they go through what's called
the verifier, which checks that the program is safe to run. And this is one of the big advantages
compared to, let's say, a custom kernel module. Kernel modules are just
kernel code that just run. Nothing is checking whether they're buggy or not. With eBPF programs,
the verifier will make sure that it's going to run to completion, so it can't loop forever.
It will check to make sure that all pointer dereferences are safe. It will check to make sure that memory access is safe. And, you know, while nobody who works in security is ever going to say that means it's completely secure, the verifier does a lot of work to make sure that the program is as secure as possible and certainly can't crash your kernel. That's kind of
a guarantee. So that's one side of the security equation. The other is that you do have to treat
eBPF like root privileges. You don't want to allow random people to insert random eBPF programs into your service because
they do have the potential to see literally everything that's happening on that machine.
So treat eBPF like you treat root privileges. Be very careful about who you allow to run
eBPF programs.
So with great power comes great responsibility, as the comics say. That makes sense. So how do you run an eBPF program? Or how would you facilitate not running it, you know, who gets to, who doesn't get to? I assume these are standard Unix user tools, or how does that work?
So eBPF itself is, say, a feature within the kernel, a bit like, I don't know, the TCP/IP stack is a feature within
the kernel. Most people probably won't interact with it directly. They'll probably use tools that
take advantage of eBPF. I love to show people how things work. So I've done talks before that show,
you know, beginner's guide to eBPF programming, because I think it really helps people get a mental model if they can see some actual code.
That's certainly how I learn things.
I kind of have to see the real thing. But when you write eBPF programs, you are interacting with the kernel and the kernel's data structures, and writing eBPF code does quite rapidly go from hello world, which everybody can do, to, okay, how do I safely interact with these data structures, and what am I changing when I change this? So for that reason, I think most people are going to find eBPF accessible through the use of sort of higher-level abstractions, higher-level projects.
A few examples.
So Brendan Gregg, who was at Netflix, he's now at Intel,
he did lots of work to build some eBPF-based observability performance tracing
tools. And there's a whole array of, I think, literally dozens of tools for measuring anything
that you might want to measure about how your system's operating. And then we get into other abstraction projects, like Cilium for networking and observability, like Parca for seeing flame graphs, or some continuous monitoring of how your user space applications are running. There's a tool called Pixie that's also in the CNCF for observing your Kubernetes workloads.
Lots of different projects that are using the power of eBPF to give you really advanced capabilities,
but that are in a much more easy to consume fashion than messing with the kernel directly.
Gotcha. So most of us will benefit from eBPF
kind of transitively through tools and projects
that are using it under the hood
and providing some higher level functionality.
And those of us who are going to write
our own eBPF program as well,
you know who you are, right?
Like there's the self-selecting group
of people who are very interested in kernel-level things, are very good at them, or can at least learn, and have a use case.
So we were talking about the security angle.
The other one that I think of when I think of something that allows you to hook into
low-level primitives or low-level kernel space is performance.
I feel like you could really slow things down if you do it wrong.
Is that the case?
Or are there also things in there that say it has to be performant, similar to the verifier?
How does that work?
Yeah, I mean, it would certainly be true.
It would certainly be possible to write pathological code that would slow things down.
Generally speaking, most eBPF programs tend to be small. There's a historical reason for that.
There used to be a limit of like 4,096 instructions. So a few years ago, you only could write small
eBPF programs. That limit has now been raised and you can, to all intents and purposes, write
pretty much anything you like
in eBPF.
Was that pretty constraining for folks?
Yes, yes, definitely. So everybody rejoiced when this changed.
Yes. It certainly seems like that kind of constraint might actually be a benefit, at least maybe at first, but now that people are starting to do more with it, I can see where they would feel constrained.
Yes, yes. The fact that you're calling these eBPF programs directly in the kernel can often lead to some really good performance improvements, actually, particularly for things like networking. So as an example of this, for Cilium providing networking to Kubernetes pods, I need to just back up a bit and talk a little bit about how container networking works.
All right, let's do it.
When you create a container, you usually create a networking namespace for that container, so the container has its own networking stack, effectively. And you create a virtual Ethernet connection that connects your container to the host that it's running on.
And in Kubernetes, you typically have one of these network namespaces per pod.
What that means for a network packet that arrives, let's say a packet arrives to that machine from the outside world through a
physical network card into that machine. And in traditional networking, that packet's got to go
all the way through the networking stack on the host across that virtual Ethernet connection into
the network namespace for the container, and then go through the networking stack again to reach the application.
What we do in Cilium, using the power of eBPF, we're creating what we call endpoints,
a sort of logical endpoint for each pod. And when that network packet arrives, we can inspect it
before it goes through the kernel's networking stack, and we can say, oh, well, I know where that, you know, the IP address that's associated with that pod, I know where it is, I have its endpoint right here. We can avoid going through the host networking stack and go straight into the pod's networking stack. And while that might not sound like very much, it shortens the networking path dramatically. And when you add up however many millions of packets there are, this is one of the really fun things about infrastructure software, is, you know, these things scale, the impact scales up, and you can see real improvements, significant improvements, in latency by using eBPF to shortcut these networking paths.
There's an old commercial where a guy is running through his office
and he's holding a nickel and he's jumping up and down. I saved a nickel. I saved a nickel.
And he's just telling everybody he saved a nickel. And they're all just like, whatever,
George, or whatever, like they're rolling their eyes or like, you know, perturbed. And he runs past these people who are walking
through the hallway, like who are like C-level execs or VPs or whatever. And he's like, I saved
a nickel every time we do X, whatever X is. And the two guys look at each other and they say,
we do X 75,000 times a day. And you know, it hits you that all of a sudden this micro-optimization
at scale is a huge win.
It sounds like that's what you're describing.
Yeah, exactly.
Exactly, yeah.
Okay, so performance, if you do it right,
you're going to end up better off
with an eBPF-powered program than otherwise.
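(A toy sketch of that shortcut in C, emphatically not Cilium's real datapath: a tc-attached eBPF program on the host's physical interface hands packets destined for one known pod straight to the pod's side of its veth pair with the bpf_redirect_peer helper, instead of letting them walk the whole host stack. The pod IP and veth ifindex are made-up constants here; Cilium discovers and manages these per endpoint.)

    /* tc_to_pod.bpf.c -- toy sketch of the "skip the host stack" idea, not
     * Cilium's actual datapath. Attach roughly with:
     *   tc qdisc add dev eth0 clsact
     *   tc filter add dev eth0 ingress bpf da obj tc_to_pod.bpf.o sec tc */
    #include <linux/bpf.h>
    #include <linux/if_ether.h>
    #include <linux/ip.h>
    #include <linux/pkt_cls.h>
    #include <bpf/bpf_helpers.h>
    #include <bpf/bpf_endian.h>

    #define POD_IP      0x0a00020f /* 10.0.2.15, hypothetical pod address     */
    #define POD_IFINDEX 42         /* hypothetical host-side veth ifindex     */

    SEC("tc")
    int redirect_to_pod(struct __sk_buff *skb)
    {
        void *data     = (void *)(long)skb->data;
        void *data_end = (void *)(long)skb->data_end;

        struct ethhdr *eth = data;
        if ((void *)(eth + 1) > data_end || eth->h_proto != bpf_htons(ETH_P_IP))
            return TC_ACT_OK;

        struct iphdr *ip = (void *)(eth + 1);
        if ((void *)(ip + 1) > data_end)
            return TC_ACT_OK;

        if (ip->daddr == bpf_htonl(POD_IP))
            /* Hand the packet straight to the pod's side of the veth pair,
             * skipping the rest of the host networking stack. */
            return bpf_redirect_peer(POD_IFINDEX, 0);

        return TC_ACT_OK;
    }

    char LICENSE[] SEC("license") = "GPL";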
The other aspect of performance,
so things like observability tooling,
you can hook into these
events that might happen very, very frequently, but run this very small eBPF program to count or, you know, take some information about those events, and store them in, there's a thing called eBPF maps, it's a data structure that's shared between, or it's in the kernel, that the user space programs can access.
So you can store this data very efficiently in the kernel
and then retrieve it, I'm going to say on a leisurely basis,
you know, from user space.
Leisurely.
Because you don't have to kind of do that transition for every event.
You don't have to, perhaps you're collecting that information in user space every hundred events or every thousand events.
So, usually, the transition between kernel and user space is very costly performance-wise, but by not having to transition for every event, it's much more performant.
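(A minimal sketch of that pattern in C, invented for illustration: the kernel side bumps a per-PID counter in an eBPF hash map on every process execution, and user space reads the map whenever it likes, for example with bpftool map dump name exec_counts every few seconds, instead of being woken up for each event.)

    /* count_execs.bpf.c -- kernel side: count process executions per PID. */
    #include <linux/bpf.h>
    #include <bpf/bpf_helpers.h>

    struct {
        __uint(type, BPF_MAP_TYPE_HASH);
        __uint(max_entries, 10240);
        __type(key, __u32);    /* pid */
        __type(value, __u64);  /* number of execs seen */
    } exec_counts SEC(".maps");

    SEC("tracepoint/sched/sched_process_exec")
    int count_exec(void *ctx)
    {
        __u32 pid = bpf_get_current_pid_tgid() >> 32;
        __u64 one = 1, *count;

        count = bpf_map_lookup_elem(&exec_counts, &pid);
        if (count)
            __sync_fetch_and_add(count, 1);   /* updated in place, in the kernel */
        else
            bpf_map_update_elem(&exec_counts, &pid, &one, BPF_ANY);

        return 0;
    }

    char LICENSE[] SEC("license") = "GPL";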
Let me see if I'm understanding you correctly.
So in the context of monitoring or observing a program,
people would generally take like one out of every hundred
or they would sample because it's cost prohibitive.
You don't want to bog down the CPU that you're running the program on, right?
You want to observe it without affecting it.
And you're saying with
eBPF, because of the performance
savings without having to go back and forth between
kernel and user space,
it's so much faster
that you don't have to sample or maybe you sample
way more often without
incurring the performance cost. Is that what you're saying?
Yeah, that's exactly right.
Yes, yes.
Well, that sounds cool.
I can see where that would be great.
Yeah, so you can see some really powerful metrics
and make security checks for every single time
that a particular kind of operation happens.
And you can filter those events potentially in the kernel.
So maybe you want to police which processes
are allowed to access which files, say. And there's
been a kind of evolution in the way that eBPF programs do that kind of check. So it used to be
very much based around system calls. We're going to look at those system calls and see whether or
not we permit that open. People might have even come across this in the form of
seccomp. So seccomp stands for secure computing. It's a pretty old technology. Docker kind of popularized it quite a lot. You had seccomp profiles that you would associate with programs
to just limit a little bit of what system calls applications are allowed to call.
And that is actually based on BPF. It does use BPF to make those checks. But as eBPF has evolved,
we could start looking at things like not just is this application allowed to call open on any file,
but is it allowed to open this particular file? More recently, there's an interface called the Linux Security Module interface that typically has been used for kernel modules that added security checks. But now we can hook eBPF programs to that security module interface, and we can make checks to say, is it okay if this user or that process or whatever opens this file? We've been working on something called Tetragon that takes this another step further, really, and allows us to filter on the path name, so the name of the file that we're going to open. We'll filter those events in the kernel, so we're not making the check in user space for every single file open. We're checking it in the kernel and only filtering out the file opens that match a particular prefix, for example, just as an example kind of event. So you can make these things, this internal filtering, can make these security checks really performant.
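(Tetragon expresses its policies in YAML, but underneath it is this kind of kernel hook. As a toy illustration of the LSM-hook approach, not Tetragon itself, here is a sketch in C that attaches to the file_open LSM hook and refuses every file open made by one hypothetical UID. It assumes a kernel built with BPF LSM support, CONFIG_BPF_LSM with "bpf" in the active LSM list, and a vmlinux.h generated from the running kernel's BTF.)

    /* lsm_block_open.bpf.c -- toy sketch of an eBPF LSM hook, not Tetragon.
     * Denies every file open made by one hypothetical UID (1337). */
    #include "vmlinux.h"              /* kernel types generated from BTF */
    #include <bpf/bpf_helpers.h>
    #include <bpf/bpf_tracing.h>

    #define BLOCKED_UID 1337          /* made-up UID, purely for illustration */

    SEC("lsm/file_open")
    int BPF_PROG(block_open, struct file *file)
    {
        __u32 uid = bpf_get_current_uid_gid() & 0xffffffff;

        if (uid == BLOCKED_UID)
            return -1;                /* negative return (EPERM) denies the open */

        return 0;                     /* 0 defers to the kernel's normal decision */
    }

    char LICENSE[] SEC("license") = "GPL";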
So let's speak for a minute to the person who earlier raised their hand when I said, if you're going to be programming eBPF, you know who you are. To that person getting started, or even like language requirements, is it like a C interface? Can you use various programming languages? Maybe just give the lay of the land for that person who would like to actually dive in and go for the hello world and maybe go beyond.
Maybe point to some of your talks or somewhere where they start.
Yeah.
So you typically need to write two pieces of code.
You write the eBPF program itself and that runs in the kernel.
And you're typically going to write some user space code
that can interact with that in some way. Maybe you're collecting metrics in the kernel and you're
going to have some user space code that will retrieve those metrics and show them to the user.
Or maybe the user space program is going to provide some configuration information to the eBPF program.
Some eBPF programs, particularly for networking, there's no user space part involved.
For example, if you wanted to do firewalling, you'd typically just load that into the kernel
and maybe you'd only be reporting a few metrics to user space.
Anyway, so you've got these two parts. The kernel code has to be compiled into BPF bytecode, and at the moment you can compile from C, and you can now also compile from Rust. So you'll need to be proficient, or, you know, willing to at least take a stab at writing some C code or some Rust code.
For the user space part, you've got quite a lot more flexibility. And this is another kind of
area where there's quite a lot of evolution. There are quite a lot of different approaches,
different libraries, different frameworks. A lot of people start with a
framework called BCC, which has been around for a few years. And it does make it really easy to write
both the user space code and to kind of do things like loading that BPF code into the kernel. BCC
will take care of a lot of that for you. But the downside of BCC is that it actually compiles your BPF code kind of in real time.
So maybe you write your program in Python or at least the user space part in Python.
And when you execute that Python program, BCC will go away and compile your C code and load it into the kernel. And that means wherever you want to run it, you would need the C compiler tool chain, which is not necessarily what you really want.
And one of the reasons why they did that is because wherever you compile that code, you need
to have knowledge or the code is going to have to match the kernel data structures on the machine
where you're going to run it. And kernel data structures do change from version to version.
So if I build some eBPF code on machine A, how do I know that it's going to run on machine B?
And one of the big innovations in the last sort of recent years in eBPF is a thing called
compile once, run everywhere, which essentially allows you to compile the code on machine
A and sort of include the knowledge of what the kernel structures are on machine A.
And then when you take that compiled object to machine B, there's essentially some
automatic work that compares, oh, well, the kernel data structures are slightly different here. So I
might need to adjust the code to take account of that automatically. And that makes it much easier
to build the code and distribute it to users without them needing to have like the C compiler installed.
So that's made quite a big difference, made it a lot easier for people who do want to distribute eBPF-based tools.
Which seems like it's most people, because like you said, you have this small group of
people slash teams who are building the tools and a whole bunch of users who are benefiting
from those tools.
Well, those tools have to get onto their machines and they have to work on their machines, and so now you have this cross-platform problem, only the platform is the Linux kernel. And so you have these different versions, different data structures. It seems like a definite real challenge, and that sounds like a boon to eBPF people for getting their stuff out there.
Absolutely. It's a real kind of step change. I think we keep seeing these big improvements in
eBPF that just mean that it's more accessible or the tools based on it are more accessible
to the world at large. And that's fantastic.
What's still painful? Where could the next step change come?
Oh, that's a great question. Some of this is still painful because not everyone is running a modern enough kernel to have, you know, all the latest features.
Especially that instruction limit change, right? The max 4,096, you said that was a recent thing.
Yes, that would be an example. Yeah. So if you have a tool that needs to exceed that limit, then yeah, you might need to do some tricks to make it run on older kernels.
Right.
There are things like the way that you can actually attach programs to different events in the kernel. Some of those have evolved and become more performant. So for example,
you'll see loads of examples of eBPF programs that attach to kprobes. Kernel probe, it's basically
a hook in the kernel. kprobes pre-existed eBPF, but it was for tracing or adding tracing probes
into the kernel. And it's essentially, you can add a kprobe at the entry to pretty much any function in the kernel.
Here's the function name.
I want to add some tracing there.
And you've been able to use eBPF programs,
hook those to kprobes for a long time.
And over time,
there's been some more and more performant ways of doing that.
So the current preferred approach to that is called fentry. It doesn't make that much difference, it certainly doesn't matter to anybody who's just using the tool. It's a pretty easy change for somebody who's writing the code. But, like we were saying before,
all those nickels,
every, you know, tiny improvement
in the speed of running that program
once it will add up
when you've done it a million times.
So we'll see things like
more performant hooks.
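(To make the kprobe-versus-fentry point concrete: from the program author's side the two attachment styles look almost identical, which is why the switch barely matters to anyone just using a tool. A sketch in C, assuming a recent kernel with BTF and the do_sys_openat2 kernel function, plus a generated vmlinux.h:)

    /* openat_hooks.bpf.c -- the same trivial logic attached two ways. */
    #include "vmlinux.h"
    #include <bpf/bpf_helpers.h>
    #include <bpf/bpf_tracing.h>

    /* Older style: a kprobe at the entry of the kernel function. */
    SEC("kprobe/do_sys_openat2")
    int BPF_KPROBE(openat_kprobe)
    {
        bpf_printk("openat (kprobe)");
        return 0;
    }

    /* Newer, more efficient style: an fentry hook on the same function.
     * Needs BTF (CONFIG_DEBUG_INFO_BTF) in the running kernel. */
    SEC("fentry/do_sys_openat2")
    int BPF_PROG(openat_fentry)
    {
        bpf_printk("openat (fentry)");
        return 0;
    }

    char LICENSE[] SEC("license") = "GPL";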
There's also, I think,
for eBPF,
for folks who are developing eBPF tools,
there's lots of innovation happening in things like testing and code coverage and sort of instrumenting your code. Getting your code
through the verifier is still something of an art. And there's, I think, probably more improvements to come in sort of making it easier for people to write those eBPF programs kind of without necessarily having to do such a dance with the verifier.
There's a really great quote I read that described the eBPF verifier as a, I think it was a fickle beast.
It's quite a nice phrase.
Sounds like something I'd like to stay away from, if at all possible.
It's a challenge, though.
This episode is brought to you by Square.
Millions of businesses depend on Square partners to build custom solutions
using Square products and APIs. When you become a Square solutions partner, you get to leverage the
entire Square platform to build robust e-commerce websites, smart payment integrations, and custom
solutions for Square sellers. You don't just get access to SDKs and APIs, you get access to the
exact SDKs and the exact APIs that Square uses to build the Square platform and all their applications.
This is a partnership that helps you grow.
Square has partner managers to help you develop your strategy, close deals, and gain customers.
There are literally millions of Square sellers who need custom solutions so they can innovate for their customers and build their businesses.
You get incentives and profit sharing.
You can earn a 25% status revenue share,
seller referrals, product bounties, and more.
You get alpha access to APIs and new products.
You get product, marketing, tech, and sales support.
And you're also able to get Square certified.
You can get training on all things Square
so you can deliver for Square sellers.
The next step is to head to changelog.com slash square
and click become a solutions partner.
Again, changelog.com slash square.
And by our friends at Retool. Retool helps teams focus on product development and customer value, not building and maintaining internal tools. It's a low-code platform built specifically for developers. No more UI libraries, no more hacking together data sources, and no more worrying about access controls.
Start shipping internal apps that move your business forward in minutes with basically
zero uptime, reliability, or maintenance burden on your team.
Some of the best teams out there trust Retool, Brex, Coinbase, Plaid, DoorDash, LegalGenius,
Amazon, Allbirds, Peloton, and so many more.
The developers at these teams trust Retool as their platform to build their internal tools,
and that means you can too.
It's free to try, so head to retool.com slash changelog.
Again, retool.com slash changelog.
Okay, so we've talked a lot about what eBPF is. I'm going to ask you a slightly different question, interpret it however you like. Who is eBPF?
Oh, interesting question. So I'm going to answer that with a bit of a story. A few years back, I saw a talk at DockerCon about how Cilium was using eBPF to create this really cool container networking.
And I thought, that is really cool.
And yet it's so foreign to like, nobody can possibly use this because it needs this cutting edge kernel at the time.
But I thought, that's interesting tech, you know, I'm gonna just keep an eye on that, I want to see how that works. And then a few years later, I was working for a security company, and somebody suggested using eBPF. They'd actually been doing a project outside of work using eBPF on Android for a sort of security-related project. And they were like, could we use eBPF to build security tooling? So we worked on that for a while. And in the meantime, I was seeing more and more of this eBPF community kind of building up, more and more people using eBPF and different projects. And Isovalent, which was the company that Thomas Graf and Dan Wendlandt, Thomas who I'd seen speaking at DockerCon, they founded Isovalent around the Cilium project. And they were facilitating this eBPF community. And, you know, I realized that if I wanted to really immerse myself in eBPF, that was the place to join. And that's why I joined Isovalent. And since I've been there, one of the things that I really hadn't appreciated before I was there was the extent to which Cilium and eBPF have actually been kind of developed in almost lockstep. So there are two maintainers of the eBPF subsystem in the kernel. One of them is Daniel Borkmann, who works for Isovalent, and the other is Alexei Starovoitov, who works for Meta. And they are the people who kind of drive eBPF's future.
And a lot of how eBPF has evolved, certainly on the networking side,
has been in order to allow Cilium to build some cool networking feature,
we need support in eBPF to enable that, you know, maybe different hooks into different parts of the networking stack, as an example. So it was just fascinating to me to see just how much of the development in eBPF had really been done to enable, I mean, to enable the platform as a whole, but particularly with this vision of how eBPF could improve networking and facilitate all these really efficient networking features.
So for me, that was kind of why I was drawn to that team. The expertise is just,
you know, beyond comparison, I think, and a really exciting place to be.
That's cool.
So in terms of open source project
related to a corporate entity,
how does, I guess, where does Cilium stop
and Isovalent start with regards to financial arrangements
and stuff like that?
How does that all work?
So Cilium has always been open source.
And one of the things that we did
not long after I joined
was donate the Cilium project to the CNCF
so that it's got that foundation ownership
so that everyone can have confidence in it
as a community project.
And Isovalent provides an enterprise distribution of that. And the way we approach this is that, you know, Cilium works. Cilium open source works. There are plenty of people who are using Cilium at scale. You know, you can go and take a look at the Cilium website, and there's a list as long as your arm of household names who are using Cilium. And a good number of those are using it open source.
But some of them either need support, that's the kind of classic open source model, or some of them
need features that you only need if you're an enterprise, you know, a large enterprise.
For example, I mentioned before about, you know, integrating with legacy workloads in data centers. You know, if you're operating your own data center, you are the kind of organization that spends money on software, right? You know, you want to license software, you want to have somebody who's going to provide some support around that software.
some really advanced UI, some really advanced security tooling features
that we add on top of the open source project for our enterprise customers.
And there are other people, because it's in the CNCF, there's, you know.
Other offerings.
Other people who can use Cilium
or build products on top of Cilium.
Love that, because now you're competing
on a level playing field.
Of course, as the maintainers of Cilium,
you have that expertise, the street cred, so to speak.
So other people have to establish that.
But the fact that you can have multiple service providers
or licensors or offerings that are competing on
how well they do that and not competing over
the proprietariness of the software that they're running,
I mean, that's spectacular for everybody.
Yeah, absolutely.
I mean, I'm a big believer in the power of open source in general
and specifically for infrastructure
software, just that, you know, the sheer number of people who will use open source code, it creates such field hardening that I think, for that kind of core capability, something like, you know, how your networking is plugged together, it's really an advantage for it to be open source.
who also feel confident about contributing to it as well,
which I absolutely love.
Totally.
Well, if you look at the network stack or the OSI stack,
whichever one you prefer,
you want as much competition at the application layer as possible.
And collaboration at the lower layers.
You know,
if we're all reinventing these low level things,
then we're just,
we're just wasting efforts and you can find competitive advantages by doing
that at,
you know,
but they're going to be just isolated to you and have all those drawbacks or
everybody can collaborate at those levels,
have all the best minds working on the same thing,
pushing everybody forward and then competing at the application layer way more effective that way. I mean, just the way it
should work. 100%. Yeah, absolutely. And we can take lessons from history around this. So back
in the day, if you wanted to use TCP, you had to include a TCP library in user space.
And nowadays, we fully expect that you're going to run TCP.
You're just going to use the kernel services to get that TCP connection going.
And I think it's completely sensible to extrapolate
from that direction of travel and expect that more and more
of the infrastructure software will
not just become that kind of commodity open source software but also that more and more of it will be
handled by the kernel, especially now that the kernel itself, the kernel authors, don't have to handle it.
Right, with eBPF you have more and more kernel-based offerings that are happening by people who are not, you know, Linux kernel, or we can talk about Windows kernel as well, core maintainers. The innovation can happen in a much broader group of folks because of eBPF.
Yes, yes. And it gives us the ability to have, you know, people are using
Linux for, you know, just the broadest range of different purposes. And the Linux kernel
has to work on, you know, IoT devices and desktops and data centers and probably the moon. I don't
know. And in fact, I think Linux does run on Mars. I'm sure it does. Yeah. One of the Mars
landers. Yeah. So Linux, you know, the kernel itself has to be super flexible and very backwards compatible,
but you can do much more sort of innovation
and bespoke things using eBPF,
which is a rich seam of innovation.
There's definitely a parallel between browser tech
and kernel tech in this way.
I know I've heard people compare eBPF
to be like the JavaScript of the Linux kernel.
Yes.
Just because of the JavaScript's relationship
to the browser.
And I can definitely,
when I first heard that,
I was like,
I don't know about that.
But the more I think about it,
the more that it does make sense as an analogy.
And all of the innovation that happens in the browsers
by people writing JavaScript libraries
that eventually those things prove themselves out,
like jQuery, for instance,
the way it does a lot of selecting and stuff.
All of a sudden, that stuff gets brought back in
to the browsers.
And so we could have a similar thing here
where you have the innovation in the eBPF world
and then the best ideas,
the most obvious ones in retrospect,
the ones that everybody needs,
well, that stuff is baked back into the kernel maybe.
That would be cool.
Absolutely.
So on the website, ebpf.io, it gives four kinds of applications, networking, security,
profiling, and observability. You mentioned three. We could probably bike shed a semantic debate
on is observability and profiling, I guess, different things, the same thing. Is tracing
part of observability? I don't know. It doesn't matter to me. But if we think about these three categories that you gave earlier,
networking, security, and observability, can you give examples of people doing cool stuff,
feel free to name names, or open source projects in each of these three, like if you're going to
say, okay, here's cool stuff that's happening. I know you've touched on them throughout the show.
But if we're just gonna say, here's cool networking stuff, here's cool security stuff, here's cool observability
stuff. What would you, what would you mention for those three? Yeah. So for networking, I mean,
obviously I am very involved with Cilium. So that's the first name that comes to mind. But there are other, you know, users and projects. So Facebook, now Meta, have a project called Katran, which is a load balancer, and I'm trying to remember what the date is. I want to say 2016, let's say 2016, and I'll apologize if I'm not quite right there. But basically, every single packet since that date that goes to Facebook has been through eBPF. Every single packet has been processed by an eBPF program.
Wow.
If that doesn't convince people about the scalability of eBPF, I don't know what would, right? Cloudflare also using eBPF to do things like DDoS protection and load balancing.
And yeah, lots of really cool blog posts that Cloudflare have written about their use.
If we turn to observability, I mentioned the work that Brendan Gregg had done
and this whole series of tools.
And he developed that at Netflix, where they were using it for, again, really scalable, you know, performance measurements. And yeah, whether we call it tracing or observability or metrics or whatever, it's all about...
You want to give a hot take? Do you have an opinion on this? Is it worth distinguishing, or no?
I think there is a bit of
And then maybe observability is about how do I take all that information and actually ask questions of it in a sensible way.
An umbrella term, sort of.
Yeah, yeah.
Fair enough.
It's definitely, there's an overlap, definitely. I know it's been the subject of many Go Time unpopular opinions, whether or not observability is a thing or not really a thing. So it's fun for nerds to talk about.
Yeah, I do quite like it as that umbrella term for, I want to know something about what's happening in my system, in my cluster, in my deployment.
Yeah, it's quite a nice term.
Yeah. So observability projects, I think I mentioned Pixie, which is a CNCF sandbox project. Parca is another one that's really interesting for observing your applications' behavior. Cilium has a project called Hubble, or a sub-project called Hubble, that shows you things like your service graph in Kubernetes, so how your services are communicating with each other. It can also show you individual network packets, which is pretty cool if you're trying to debug...
Yeah, debug DNS, because it's always DNS. Yeah, right.
Yeah.
Other networking problems are available.
By request.
Yeah.
And then on security side,
so Falco was probably the first security project,
certainly in the CNCF, that was using eBPF. There's a project from my former colleagues at Aqua called Tracee. And then in Cilium, we have a sort of Cilium family, we have a sub-project called Tetragon, which is allowing you to create low-level security primitives, almost in YAML form, and apply them to your Kubernetes cluster. And you can do really cool things with Tetragon. I get a bit overexcited about this, because if your kernel is modern enough, you can not just detect that something is, you know, potentially malicious behavior, you know, processes opening the wrong file
or connecting to a cryptocurrency miner
or whatever malicious thing that you've detected,
you can kill that process synchronously
from within the kernel.
And what that means is the process gets killed.
It's not like you have to go and tell somebody
and then eventually your process gets killed. It's happening right there and then, and it stops the attack before it happens, which is super fun to demo.
I love it. I bet that makes for a great demo.
Very good. That helps out a lot, especially for people who are interested in cool things being built with Cilium. So I've been ferociously grabbing links as you talk.
So those will all be in the show notes for the listeners.
As we wrap up, Liz, let's talk about the future,
where things are going.
You mentioned Windows kernel.
I assume that's like a burgeoning thing,
or is it available?
And what's coming down the pipeline in the eBPF world?
Yeah, so the Windows eBPF,
I know that they have got as far as being able to demo Cilium running on Windows.
So whether it's in production Windows, I don't know.
But it's certainly some significant progress being made there to implement it on Windows.
I'm sure we will hear more about that, and also more about sort of the future of eBPF more generally, at the community conference that's coming up that I'm part of the team organizing, called eBPF Summit, which happens September 28th and 29th. Put it in your diary. And we are going to have, amongst the speakers, we've got both, I mentioned before, the two kernel maintainers who work on eBPF. Both of them, Alexei and Daniel, are both going to be speaking at eBPF Summit this year.
So we should get a pretty good insight into what the future of eBPF is from a platform perspective.
I think that will be super interesting.
And we're also going to hear from lots of end users,
lots of people working on different projects.
We're in the process of going through the session proposals at the moment,
and there are so many good proposals.
It's going to be really difficult to choose.
But last year, we had a lot of fun.
We had a lot of people on Slack kind of doing things
like capture the flag with us interactively. And it was tons of fun. So hopefully this year's
eBPF Summit will be even bigger and better and more fun. Very cool. And that is fully virtual.
So access from anywhere with an internet connection. Yeah. Excellent. Well, anything we left uncovered?
Anything else you want to talk about here before we call it a day?
Not that I can think of, no.
Yeah, we've pretty much covered everything.
Excellent, excellent.
Well, listener, all the links to all the things are in your show notes.
Liz, thanks so much for joining us on the show.
Thanks for your excitement and your ability to so well explain these difficult concepts
and get other people excited.
It sounds like you're a great advocate for this technology and the power that it's unlocking
for so many of us through people building cool tools and stuff that probably we haven't
even thought about yet.
We have these three major buckets, but I'm guessing there's maybe a fourth bucket out
there, maybe things that we don't even know.
So I'm excited about the future that eBPF is affording us.
Absolutely. Me too. It's an exciting space to watch and be part of.
Absolutely. Well, we may have to have you back, maybe put a marker a year from now,
maybe have you back, have a catch up, see what's going on,
see what people have invented in the meantime. That'll be fun.
That would be awesome.
All right. Thanks, Liz. Thanks, everybody.
Thank you for tuning in. Let us know in the comments what you think about eBPF. The link is in the show notes. Thanks again to our friends at Fastly and Fly. Everywhere around the globe, our pods and our app are fast, and that's because Fastly and Fly are fast everywhere.
Check them out at fastly.com and fly.io.
Thanks also to Breakmaster Cylinder for making our awesome beats.
And last but not least, thanks to you for listening to the show all the way to the very end.
Tell a friend if you love the show.
Send them to changelog.fm and tell them to subscribe.
That's it. We're done.
We'll see you next week.