Disseminate: The Computer Science Research Podcast - Matt Butrovich | Tigger: A Database Proxy That Bounces With User-Bypass | #45
Episode Date: December 18, 2023
Summary: In this episode, we chat to Matt Butrovich about his research on database proxies. We discuss the inefficiencies of traditional database proxies, which operate in user space and incur overhead due to buffer copying and system calls. Matt introduces "user-bypass", which leverages Linux's eBPF infrastructure to move application logic into kernel space. Matt then tells us about Tigger, a PostgreSQL-compatible DBMS proxy that showcases the benefits of user-bypass. Tune in to hear about the experiments that demonstrate how Tigger can achieve up to a 29% reduction in transaction latencies and a 42% reduction in CPU utilization compared to other widely used proxies.
Links: Matt's homepage, VLDB'23 paper, Tigger's GitHub repo
Transcript
Hello and welcome to Disseminate the Computer Science Research Podcast. I'm your host, Jack Wardby.
The usual reminder that if you do enjoy the show, please consider supporting us through Buy Me a Coffee.
It really helps us to keep making the show.
Today, I've got with me Matt Butrovich, who will be telling us everything we need to know about database proxies.
Now, this specifically is some work that Matt published at VLDB last year, and it's a paper called Tigger: A Database Proxy That Bounces With User-Bypass.
Welcome to the show, Matt.
Yeah, thanks for having me.
Pleasure's all ours.
Cool. So let's get started then.
So can you tell us more about yourself, Matt, and how you became interested in databases and database management research?
Sure. So I mean, so for context, I'm currently a PhD student at Carnegie Mellon University
in Pittsburgh, Pennsylvania, hopefully in my last year.
And my background is sort of like,
I've sort of worked my way up the computing stack in some ways.
So my undergrad education and work experience
was all in embedded devices, mostly.
So specifically storage devices,
things like hard disk drives, SSDs.
And so thinking about like,
how does your data become durable
at like a physical device level?
And then when I started my master's at CMU,
sort of working my way up the stack,
thinking about things like file systems,
distributed file systems.
And then I got to my advisor, Andy Pavlo's graduate course
on database systems. And his enthusiasm was just infectious. And so I was really sort of interested
in the systems that touch on so many aspects of computer science fundamentals, theory,
programming languages, storage systems, network systems, and relating that to things that I had
sort of seen before. So a database system's write-ahead
log is just a file system's journal. And I had seen things like ACID properties and two-phase
commit before. So I had a lot of context to sort of step into the database system world and start
working with Andy. And a year later, I was his PhD student. And it's such a great problem space to work in, the database
system community because you have so much potential impact. Like every major application
is backed by a database system. So there are so many opportunities for people to bring their sort
of diverse backgrounds and different expertise to bear in this sort of huge
problem space and have a real impact on applications that people
interact with every day. Yeah, for sure. There's plenty of problems to be getting stuck into for
our careers and we're probably always going to have a job as well, right? Because it's that
important in most modern stacks these days. Cool. So we're going to be talking about proxies today,
but maybe we could start off with some background and you can tell a listener how modern applications actually connect to databases these days and kind of
what are the general problems with the approaches that they take today? Right. So, I mean, there are
a couple of trends that stand out with the way sort of modern cloud applications, I guess, like
connect to database systems these days. And on the one extreme, you have this independent scaling at the application layer, this sort of elastic compute that software applications designed for the cloud use now when they need to throw more resources at the problem.
You suddenly have dozens or hundreds more servers, and these are opening hundreds or thousands more connections to the backend database system. So you have this challenge of persistent connections
and this huge connection scaling challenge is created by this sort of elasticity at the
application layer. And at the other end, you have serverless computing, which is creating
the opposite problem or in some ways like a contrasting problem to this sort of high
connection count scale, persistent connections scaling issue of very short lived ephemeral connections that go through all the trouble of
connecting to the database system, authenticating, often going through like an SSL handshake,
only to submit a single query and then disappear. And these two extremes are challenging for a
couple of reasons for database systems. At the persistent-connection extreme, when you get into large numbers of connections, the
database system just sort of runs slower and slower.
So we can see query latencies go from single milliseconds to hundreds of milliseconds because
the database system has to apply concurrency control logic to more and more connections
in order to successfully commit transactions.
Even if the connections aren't even doing anything, each idle connection in something
like Postgres takes on the order of megabytes of memory. And that's before each connection
starts populating its caches. So if you have tens of thousands of connections, you're suddenly
using gigabytes of memory to do nothing. And then at the other extreme,
with these short-lived serverless connections, again, I'll use Postgres as an example.
Postgres uses the process model for parallelism. And so every connection that shows up,
it has to fork a process. And this is a pretty heavyweight operation for the operating system
to deal with. And every time a connection goes away, it has to reap that process.
It has to, especially because Postgres relies on shared memory, there's just a lot that goes
into forking a single process. And this repeated forking and reaping creates a lot of challenges
for database systems to keep up with. So the solution to this is a proxy. Can you tell us
about what we actually mean by a proxy in this context and the
benefits that they provide? We're talking about proxies that understand the actual application
protocol, in this case, a database system's network protocol, its wire protocol. So they speak,
again, for example, like the Postgres wire protocol. They sit right in the middle between
the front-end clients and the back-end database systems. Clients connect to them as if they're connecting to the database system. So they speak the same
authentication protocol. The client's sort of oblivious that they're talking to a proxy at all.
They issue their transactions, their queries, they get the results back as if they were just
interacting with the database system. So the way proxies actually help sort of the problems we
described before, because it seems a little counterintuitive that putting a box in the
middle, like a middleware in between a client and a backend database system
would actually help your application run better. The big thing that they do is, at least when we're
talking about like a high number of connections, is something called connection pooling. So you
can stick 10,000 clients in front of a proxy. They all connect to the proxy and the proxy can
actually reduce the number of persistent connections to the backend database system. Assuming sort of like these 10,000 clients aren't like maxing out their
throughput and completely like saturating their connections as fast as possible. These proxies
can actually multiplex the transactions from the front end to a smaller number of backend
connections. So like you may be able to serve 10,000 frontend clients over 40 backend connections by just sort of interleaving the operations on the persistent
connections to the database system. So by having a fewer number of connections at the database
system, you sort of get the benefits of serving 10,000 clients, but you don't actually have the
database system reasoning about that many clients and it's able to operate more efficiently because
those data structures are smaller. So we've seen, you know, on the order of like single digit milliseconds
going up to tens to hundreds of milliseconds, when you, when you have tens of thousands of
connections, you can bring that latency back down by, by putting a proxy in between by actually
doing connection pooling. At the other extreme with, with sort of serverless setups, you know,
I mentioned things like the SSL handshake,
the cost of forking a process, that's all painful on database systems. The proxy sort of absorbs all that cost. And you're sort of pushing the computational cost to a different application,
oftentimes a different box, to absorb the CPU cycles that have to be spent on connection setup
and connection teardown,
authentication, and logging in and all those other things. And the backend database system
just gets to focus on running queries, which is really all we wanted to do.
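To make that connection-pooling idea concrete, here is a toy sketch in C of transaction-level pooling, the kind of multiplexing described above. The struct, the function names, and the pool size of 40 are illustrative assumptions, not PgBouncer's or Tigger's actual code.

```c
/*
 * Toy sketch of transaction-level connection pooling (illustrative only).
 * Many front-end clients share a small set of persistent backend
 * connections: a backend is pinned to a client only while its
 * transaction is open.
 */
#include <stdbool.h>
#include <stdio.h>

#define MAX_BACKENDS 40

struct pool {
    int  backend_fd[MAX_BACKENDS]; /* persistent connections to the DBMS */
    bool in_use[MAX_BACKENDS];     /* pinned to an open transaction?     */
};

/* A client's transaction begins: borrow any idle backend connection. */
static int pool_acquire(struct pool *p)
{
    for (int i = 0; i < MAX_BACKENDS; i++) {
        if (!p->in_use[i]) {
            p->in_use[i] = true;
            return p->backend_fd[i];
        }
    }
    return -1; /* every backend is busy: this client has to wait */
}

/* COMMIT/ROLLBACK observed: the backend can serve a different client. */
static void pool_release(struct pool *p, int fd)
{
    for (int i = 0; i < MAX_BACKENDS; i++) {
        if (p->backend_fd[i] == fd) {
            p->in_use[i] = false;
            return;
        }
    }
}

int main(void)
{
    struct pool p = {0};
    for (int i = 0; i < MAX_BACKENDS; i++)
        p.backend_fd[i] = 100 + i;     /* stand-ins for real socket fds */

    int fd = pool_acquire(&p);         /* transaction starts            */
    printf("transaction pinned to backend fd %d\n", fd);
    pool_release(&p, fd);              /* transaction commits           */
    return 0;
}
```

This is also why pooling only works when clients are not saturating their connections: a backend is held for the duration of a transaction, so a handful of backends can serve thousands of mostly idle clients, but not thousands of clients running flat out.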
So yeah, like I said, it seems a little counterintuitive that putting a box in the
middle can actually make your system run faster. But as sort of like a simple example,
I think in the paper we showed very early on running a
workload with 10,000 clients and running like 2,000 transactions per second on a standard
benchmark, I think it was YCSB. You could bring something like the P99 latency down; it's cut by
more than half, which is a huge number when people are reasoning about P99 latency. And you pay a
minuscule increase in minimum latency
because you're doing an extra network hop.
But if you can pull in that P99,
in general, what we saw is putting a proxy in between
would compress the distribution of transaction latencies
because the backend database system
wasn't having to reason about as many connections
and its concurrency control protocols
could run more efficiently.
I guess the other thing I'll say about proxies is the application isn't limited to just connection pooling. People are doing a lot
more interesting things with them with respect to like automatic sharding. So if you think of
like traditional middleware stuff like Vitess, you know, that was a way to scale
out MySQL. People use them for workload mirroring to sort of replicate a workload across
production and staging environments. Other people use it just for security reasons,
credential management, transparent upgrades of your database system in the background.
You put a proxy in the front and you can suddenly just switch the traffic transparently between
an older version and a newer version. I think one of the engineers at Meta talked to me at VLDB and
said, yeah, they have more people working on their MySQL gateway, effectively their proxy, than they do actually have working on MySQL. It's such a
core component of getting performance out of MySQL for them that they actually have way more
engineering on their proxy.
Well, yeah. I mean, that reduction in P99 latency just makes for such a better user experience as well, right? That sort of reliable
compression of the distribution just makes the usability and the experience of the system a
lot nicer straight away. So I can see why, in this scenario, the indirection layer is a
massive win. And it's so interesting that they've got more people working on their
gateway than on, I guess, the deeper parts of the database
stack. A question on that, though: how much of the need for these proxies is down to the
fact that we are basically using database systems that are 30 or 40 years old? Do database
systems that have been built today have all these sort of features
within them already, or are they also making the same design decisions that Postgres made, that MySQL made? How do
the new systems look compared to the old systems?
Yeah, it's interesting to see how the
need for proxies is changing when you talk about a newer system. I mean, everything I keep talking
about is in the context of Postgres, which, like you said, is an old design, and a lot
of systems have inherited those design decisions when we look at how many systems have forked Postgres. But in modern times,
it's actually getting interesting, like, what is a proxy these days? Because a lot of these cloud
native database systems, you look at like a modern Postgres replacement, like Neon, who's trying to
do serverless Postgres.
They've got a proxy in the front, but they're using it to sort of figure out which serverless compute system they want to like, you know, the user submits a query.
That's sort of the persistent connection that the client sees.
You submit your app, you submit your query to it, and then they need to route that to
an appropriate serverless, you know, ephemeral compute node to run your query.
I think we're increasingly seeing
more things, which is probably why Facebook, or Meta, calls it their gateway,
right? Like this sort of persistent connection that the client connects to or the application
connects to submits queries. And it sort of obfuscates what's happening on the back end of
these database systems, how you choose to,
you know, what sort of back end you send that query to, how many compute nodes you spin up.
You know, I could argue something like, you know, how you interact with a modern cloud
data warehouse like Snowflake. It's a web interface. You submit a query to it. You get
your results back. Is that a proxy? I don't know. The lines start to get very blurred
between what is this persistent front end that clients connect to, submit queries to, and then
they get routed to some sort of obfuscated cloud backend. And I think we're going to see more and
more of this sort of stuff. Actually, it seemed like there was an interesting resurgence.
Poly stores are back in a big way. That was sort of a big discussion, I feel
like, at VLDB this year: there are all these heterogeneous cloud database systems
that companies are relying on to solve their problems. And people are like, well, how do we
consolidate this into a front end that figures out, well, which back end would actually be the
most efficient way to run this query? So if anything, I think we're going to see,
I think we're going to call them proxies less and less. I think proxy implies like a very simple application that's not applying a lot of logic. And it's just sort of this front end that is sort
of the persistent, you know, to steal the phrase gateway that the users submit their queries to,
to interact with the database system. And sort of what happens behind the scenes is sort of obscured from the user.
So yeah, to your point, proxies probably were brought into the industry to solve sort of
scaling problems for systems like MySQL and Postgres.
And they still solve that problem in a lot of ways.
And it still lets a lot of people get off the ground
with sort of less elaborate database system solutions,
and then you can start with a proxy to start to scale things out.
But we're going to see the capabilities continue to grow
as people use them for things like automatic sharding
and putting maybe query caches in there, stuff like that.
Yeah.
Interesting how they've become kind of more, I guess, general purpose, having a lot of functionality kind
of absorbed into them.
What are a few of the, just to kind of drop some names in, some of the most common proxies
at the moment out there that people use?
So on the Postgres side, pgBouncer is the one that people always start with.
It's been around for ages.
There was Pgpool-II as well. And Pgpool-II
made its name because, you know, it was around before Postgres had native replication. So if
you wanted to do that, Pgpool-II was sort of solving the problem of bringing sort of
distributed durability to Postgres systems before Postgres had a lot of those
native capabilities. On the MySQL side, you've got things like ProxySQL. I think I mentioned, again, Vitess markets
itself as more of a middleware. But again, it was a proxy to help them scale out MySQL.
And then there's a number of folks that have sort of looked at PgBouncer and started to either
rewrite it or fork it or do different things with it. So the folks at Yandex actually
wrote their own Postgres proxy they call Odyssey, which is meant to be like a much higher performance
version of PG Bouncer because they found it a little too restrictive. And then the folks behind
Postgres ML have a Rust-based proxy that they've been working on for a year or two called PGCAT
that they mostly use for sharding,
but it also does connection pooling as well.
So there's a number of these out there because ultimately the application logic is pretty simple,
depending on how many features you want to put in there.
They're pretty simple applications.
Yeah, nice.
Cool.
So I guess some of these proxies,
some of the current proxy implementations, you identified there are some pitfalls in them and they're not perfect, hence the paper. So what are the problems with the sort of current family or the current iteration of these proxies?
You know, in particular, someone like Yandex, they were looking at something like PgBouncer and saying, the performance just isn't there. We need to write a multi-threaded implementation.
PG Bouncer is a single threaded application. It's written in C. It's a very simple application that
just kind of looks at the Postgres protocol and manages connection pooling. But like,
if you want to get better performance out of it, particularly parallel performance,
you have to rely on things like running multiple instances of PgBouncer on
the same box, either put them on different ports with a sort of L3, L4, like HA proxy in front to
sort of load balance across multiple PG bouncers. You see people with that sort of setup. Other
folks can use, you know, I think the SO_REUSEPORT option in Linux now lets you run
multiple applications off the same port and it'll do load balancing. So you can run multiple PG
bouncers on the same box, listening on the same port and sort of Linux manages that port reuse
for you. But in general, like PG bouncer setups got very complicated as people wanted to get
performance out of them. And there was one blog post, I want to say it was from Figma,
where they were looking at where's PGBouncer spending its CPU cycles when this thing's maxed
out. And they actually found it was burning all of its CPU cycles effectively on system calls,
mostly on mem copies, when it's doing SSL read and SSL write. That's where it's burning all its
CPU cycles. And that's sort of a new
challenge for us as we look at building high performance applications, whether they're
database systems, whether they're network applications, or something kind of in the
middle, like a database proxy, of what do you do when your application logic is so lightweight
in something like PgBouncer, that sort of getting access to the data, you know, through the
operating system, through the OS services, becomes your bottleneck. And how do you
design applications in that sort of problem space?
Awesome, cool. So when you were thinking about this, you came up with this alternative architecture for solving this problem,
which you call user-bypass. So can you tell us more about user-bypass and how it helps maybe solve some of this kind of bottleneck that the operating
system can be in this scenario? Right. I mean, so for a little context on user bypass, I want to
talk about kernel bypass for a second, right? So like I'm describing this problem of, okay,
your application logic is very lightweight. You want access to your IO data and you're paying
this huge cost to use the OS services.
So, you know, something that's been around for decades now at this point was this notion
of, okay, do kernel bypass.
So, like, for networking, if you have an Intel NIC, you can get Intel's
DPDK framework and you can design an application around kernel bypass; the DPDK framework lets you sort of
read data directly from the network device, bypass the OS stack, get it to user space as
fast as possible. It's sort of the idea behind kernel bypass. Get your data up to the user
space application logic as quickly as possible and then send it back out. And you can do that.
There are applications in production that use
kernel bypass. In practice, it's been, you know, we've looked at it. It's been tough to use
in general because the API has generally not been stable. So putting something into production has
its pitfalls of like your application release is going to be tied to a specific driver release.
If a new driver comes out, you need a new application release and vice versa.
Getting good performance out is really tough, depending on how you design the work queue and multi-threading in your application layer.
Are you pulling on a single core?
Are you event driven?
Things like that.
So we sort of argued kernel bypass is maybe not the solution here, which is where we said, okay, well, instead of bringing your application data up to user space as quickly as possible, what if you could sort of push your application logic down to the data as closely as possible?
And so we said, okay, well, that's user-bypass: effectively, bypass user space. And this isn't a new idea. This notion of kernel offload,
extensible operating systems, SPIN OS in the 90s, the exokernel, people have been talking about
operating system extensibility and embedding more application logic in the operating system for a
long time. If you look at the extreme end, you have unikernels. So where we argue, we're reasoning
about things a little bit differently is we're relying on these new extensibility features in the Linux operating system called eBPF.
And people debate on whether it still even stands for the extended Berkeley packet filter.
People just say it's eBPF now.
That's the technology.
It's not an acronym anymore.
Whatever, let them argue about it. But the idea is we can use this extensibility feature in the
Linux kernel that allows you to write these sort of safe event-driven programs that aren't quite as
invasive as loading something like a kernel module, which has security concerns, or even
compiling your own kernel. It sits in this nice middle ground that actually allows us to extend
the operating system in a way we really couldn't before with these earlier efforts, either due to lack of a standard API, lack of adoption with
things like, again, like spin OS, exokernel, like they were academic efforts, but we never really
saw anything in the real world that we could use to sort of achieve this idea of user bypass and
designing applications that either sort of take a hybrid approach
to some operations happening in user space, some operations happening in kernel space,
or just entirely pushing all of our application logic into the kernel and never having to
go to user space at all.
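For a sense of what these safe, event-driven programs look like, here is a minimal eBPF sketch in the libbpf style. It is illustrative only and not from the paper: it attaches a kprobe to the kernel's tcp_sendmsg function and counts calls in a map that user space can read, without a kernel module or a custom kernel.

```c
// Minimal eBPF sketch (illustrative, not from the paper): count calls to
// tcp_sendmsg in a map shared with user space.
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

struct {
    __uint(type, BPF_MAP_TYPE_ARRAY);
    __uint(max_entries, 1);
    __type(key, __u32);
    __type(value, __u64);
} send_count SEC(".maps");

SEC("kprobe/tcp_sendmsg")
int count_tcp_sends(void *ctx)
{
    __u32 key = 0;
    __u64 *val = bpf_map_lookup_elem(&send_count, &key);
    if (val)
        __sync_fetch_and_add(val, 1); /* verifier-friendly atomic add */
    return 0;
}

char LICENSE[] SEC("license") = "GPL";
```

The verifier checks a program like this for bounded execution and safe memory access before the kernel will run it, which is what makes the approach less invasive than loading a kernel module.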
Yeah, nice.
I just want to ask you a quick question about eBPF.
And you said before some of the old sort of attempts in this space
weren't very stable and the API wasn't stable. Is that a different story with eBPF? That is
something that you can use in production with a lot more sort of guaranteed and that you aren't
going to be tied to a particular version, like you said, or anything like that.
Yeah. I mean, I'll be honest. It's a little bit of a cop-out on my part. eBPF has absolutely
been a moving target the last 10 years. I think eBPF was officially
branded that in 2013. So it's been a decade now. In the six years I've been a grad student,
the ecosystem has changed wildly. Now it's getting better, and strictly for the better:
the way I had to write eBPF programs five years ago is very different than how I write them now.
And the tooling and everything has gotten better.
But that doesn't mean it hasn't been a moving target and APIs weren't stable for a few of those years.
I think we've reached a point where I could actually, you know, endorse it in a production environment.
And like for what little that carries as an academic graduate student saying you should use eBPF.
But like, I mean, it is in use now.
Like, you know, Meta is running hundreds of eBPF programs on every server. Netflix uses it, Google uses it. It's seen pretty
wide adoption now, and the people pushing this technology have really thrown their
weight behind it. Yeah, there's a little bit of a cop-out in saying eBPF, you know, implying that eBPF
would have had a stable API over the last five years, because it hasn't. But I think it's the direction forward when we look at
how to do extensible operating systems in production right now.
Nice, cool. So you've, you've used this and you've designed a DBMS proxy, which I guess is the star
of the paper, Tigger. And I think it's a great name as well. I like the names of things, and I guess I can guess why it's called Tigger, right? But yeah, I'd like to know
more about the actual naming discussion and who came up with that and whose idea it was. But yeah,
tell us about the design and implementation of Tigger. There we go, the floor's yours.
Well, we can do a quick derail on the name. That was because of my wife; she's a big Winnie the
Pooh fan. And I think, I mean, we've seen these weird Winnie the Pooh movies. Like, I read a couple of years ago that Winnie
the Pooh went into the public domain, so that's why we see these weird Winnie the Pooh horror
movies now. They're just making weird stuff with that license now. And I think when I
first submitted the paper, I was just like, oh, Winnie the Pooh's in the public domain, I can use the name
Tigger, they won't sue me, they being Disney, I guess. And I realized later that, I guess, Tigger was
in a later book, like a year or two after, so Tigger the character wasn't actually in the public domain
yet. So it's not actually clear to me where that stands right now. So we're just
not going to tell Disney about it. But hopefully they're not watching the database research community.
You never know.
Wait until that email from the lawyer, right?
I hope not.
I can't afford those legal bills.
So, I mean, when we designed the proxy, the basic idea here was let's use this idea of user bypass,
of avoiding going to user space to sort of fast path the common
operations that we see out of these database proxies. Let's just take PgBouncer. Like I said,
it's actually a pretty simple code base to read. It's single-threaded, it's written in C,
and I was pretty familiar with the Postgres code base, so C was
comfortable to me. And so the idea was to take that and sort of surgically insert some
short little eBPF hooks and, you know, the events that need to fire in order
to get these eBPF programs to run. But the idea was, okay, for these common operations
like connection pooling, where effectively what the proxy is doing is it's reading a socket,
it's applying Postgres protocol-specific knowledge, seeing, like, what's the transaction status? What is
the message that actually showed up? Is it a query? Are we changing the settings? What is it?
Figure out which backend to send it to, send it to the correct backend.
And then on the server connection, when a response comes back, it's figuring out who does that belong
to? What is this message? It's usually a query result. Send it back to the correct client. That's not that much logic, so you can do that in the kernel and hook it into the networking
stack of the operating system. So for those sorts of messages, hopefully the common ones, queries,
query results, you never have to slip those to user space. You never have to wake up a user thread.
They don't have to call a system call and do extra buffer copies and sort of all the trouble that
goes along with running a user space
application. You can just handle that in the network layer of the kernel when it runs on a
software interrupt, when the message arrived, and send it back out to where it's supposed to go.
And then for the less common messages, like changing settings or something like that,
you just slip that to user space and you let the user space component sort of handle that stuff. So Tigger is sort of a hybrid user bypass application where you take pgBouncer and then
you embed all this application logic in the kernel with eBPF specifically to handle things
like connection pooling. And then we also implemented another feature I think was workload
mirroring to be able to replicate a workload between multiple backends.
But so, yeah, we started from PGBouncer and that worked out really nicely for us.
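As a rough illustration of the fast path Matt describes, here is a minimal sk_skb stream-verdict eBPF program of the kind you can attach to a sockmap/sockhash so that data arriving on one socket is redirected straight to another socket inside the kernel. This is a hedged sketch, not Tigger's actual source: the map layout and key scheme are assumptions, and all of the Postgres wire-protocol parsing and pool bookkeeping a real proxy needs is omitted.

```c
// Sketch of in-kernel socket-to-socket forwarding with sockmap (illustrative).
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

/* client connection -> the backend socket it is currently pinned to.
 * The user-space slow path inserts and removes entries. */
struct {
    __uint(type, BPF_MAP_TYPE_SOCKHASH);
    __uint(max_entries, 65536);
    __type(key, __u64);
    __type(value, __u64);
} backend_socks SEC(".maps");

SEC("sk_skb/stream_verdict")
int pool_verdict(struct __sk_buff *skb)
{
    /* Illustrative lookup key derived from the client connection. */
    __u64 key = ((__u64)skb->remote_ip4 << 32) | skb->remote_port;

    /* A real proxy would inspect the Postgres message type and the
     * transaction status here before deciding where to forward. */
    if (bpf_sk_redirect_hash(skb, &backend_socks, &key, 0) == SK_PASS)
        return SK_PASS; /* forwarded to the backend, entirely in-kernel */

    /* Slow path: no backend pinned, let the data reach user space. */
    return SK_PASS;
}

char LICENSE[] SEC("license") = "GPL";
```

A payload that takes this fast path is handled in the software-interrupt context where it arrived and never wakes the user-space proxy, which is where the system-call and buffer-copy savings come from.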
Nice. Yeah. And a quick question on the hybrid aspects of it there.
And is there any way, again, I don't know if this is actually a valid question,
but say you implemented, for all of the operations, the opportunity to either go straight through the OS or to use the user-bypass path.
Is there a way that you can sort of like, okay, after a certain threshold, switch to using user bypass because it's now more, you're going to get more like savings in some way than this.
Just send it to the OS, right?
Or in an ideal world, would you just skip the OS altogether and everything would run through the user bypass?
You just chose not to for some of the less common operations
because there was just no point for the workload.
Or is it some sort of dynamism?
What's the word?
I can't think of the word I'm trying to say.
But yeah, maybe we'll have my questions come across.
Yeah, I mean, so you could certainly figure out
if there's like a sweet spot for how much of the work to do
in letting it slip to user space. And in fact, like, the resources are finite in the kernel,
in the sense that like, you sort of have to say, when you start the system, like,
how many sockets are we actually going to let, like, run these sort of user-bypass programs.
And at some point, that will become saturated. And even queries and the responses
could slip to user space if like we couldn't handle it in kernel space. Because like the
kernel adds these constraints about like, you can't wait, you can't yield, like everything
needs to like happen right now. So if the resource isn't available to do something when that event is
firing, the only thing you can do is slip it
to user space and sort of let it actually sit in a queue or a buffer and wait for it to actually
be pulled off that buffer. So user space is sort of this fallback slow path when you can't actually
service it in time. If every software interrupt handler in the kernel is already busy running one
of these sort of user bypass programs to handle
the connection pooling, you have no choice but to sort of slip it to user space. So in that sense,
you sort of get dynamism for free. But we also, you know, you sort of alluded to this,
we sort of went after queries and the results, because we're like, well, that's where we're
going to get the most bang for our buck. There are certain other message types that the protocol will drop in, but we're less
interested in those.
And then there are some actual limitations where certain message types you just can't
handle with user bypass in kernel space.
You just couldn't do it.
So certain forms of authentication messages, like if you're using something more complicated
that requires hashing with a specific hashing algorithm for passwords or something like that, a lot of those algorithms you can't express in a program that will satisfy the verifier.
There's this BPF verifier, which is sort of how the kernel makes sure your program doesn't run for too long.
So either the hashing algorithm isn't going to run in time to satisfy the verifier, or it's too complicated, and you can't link in external libraries into a BPF
program. So I couldn't just grab an off the shelf hashing library or authentication library and run
that in BPF. So we really just focused on queries and the results and slip everything else to user
space. One, from an engineering standpoint; two, for, uh, you know, bang for your buck on most of these
workloads.
Yeah, yeah, nice. On the
engineering effort there, how long did this sort of take you to implement, and how easy was it to sort
of work with eBPF? And the Postgres protocol you were familiar with anyway, and PgBouncer, so maybe that
kind of eased the job a little bit. But yeah, how difficult was it?
So it wasn't too bad. It was probably only a couple of months of engineering.
I don't know what that means to different people when you say a couple of months of
engineering, because some people are working 100-hour weeks, other people are working 35-hour
weeks. It's hard to say what that means. The hardest part is, um, when your BPF program
fails, particularly with the verifier, it's very opaque why it failed. So the verifier,
when you load an eBPF program, and I'm going to go back and forth between eBPF
and BPF just because they're the same thing, but one's easier to say than the other, I guess,
But like what the verifier does when you load your program
is it's trying to verify
that your program is going to terminate.
It's not going to access memory in a bad way.
It's not using APIs.
It shouldn't.
And when it thinks you've done something wrong,
it just yells at you in assembly, basically.
It just throws a bunch of BPF bytecode,
which just looks like assembly, basically.
And you have to try to figure out like, oh, no, what did I change?
And what went wrong?
And why is it angry at me?
So that's where you lose most of your time with writing BPF programs.
The logic itself, like, yeah, like I said, the Postgres protocol I was pretty familiar
with already. There are
okay blog posts and, like, YouTube videos where people discuss BPF, but the documentation
is actually quite poor. And that's part of the moving-target thing: even the official, like,
man pages and Linux kernel documentation are pretty rough. Your best hope is to actually look at
samples in the Linux kernel code repo. Like,
the samples are pretty decent and because they have to compile and still run every time they
do a new kernel release, they should still work. So you spend a lot of time reading kernel samples
to understand like how the APIs actually work because the documentation is bad.
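For context on what that debugging loop looks like, here is a small user-space loader sketch using libbpf. The object file name is a hypothetical placeholder, and this is not Tigger's actual tooling: the point is that loading is the moment the verifier runs, and its log, surfaced through the print callback, is usually the only clue about why a program was rejected.

```c
// Minimal libbpf loader sketch (file name "proxy.bpf.o" is a placeholder).
#include <stdarg.h>
#include <stdio.h>
#include <bpf/libbpf.h>

/* Forward everything libbpf prints, including the verifier log, to stderr. */
static int print_all(enum libbpf_print_level level, const char *fmt, va_list args)
{
    return vfprintf(stderr, fmt, args);
}

int main(void)
{
    libbpf_set_print(print_all);

    struct bpf_object *obj = bpf_object__open_file("proxy.bpf.o", NULL);
    if (!obj) { /* recent libbpf returns NULL on error and sets errno */
        perror("bpf_object__open_file");
        return 1;
    }

    /* Loading triggers the in-kernel verifier: bounded loops, safe memory
     * accesses, and permitted helper calls are all checked here. */
    if (bpf_object__load(obj)) {
        fprintf(stderr, "verifier rejected the program; see the log above\n");
        bpf_object__close(obj);
        return 1;
    }

    fprintf(stderr, "program loaded and verified\n");
    bpf_object__close(obj);
    return 0;
}
```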
Right. Cool. Yeah. So, right. Awesome.
Let's talk some numbers.
And so you've dropped a few numbers in throughout the chat so far,
but can you tell us more about your experiments,
how you went evaluating Tigger and yeah, what your setup was,
what you compared it against and yeah, some of the results.
Right. So, so for the evaluation that ended up in the paper,
we ran, I think everything in AWS EC2. For us, putting it in
sort of a cloud-native environment, I guess, helped strengthen the argument that this is
really a problem that it's a cloud-scale issue, right? And even in our case, when we're talking
like 10,000 connections, some people would argue like that's not a cloud-scale problem.
But for us, we put it in EC2 so we could reason about, you know, data center latencies and tens of thousands of connections. We used, you know, at the time, whatever their modern compute instances were, like the C6i's, I think. Everything was run in Ubuntu Linux, but really any modern Linux kernel should support the BPF features that we needed. And we compared against, so, I mean, I've
discussed PgBouncer; that's the standard option when you're talking about a Postgres proxy.
We also compared against, I also alluded to, Yandex's Odyssey. And again, Odyssey is
a complete rewrite of PgBouncer. They kind of throw every trick in the book at
getting a fast user space
application that they could reason about, in terms of how some people argue you
should design modern parallel applications. So they rely on tricks like user space coroutines
to get a lot of parallelism in user space. And the code is impressive, but tough to read.
Like, they wrote their own assembly to handle it. They're not using, like, C++ coroutines, because I don't
think that coroutine stuff went into C++ until C++20. So I think they wrote their own coroutine
stuff; you know, they have all the assembly instructions there to save the stack
and swap that stuff in and out. It's a complicated piece of software, but they argued it was necessary to get the performance that they needed.
And then today, I think we'll only talk about YCSB, which is the Yahoo! Cloud Serving Benchmark.
It's effectively exercising a key value store over the Postgres wire protocol. This wasn't
running on localhost via stored procedures. So we actually used three servers in every experiment, with an asterisk.
The one experiment where we don't run three servers, we run two servers because we wanted
to see how it compared to Unix domain sockets.
So we wanted to actually put the proxy and the database system on the same box, which
is actually what Yandex does.
So people deploy these proxies in a lot of different ways in the real
world. Some people are, yeah, putting them on the same box and talking over Unix sockets with the
database system. But there's a lot of design options there. But for us, we typically ran
three boxes. One beefy box to be the workload generator running bench base to run things like
YCSB and TPCC. The proxy was generally on a more constrained two-core box because people typically aren't throwing a ton of resources at their database system proxies.
And then we had Postgres running on a separate dedicated server, again, with a bunch of CPU cores and memory to hopefully make it such that Postgres wasn't often the bottleneck.
Because that was a bit of a challenge with sort of, if you want to benchmark proxies, you got to make sure your database system isn't the bottleneck, right? Like you want to try to actually look at the
performance characteristics of these proxies. So that was a little bit of a challenge.
Yeah, nice. So what were the results then? Give us the highlights, the headlines.
Yeah. So I mean, one of the results that stood out to us was, so if we just run the YCSB benchmark at like
max throughput and just say, okay, how, what can these proxies actually push for like a two core
box in EC2, PG Bouncer and Odyssey could do about 32, 33,000 transactions per second.
And Tigger could do 45,000. So over 40% increase in throughput.
And the reason I highlight that one is this was a severely CPU constrained scenario. There's only
two CPU cores. So we're sort of seeing the benefits of the CPU efficiency of Tigger there.
And so one thing we did in that experiment is we actually scaled the size of the proxy box. We threw more compute resources. And in particular, this would benefit Odyssey
more than anything because Odyssey expresses parallelism. So we were just kind of curious,
like how much CPU do you have to throw at this problem for Odyssey to catch up? I mean,
it turned out you needed eight times the number of CPU cores at eight times the cost to actually get equivalent performance
as Tigger's sort of baseline. So it was a huge compute cost for Odyssey in order to be able to
catch up to what Tigger was able to do with a much, much less expensive box. So we sort of
wanted to try to understand that, like what's actually going on here. So one of the things we
did is we sort of profiled all
these applications. And it's difficult to sort of describe charts and figures
in an audio setting, so I'll try to keep it simple. So, like, okay, let's run YCSB, and
what we want to do is fix the throughput, so fix the throughput at the proxy. Sort of nothing's really bottlenecked in this scenario.
The workload application isn't bottlenecked.
The proxy isn't bottlenecked and the database isn't bottlenecked.
Everyone's able to hold like this 2,000 transactions per second sort of throughput with like 10,000
clients on the front end.
And then we did some CPU profiling and said, okay, where are these
proxies actually spending their CPU cycles? And as expected, PGBouncer and Odyssey had like the same
amount of time spent in the kernel and in software interrupts, which is just sort of another way of
classifying like doing network work in the kernel. They're handled by these sort of software
interrupt handlers. And Odyssey actually used more time in user space than PG
Bouncer did. It's sort of this complicated user space co-routine setup actually ended
up using more CPU cycles in order to accomplish the same amount of work as PG Bouncer.
Now, if you throw a bunch of compute at it in parallel, obviously it can pull ahead. Like
that's worth the cost for some folks. But then when you compare against Tigger,
there was no user space CPU time, unsurprisingly, for this sort of workload, all the work's done in
kernel space. And there's actually even less time spent in the kernel and software interrupts,
which is a little counterintuitive, but it's able to handle all of its work in a single software
interrupt. It's able to handle it all on the receive end. Something arrives at Tigger,
it immediately figures out what to do with it and sends it right back out. You don't actually have
to handle another sort of software interrupt, or more kernel
code on the transmission side. So you're actually reducing the total number of CPU cycles even spent
in the kernel to sort of apply something like user bypass. Awesome. So it sounds like a win
on all fronts then. So I mean, I always have to ask this question, and I got told off once for
asking, are there any problems with it? And who was it? I can't remember who it was.
I'll have to edit this bit out.
But I asked them, are there any, what do they say,
scenarios in which your tool, whatever it was,
doesn't perform well?
And they said, I should ask,
are there any scenarios in which Tigger is suboptimal? That's
the way I should ask these questions.
So, yeah.
Yeah, so, I mean, I alluded to some of the problems
with Tigger and maybe user bypass.
I'm just like, eBPF is still not the friendliest environment.
Like someone can't just say, I'm going to take my application, I'm going to apply user
bypass the way Tigger did.
Like, could someone use Tigger in production as like their proxy?
Absolutely.
Would it make sense for everyone?
I don't know.
There's a lot of reasons people are still, like I mentioned, sort of these big companies that are all in on eBPF. It's not clear to me if you need those
sorts of resources in order to adopt eBPF though. Like could these small companies who are just
deploying pgBouncer to sort of scale out Postgres, like would they feel comfortable deploying an
eBPF backed application? I don't know. Like, did they have the compute resource or the engineering
resources to vet these BPF programs and make the judgment call of like, should this be put into
production for us? Because like, these programs have to run as root or you effectively need like
sysadmin capabilities in order to load eBPF programs. So the engineering decision of whether
you would actually want to take this approach, I'm not sure what that looks like in the real world in terms of like what companies are
making the value call.
Obviously, the big companies are making the value call saying we're in on eBPF.
It's worth the engineering complexity to get these performance wins that I kind of
describe here of like you can solve the same problem using far fewer CPU cycles if you
can get the BPF program written and are comfortable putting
it into production. The other thing is like you're trading off, you're trading off like
complexity of understanding your user space application for understanding the Linux kernel.
Now, to be fair, to get really good performance out of something like Odyssey or any other
network application or even database system, you do have to tune things like IRQ affinities in the Linux kernel
and understanding how soft IRQs are handled and how it handles top half and bottom half of interrupts
and deferred aspects of interrupts. There's all this complicated inner workings of how
things like interrupts are handled in the Linux kernel. And if you really want to build a fast
application like Tigger to apply user bypass for network IOs or even storage IOs, you're going to end up reading a lot of kernel code to understand like how to, one, where to sort of surgically insert your solution into the OS stack and how to get the best performance out of it.
Because if you don't tune the kernel right, you're still going to get quite poor performance out of this, I think.
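As one small, concrete example of the kind of kernel tuning mentioned here, interrupt affinity can be set by writing a CPU bitmask to /proc/irq/<N>/smp_affinity. The sketch below is hedged: the IRQ number and mask are hypothetical, and real values would come from /proc/interrupts on the machine being tuned.

```c
// Sketch: pin a (hypothetical) IRQ to CPU 2 by writing a hex CPU mask.
#include <stdio.h>

int main(void)
{
    const char *path = "/proc/irq/42/smp_affinity"; /* hypothetical IRQ */
    FILE *f = fopen(path, "w");
    if (!f) {
        perror("fopen (requires root and a valid IRQ number)");
        return 1;
    }
    /* Mask 0x4 = CPU 2; softirq work raised by this IRQ then tends to
     * run on that core, which matters for a proxy like this. */
    if (fprintf(f, "4\n") < 0)
        perror("write");
    fclose(f);
    return 0;
}
```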
Cool, I guess, yeah.
So where, you said that you could probably use Tigger now
in production if you wanted to.
So where do you go next with Tigger
and what's next on the research agenda?
Is it productization of it
or is there other features you want to work with?
I know obviously you're coming to the end of your PhD,
so you've probably got other sort of pressing matters,
but where would you go next with it?
I think there's kind of two directions
you could go off of sort of this paper. One is just sort of,
I think we alluded to this in sort of future work is like, and I alluded to this earlier
in our conversation, there's a lot of features that the proxies can support things like sharding,
things like query rewriting, query result caching. Does it make sense to push some of those sorts of things, more features? How far can you go with eBPF to satisfy the verifier and storing things in the
kernel? You can sort of think of proxies as sort of this middle box sitting
between clients and the database system front end. We started thinking as well about like,
well, in these distributed cloud data warehouse systems,
you have these shuffle nodes that are sort of ephemerally holding onto data as well, right?
They start crunching on some results and they do one query operator and then they forward it to a
shuffle node, which sort of redistributes it to another set of workers. It's sort of the old
MapReduce thing we've had around for ages now, it seems like. And these shuffle nodes
are sort of ephemerally holding onto data as well. Like, does that make sense to be a sort of
application class that you push into the kernel? You don't have to go to user space for things like
shuffle nodes. The problem is like, in the same way that what is a proxy isn't standardized,
what is a shuffle node isn't standardized either. So like, I think BigQuery has dedicated shuffle
nodes, Redshift, it's just another operator
to them.
So like special casing shuffle node logic may or may not make sense depending on the
cloud system you're talking about.
And the other thing that I think would be interesting that came out of this work, it
would, I think, benefit the academic community to sort of just have a survey paper that looks
at like, how do you deploy these
database system proxies? And what are the trade-offs? You know, we tried to do the best
we could with the limited background section, the space that we had to sort of introduce this
concept of database proxies to the community. But it raised a lot of questions of just people like,
well, I have this workload, would a proxy help me here? I have this many front-end connections,
and I have this many backend servers, and would a proxy help, you know, my performance here? And it's like,
I don't know, you just kind of got to try it. But it would be interesting to see, like,
a sensitivity analysis along a lot of different dimensions to understand, like,
how much compute should you throw at your proxy? How many, how many persistent connections should
it have for a given workload? And you start getting into like,
you could get into queuing theory stuff of just like, okay, well, you could describe exactly,
you could probably prescriptively say exactly how many persistent backend connections your proxy should have given X number of front end clients submitting work at X rate.
Yeah, you could solve that problem, you know, apply some queuing theory, but like
something prescriptive to say like these
things exist and they have all these interesting properties and capabilities, but there are all
these different trade-offs. Like, an experiments-and-analysis paper would be really interesting to see
there.
Yeah, for sure. We need that sort of canonical paper, right, that we can reference, rather than,
I mean, obviously, you kind of say, oh yeah, just go and look and see and find out whatever
works for you. But yeah, the community would definitely benefit from having that sort of work done. Because, I
mean, you've obviously done a lot of extensive background work, so how much work has there
been in the database community, from the research side, in this sort of area? It feels
to me sort of very untouched still. I mean, this feels like an initial sort of foray into it, but
it feels very much like a nice wide-open area of research, possibly.
Yeah, I mean, effectively none.
So if we can, I think you were going to ask a question,
I think coming up that's just sort of like,
what's the most interesting lesson I learned while working on Tigger?
I'll get into that now.
So like there's been very little academic discussion,
particularly on database proxies.
The network community talks about proxies, and you can go look at NSDI.
People are doing proxies for different things, oftentimes application layer proxies, but
sometimes not.
Sometimes people are just talking about like, how do you do faster TCP proxy?
But you alluded to like, there's not a lot of discussion in the community.
And that's a bit of a double-edged sword from a researcher standpoint.
Like, we were really drawn to this problem because we were like, it seems like everybody's
deploying proxies in production when they need to scale out, particularly MySQL or Postgres.
But, like, you know, you start with a database system as you're doing your startup, and your
business grows.
And the first thing people grab is a proxy to help scale out their backends.
But no one's talking about it in the academic community, like completely oblivious to it.
So on the one hand, as a researcher, you're like, this is super attractive. I can't wait to work in
this area that like no one else is thinking about. The downside to that is no one knows what you're
talking about. Like the researchers, excuse me, the reviewers, have no context, and it's on you
to write that context and make them care about the problem and get them to understand this is
a real problem in industry. People are using these things, they're solving a problem, and
it can be good to plant a flag on a problem, but it's your challenge to make the community
care about it. So like,
I alluded to, like, I think we would be served by this sort of survey paper.
I wish I had written that first, because then you could have cited it. Here, we had to do everything in the background section, and making the community care about it was
really challenging with the limited sort of page count that you usually have in a
background section of a paper. So, like, it's a very double-edged sword to sort of see this
verdant empty pasture of research and go running towards it and realize that the responsibility is
on you to make the community care and get them to actually understand that it's
sort of a real problem.
Yeah, that's definitely a fun challenge,
I guess. But I do feel like there often is that kind of disconnect between what's happening
in the real world and what's happening in research, right? It's really hard to sort of
bridge that gap and get people in either space to care about the thing the other person
cares about, right? But yeah, no, I definitely agree with you there. Kind of
going on from that a little bit: can you tell us more about the backstory of this paper? So
how did you come across this being a problem? You said you observed it, people were doing
it, but no one in the research world was really thinking about it. How did you get to that
point?
So it was sort of a bunch of different interests coming together. So I had already been
exposed to BPF
for some previous work I had done where we really relied on BPF for observability.
And so I had sort of dipped my toes into that community, learning about BPF. Another was sort
of just reading what people were saying about scaling out their database systems, where they're
talking about like, hey, here's how I had to deploy pgBouncer to solve this problem. And what are
the trade-offs of pgBouncer, MySQL? Some of it was sort of like, you know, off the record
conversations of just, you know, Andy is working, he's got his startup on the side. And he's just
like, man, a lot of people are using proxies these days. Just sort of this anecdotal observation that
seeing these things a lot in production and an aspect of database systems that we hadn't really
reasoned about as an academic community. And increasingly just thinking like, these are
interesting applications. There are interesting problems you could solve by putting a box in the
middle. And because suddenly you have observability of the entire workload, you can permute the workload if you want. There are all these tricks you can do if you have this sort of
middle box. So that got us really excited about this idea that proxies are sort of accepted
by the industry community as a way to solve database system problems. And then there's my background
in just sort of performance engineering, of trying to figure out why is code slow and how could you
make it faster. And it's like, oh wait, code is slow just because we're interacting with
the operating system and because we're going to user space to do the work. You know, do proxies
sit in this perfect, like, you know, Venn diagram of: the application logic isn't
that complicated, and it sits in the network layer? So, you know, we could use BPF to sort of try to
solve this problem. And there's enough, you know, people in industry,
they're not papers we can cite,
but people have written enough online that they can at least say like they're
used and that there's a little bit of credibility there of just like,
this is an actual real problem space.
Yeah. Nice.
When you're kind of working on Tigger then,
I'm not sure if we've covered this so far or not,
but were there sort of any,
because I like to ask this because obviously research is nonlinear, right?
There's ups and downs, you hit dead ends.
Were there some things that you tried along the way that you kind of,
that failed?
And yeah, some of the war stories with Tigger.
Yeah, I mean, I think part of the experiment process
and part of the design process was trying to figure out, for these BPF programs, there's a lot of different layers you can hook in, in the Linux networking stack. Like, how far down do you want to push your application logic? At an extreme end, you can push it all the way down, if your NIC supports it, down to the hardware, so that you're basically intercepting network communications.
They're called XDP BPF programs. And depending on what your OS and what your NIC supports, you can push your logic really far down. So we started actually looking all the way
down there. Like, it's like, could you handle this at the XDP layer, but you lose some of the
abstractions that you, that we rely on. I'm just like, okay, well,
suddenly you're below TCP. So we looked at trying to figure out like, could you make this work
without doing TCP? No, you really got to maintain your own TCP state machine. There have been other
efforts to use BPF to sort of fast path network communications and people just say, oh, we use
UDP because it's a much easier protocol
to reason about and sort of maintain our own state for but like we spent a little bit of time trying
to figure like could we do our own tcp state machine down there would you get the performance
benefits like no move up a layer let the os handle tcp for you um it also gets you access to things
like you could use um kernel tls so you you know you could still use s things like you could use kernel TLS. So you, you know, you could still
use SSL, like people could still use encryption to talk to the database system. Like if you push
all the way down to the XDP layer, you sort of lose those capabilities. So we spend a lot of
time trying to figure out like, and some false starts of like writing code at the XDP layer,
different, different layers in the Linux kernel. And this is where I said, like the,
one of the challenges of BPF is like, you spend a lot of time like the internals of the linux kernel in the software stack itself of like
just being proficient figuring out how to write an application that gets you the performance you
want you actually have to understand a lot of the linux kernel which is kind of a pain um so
yeah like like some some was sort of experimental like trying to make it work at xtp others was
just spending a lot of time understanding the ins and outs of the of the kernel stack and uh arguably losing time on that
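For a rough sense of what hooking in at that lowest layer looks like, here is a minimal sketch of an XDP program. The program name and the pass-through logic are illustrative, not anything from Tigger; it only parses the Ethernet and IPv4 headers before passing the packet on, which is exactly the point Matt makes: at this layer you sit below TCP, with nothing but raw packets to work with.

// Minimal XDP sketch: runs in the NIC driver path, below the kernel's TCP stack.
// Any proxy logic here would have to parse raw headers and track connection
// state itself, which is the trade-off discussed above.
#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <linux/ip.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>

SEC("xdp")
int xdp_peek(struct xdp_md *ctx)
{
    void *data     = (void *)(long)ctx->data;
    void *data_end = (void *)(long)ctx->data_end;

    struct ethhdr *eth = data;
    if ((void *)(eth + 1) > data_end)
        return XDP_PASS;                  // short frame: let the normal stack handle it

    if (eth->h_proto != bpf_htons(ETH_P_IP))
        return XDP_PASS;                  // only look at IPv4 here

    struct iphdr *ip = (void *)(eth + 1);
    if ((void *)(ip + 1) > data_end)
        return XDP_PASS;

    // At this layer there are no sockets, no TLS, no TCP reassembly.
    // Anything beyond pass/drop/redirect means re-implementing protocol
    // state in the BPF program.
    return XDP_PASS;
}

char LICENSE[] SEC("license") = "GPL";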
Yeah, for sure. And you never know at what point in the future the knowledge you've gained by going down that path is going to help you out with something else, right? So it's never really lost.
It's a positive, yeah. You've learned something, right, which is always good.
Cool, yeah. So, I mean, obviously Tigger is very pioneering in the sense of the technology it's using to solve this problem. But what kind of impact do you think Tigger can have, maybe on someone's day-to-day life as a software engineer or a data engineer? And what sort of wider ramifications do you think this work can have for the database community as well?
Yeah, so I like to think that Tigger is almost a case study or an example for the loftier goal, this idea of user-bypass. That's actually what my dissertation will end up being about: this notion that we want to design software that contrasts with the kernel-bypass approach and instead pushes application logic into the kernel. We can do this now because of things like eBPF; we don't have to load kernel modules, we don't have to compile our own kernel. For the average person, a lot of people don't need a database proxy. But are people writing or relying on applications, particularly network or I/O-heavy ones, where you're bringing data all the way up to user space, doing very little with it, and either discarding it or sending it back out? Those may be opportunities for something like user-bypass to have an impact: you don't have to go all the way to user space.
At a high level, there's this quote from Mike Stonebraker from over 40 years ago, where he says the bottom line is that operating system services in many existing systems are too slow or inappropriate. And that created this mindset in the database community that the OS is not our friend; it's very adversarial between database system designers and operating system designers. I've always taken issue with that in my time as a graduate student, and I will argue with Andy that we can coexist and have this symbiotic relationship, particularly because of things like eBPF. Before that, the syscall interface was a little too rigid; you couldn't quite get the kernel to do what you wanted it to do. But now we can tweak the kernel just enough to get what we want from it as database system designers, or general application designers, network application designers, whatever it is. I'm not going to argue with a Turing Award winner, you know, with Mike. But the landscape is changing enough now in the OS community that high-performance application developers, if they want to, can push their logic down into the kernel and apply user-bypass. I'll keep saying that phrase because I want to coin it. I think Tigger is more of the case study for something like that working, and for showing the benefits of it. And absolutely, there are trade-offs, but I think in the right places it can be pretty impactful.
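To give user-bypass a concrete flavor, here is a minimal sketch of one mechanism eBPF offers for this kind of thing: a BPF sockmap with an SK_MSG program that forwards payloads between sockets inside the kernel, so the data never has to be copied up to user space on the hot path. The map layout, program name, and the fixed peer key are illustrative assumptions rather than Tigger's actual design; user space would still have to register the paired client and server sockets in the map.

// Sketch of in-kernel socket-to-socket forwarding via BPF sockmap.
// User space adds the relevant socket fds to sock_map with
// bpf_map_update_elem; after that, payloads are redirected in kernel space.
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

struct {
    __uint(type, BPF_MAP_TYPE_SOCKMAP);
    __uint(max_entries, 2);
    __type(key, __u32);
    __type(value, __u32);
} sock_map SEC(".maps");

SEC("sk_msg")
int proxy_pass(struct sk_msg_md *msg)
{
    // Placeholder: a real proxy would derive the key of the paired socket
    // for this connection rather than using a fixed index.
    __u32 peer_key = 0;

    // Redirect the message to the peer socket; returns SK_PASS on success.
    return bpf_msg_redirect_map(msg, &sock_map, peer_key, 0);
}

char LICENSE[] SEC("license") = "GPL";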
Yeah, well, I think the flag has definitely been planted, so hopefully it can have a lot of impact going forward.
Cool. And I know you've worked on lots of other cool things across your PhD, so maybe you can give the listener a rundown of some of the other things you've worked on over your time as a grad student?
Yeah, so when I first started doing research at CMU, we were very heavily invested, and we still have students working on these problems, in self-driving or autonomous database systems, this machine-learning-for-systems work. When I alluded briefly earlier to having done BPF for observability, that was actually to collect training data for some of the autonomous, self-driving database system research we were doing. So I've spent a lot of time thinking about these ML-for-systems problems: how do you efficiently collect training data for these sorts of systems? That actually required me to help build an entire system from scratch. We have an in-memory database system called NoisePage that took a lot of inspiration from HyPer in how we do MVCC, concurrency control, logging, and things like that. Going forward, we continue to do this self-driving database system research, looking at things like, again, the huge overhead of training data collection, and how you create a standardized API for this problem, something like Gym from the OpenAI community, as it relates to the problems we have in database systems, and trying to bring ML to bear on some of them. And then personally, going forward, I'm looking at how we do user-bypass in more contexts. And maybe the crazy question of: what would an embedded database system for eBPF programs look like? What sort of rich applications would that open up? Those are the problems we're thinking about now. If you can imagine something like a RocksDB or a BerkeleyDB, if eBPF had access to something like that, what sort of problems could you solve?
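As a hint of the in-kernel state eBPF programs can already keep today, here is a minimal sketch that uses a BPF hash map as a crude key-value store from inside a tracepoint program. The map name, key scheme, and attach point are assumptions for illustration; the embedded database Matt is speculating about would need much richer structure (ranges, transactions, persistence) on top of primitives like this.

// Sketch: a BPF hash map as a tiny in-kernel key-value store.
// This example counts write() syscalls per process.
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

struct {
    __uint(type, BPF_MAP_TYPE_HASH);
    __uint(max_entries, 4096);
    __type(key, __u64);
    __type(value, __u64);
} kv_store SEC(".maps");

SEC("tracepoint/syscalls/sys_enter_write")
int count_writes(void *ctx)
{
    __u64 key = bpf_get_current_pid_tgid();   // per-process "row" key
    __u64 one = 1, *val;

    val = bpf_map_lookup_elem(&kv_store, &key);
    if (val)
        __sync_fetch_and_add(val, 1);          // update an existing entry
    else
        bpf_map_update_elem(&kv_store, &key, &one, BPF_ANY);  // insert

    return 0;
}

char LICENSE[] SEC("license") = "GPL";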
this next question is my favorite question and um it's all about the creative process so i'd like to
know more about how you approach idea generation matt and how you then go about selecting the
projects from the ideas you've generated yeah so your creative process what is it what's the secret i i'm
I'm lucky enough... I mean, a lot of this comes down to, and it's sort of a cop-out answer in graduate school, your relationship with your advisor, right? For me, I have a lot of freedom to think about the problems that are interesting to me. A lot of it is obviously reading, but it's not just papers. Like I said, a lot of it was reading what people are saying about database systems in industry, wanting to stay grounded in what I think are real problems that people are trying to solve with database systems, what they're facing on the industry side, and balancing the research problems with the industry problems. And thankfully, I'm surrounded by a lot of smart people, whether that's at CMU, in the broader database systems research community, or in the even broader computer science research community. There are so many people we can bounce ideas off of. So I get to write toy implementations, think about a problem for a month or two, and try to prototype a solution for it. It's very data-driven: trying to figure out what workloads make sense, whether this actually solves a problem there, and what the trade-offs are. And then the paper, and usually the research project, just grows from there. Okay, this looks promising, and then you usually end up rewriting the whole thing and going all in; you throw away your toy implementation and start doing it properly. So it's very iterative. It's a lot of toy code, experimental code, early prototyping, and always making sure that everything is backed up by numbers early on, not just going with your hunch. Again, looking at it from a performance engineering standpoint: make sure you can measure the problem, because if you can't, you're going to have a hard time justifying its existence. That's true in the research community and on the industry side. If you can't measure the problem, you're going to have a hard time justifying that it's a problem at all, and that your solution deserves merit.
Yeah, that's a really nice answer to that question; it's another one for my collection, thank you. Great stuff. So now you've got another opportunity to coin your phrase: what's the one takeaway you want the listener to get from this podcast today?
I'll say that the operating system is not your enemy.
As much as Andy will side-eye me every time I say that, because he says these operating system services are not written for you, and there are comments from Linus Torvalds denigrating database system developers as well, when we ask for things in the kernel. It's a contentious relationship. But I do think that we, as high-performance application writers, and they, as the operating system community, can coexist, and we're getting the tools and the common language to solve problems together, without reaching for such a blunt instrument as, well, I'm going to go full kernel bypass, take everything away from the operating system, and do it all myself in user space, or going the opposite direction and writing a unikernel that isn't really able to be put into production either. There is a happy middle ground that actually solves a lot of problems, I think.
Yeah, awesome. That's funny: while you've been talking about this, I can't get the image of Anchorman out of my head, where the rival news teams come together for a fight. That's the image that's been playing in my head over and over again. But anyway, I think that could be a fun prompt for Midjourney.
I think we could find some way to do something fun with that, actually.
Cool, great stuff. Well, let's end it there. Thanks so much, Matt, for coming on the show. It's been a fantastic, fantastic chat. If the listener is interested in knowing more about Matt's work, we'll put links to everything in the show notes, so you can go and check that out. And Matt, what are your socials? Where can we find you on X, Twitter, whatever it's called these days?
I think I still have an X slash Twitter account, but I don't use it. I just got Bluesky. I'm not a social media person. I'm on LinkedIn; people can find me there if they want to talk about my work, or shoot me an email. I'm happy to chat anytime, and it doesn't have to be about my work. It can be about grad school, it can be about anything. I'm always happy to hear from folks.
Awesome stuff, great stuff. And yeah, they will. I guess with that, we'll see you all next time for some more awesome computer science research. Thank you.