Disseminate: The Computer Science Research Podcast - Matt Butrovich | Tigger: A Database Proxy That Bounces With User-Bypass | #45
Episode Date: December 18, 2023
Summary: In this episode, we chat to Matt Butrovich about his research on database proxies. We discuss the inefficiencies of traditional database proxies, which operate in user space and incur overhead due to buffer copying and system calls. Matt introduces "user-bypass", which leverages Linux's eBPF infrastructure to move application logic into kernel space. Matt then tells us about Tigger, a PostgreSQL-compatible DBMS proxy that showcases the benefits of user-bypass. Tune in to hear about the experiments that demonstrate how Tigger can achieve up to a 29% reduction in transaction latencies and a 42% reduction in CPU utilization compared to other widely used proxies.
Links: Matt's homepage, VLDB'23 paper, Tigger's GitHub repo
Transcript
Hello and welcome to Disseminate the Computer Science Research Podcast. I'm your host, Jack Wardby.
The usual reminder that if you do enjoy the show, please consider supporting us through Buy Me a Coffee.
It really helps us to keep making the show.
Today, I've got with me Matt Butrovich, who will be telling us everything we need to know about database proxies.
Now, this specifically is some work that Matt published at VLDB last year, and it's a paper called Tigger: A Database Proxy That Bounces With User-Bypass.
Welcome to the show, Matt.
Yeah, thanks for having me.
Pleasure's all ours.
Cool. So let's get started then.
So can you tell us more about yourself, Matt, and how you became interested in databases and database management research?
Sure. So I mean, so for context, I'm currently a PhD student at Carnegie Mellon University
in Pittsburgh, Pennsylvania, hopefully in my last year.
And my background is sort of like,
I've sort of worked my way up the computing stack in some ways.
So my undergrad education and work experience
was all in embedded devices, mostly.
So specifically storage devices,
things like hard disk drives, SSDs.
And so thinking about like,
how does your data become durable
at like a physical device level?
And then when I started my master's at CMU,
sort of working my way up the stack,
thinking about things like file systems,
distributed file systems.
And then I got to my advisor, Andy Pavlo's graduate course
on database systems. And his enthusiasm was just infectious. And so I was really sort of interested
in the systems that touch on so many aspects of computer science fundamentals, theory,
programming languages, storage systems, network systems, and relating that to things that I had
sort of seen before. So a database system's write-ahead
log is just a file system's journal. And I had seen things like ACID properties and two-phase
commit before. So I had a lot of context to sort of step into the database system world and start
working with Andy. And a year later, I was his PhD student. And it's such a great problem space to work in, the database
system community because you have so much potential impact. Like every major application
is backed by a database system. So there are so many opportunities for people to bring their sort
of diverse backgrounds and different expertise to bear in this sort of huge
problem space and have a real impact on applications that people
interact with every day. Yeah, for sure. There's plenty of problems to be getting stuck into for
our careers and we're probably always going to have a job as well, right? Because it's that
important in most modern stacks these days. Cool. So we're going to be talking about proxies today,
but maybe we could start off with some background and you can tell a listener how modern applications actually connect to databases these days and kind of
what are the general problems with the approaches that they take today? Right. So, I mean, there are
a couple of trends that stand out with the way sort of modern cloud applications, I guess, like
connect to database systems these days. And on the one extreme, you have this independent scaling at the application layer, this sort of elastic compute that software applications designed for the cloud use now when they need to throw more resources at the problem.
You suddenly have dozens or hundreds more servers, and these are opening hundreds or thousands more connections to the backend database system. So you have this challenge of persistent connections
and this huge connection scaling challenge is created by this sort of elasticity at the
application layer. And at the other end, you have serverless computing, which is creating
the opposite problem or in some ways like a contrasting problem to this sort of high
connection count scale, persistent connections scaling issue of very short lived ephemeral connections that go through all the trouble of
connecting to the database system, authenticating, often going through like an SSL handshake,
only to submit a single query and then disappear. And these two extremes are challenging for a
couple of reasons for database systems. At the persistent-connection extreme, when you get into large numbers of connections, the
database system just sort of runs slower and slower.
So we can see query latencies go from single milliseconds to hundreds of milliseconds because
the database system has to apply concurrency control logic to more and more connections
in order to successfully commit transactions.
Even if the connections aren't even doing anything, each idle connection in something
like Postgres takes on the order of megabytes of memory. And that's before each connection
starts populating its caches. So if you have tens of thousands of connections, you're suddenly
using gigabytes of memory to do nothing. And then at the other extreme,
with these short-lived serverless connections, again, I'll use Postgres as an example.
Postgres uses the process model for parallelism. And so every connection that shows up,
it has to fork a process. And this is a pretty heavyweight operation for the operating system
to deal with. And every time a connection goes away, it has to reap that process.
It has to, especially because Postgres relies on shared memory, there's just a lot that goes
into forking a single process. And this repeated forking and reaping creates a lot of challenges
for database systems to keep up with. So the solution to this is a proxy. Can you tell us
about what we actually mean by a proxy in this context and the
benefits that they provide? We're talking about proxies that understand the actual application
protocol, in this case, a database system's network protocol, its wire protocol. So they speak,
again, for example, like the Postgres wire protocol. They sit right in the middle between
the front-end clients and the back-end database systems. Clients connect to them as if they're connecting to the database system. So they speak the same
authentication protocol. The client's sort of oblivious that they're talking to a proxy at all.
They issue their transactions, their queries, they get the results back as if they were just
interacting with the database system. So the way proxies actually help sort of the problems we
described before, because it seems a little counterintuitive that putting a box in the
middle, like a middleware in between a client and a backend database system
would actually help your application run better. The big thing that they do is, at least when we're
talking about like a high number of connections, is something called connection pooling. So you
can stick 10,000 clients in front of a proxy. They all connect to the proxy and the proxy can
actually reduce the number of persistent connections to the backend database system. Assuming sort of like these 10,000 clients aren't like maxing out their
throughput and completely like saturating their connections as fast as possible. These proxies
can actually multiplex the transactions from the front end to a smaller number of backend
connections. So like you may be able to serve 10,000 frontend clients over 40 backend connections by just sort of interleaving the operations on the persistent
connections to the database system. So by having a fewer number of connections at the database
system, you sort of get the benefits of serving 10,000 clients, but you don't actually have the
database system reasoning about that many clients and it's able to operate more efficiently because
those data structures are smaller. So we've seen, you know, on the order of like single digit milliseconds
going up to tens to hundreds of milliseconds, when you, when you have tens of thousands of
connections, you can bring that latency back down by, by putting a proxy in between by actually
doing connection pooling. At the other extreme with, with sort of serverless setups, you know,
I mentioned things like the SSL handshake,
the cost of forking a process, that's all painful on database systems. The proxy sort of absorbs all that cost. And you're sort of pushing the computational cost to a different application,
oftentimes a different box, to absorb the CPU cycles that have to be spent on connection setup
and connection teardown,
authentication, and logging in and all those other things. And the backend database system
just gets to focus on running queries, which is really all we wanted to do.
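To make that connection-pooling idea concrete, here is a toy sketch in C of transaction-level pooling, the kind of multiplexing described above. The struct, the function names, and the pool size of 40 are illustrative assumptions, not PgBouncer's or Tigger's actual code.

```c
/*
 * Toy sketch of transaction-level connection pooling (illustrative only).
 * Many front-end clients share a small set of persistent backend
 * connections: a backend is pinned to a client only while its
 * transaction is open.
 */
#include <stdbool.h>
#include <stdio.h>

#define MAX_BACKENDS 40

struct pool {
    int  backend_fd[MAX_BACKENDS]; /* persistent connections to the DBMS */
    bool in_use[MAX_BACKENDS];     /* pinned to an open transaction?     */
};

/* A client's transaction begins: borrow any idle backend connection. */
static int pool_acquire(struct pool *p)
{
    for (int i = 0; i < MAX_BACKENDS; i++) {
        if (!p->in_use[i]) {
            p->in_use[i] = true;
            return p->backend_fd[i];
        }
    }
    return -1; /* every backend is busy: this client has to wait */
}

/* COMMIT/ROLLBACK observed: the backend can serve a different client. */
static void pool_release(struct pool *p, int fd)
{
    for (int i = 0; i < MAX_BACKENDS; i++) {
        if (p->backend_fd[i] == fd) {
            p->in_use[i] = false;
            return;
        }
    }
}

int main(void)
{
    struct pool p = {0};
    for (int i = 0; i < MAX_BACKENDS; i++)
        p.backend_fd[i] = 100 + i;     /* stand-ins for real socket fds */

    int fd = pool_acquire(&p);         /* transaction starts            */
    printf("transaction pinned to backend fd %d\n", fd);
    pool_release(&p, fd);              /* transaction commits           */
    return 0;
}
```

This is also why pooling only works when clients are not saturating their connections: a backend is held for the duration of a transaction, so a handful of backends can serve thousands of mostly idle clients, but not thousands of clients running flat out.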
So yeah, like I said, it seems a little counterintuitive that putting a box in the
middle can actually make your system run faster. But as sort of like a simple example,
I think in the paper we showed very early on running a
workload with 10,000 clients and running like 2,000 transactions per second on a standard
benchmark, I think it was YCSB. You could bring something like the P99 latency down; it's cut by
more than half, which is a huge number when people are reasoning about P99 latency. And you pay a
minuscule increase in minimum latency
because you're doing an extra network hop.
But if you can pull in that P99,
in general, what we saw is putting a proxy in between
would compress the distribution of transaction latencies
because the backend database system
wasn't having to reason about as many connections
and its concurrency control protocols
could run more efficiently.
I guess the other thing I'll say about proxies is the application isn't limited to just connection pooling. People are doing a lot
more interesting things with them with respect to like automatic sharding. So if you think of
like traditional middleware stuff like Vitess, you know, that was a way to scale
out MySQL. People use them for workload mirroring to sort of replicate a workload across
production and staging environments. Other people use it just for security reasons,
credential management, transparent upgrades of your database system in the background.
You put a proxy in the front and you can suddenly just switch the traffic transparently between
an older version and a newer version. I think one of the engineers at Meta talked to me at VLDB and
said, yeah, they have more people working on their MySQL gateway, effectively their proxy, than they do actually have working on MySQL. It's such a
core component of getting performance out of MySQL for them that they actually have way more
engineering on their proxy.
Well, yeah. I mean, that reduction in P99 latency just makes for such a better user experience as well, right? That sort of reliable
compression of the distribution just makes the usability and the experience of the system a
lot nicer straight away. So I can see why, in this scenario, the indirection layer is a
massive win. And it's so interesting that they've got more people working on their
gateway than on, I guess, the deeper parts of the database
stack. A question on that, though: how much of the need for these proxies is down to the
fact that we are basically using database systems that are 30 or 40 years old? Do database
systems that have been built today have all these sort of features
within them already, or are they also making the same design decisions that Postgres made, that MySQL made? How do
the new systems look compared to the old systems?
Yeah, it's interesting to see how the
need for proxies is changing when you talk about a newer system. I mean, everything I keep talking
about is in the context of Postgres, which, like you said, is an old design, and a lot
of systems have inherited those design decisions when we look at how many systems have forked Postgres. But in modern times,
it's actually getting interesting, like, what is a proxy these days? Because a lot of these cloud
native database systems, you look at like a modern Postgres replacement, like Neon, who's trying to
do serverless Postgres.
They've got a proxy in the front, but they're using it to sort of figure out which serverless compute system they want to like, you know, the user submits a query.
That's sort of the persistent connection that the client sees.
You submit your app, you submit your query to it, and then they need to route that to
an appropriate serverless, you know, ephemeral compute node to run your query.
I think we're increasingly seeing
more things, which is probably why Facebook, or Meta, calls it their gateway,
right? Like this sort of persistent connection that the client connects to or the application
connects to submits queries. And it sort of obfuscates what's happening on the back end of
these database systems, how you choose to,
you know, what sort of back end you send that query to, how many compute nodes you spin up.
You know, I could argue something like, you know, how you interact with a modern cloud
data warehouse like Snowflake. It's a web interface. You submit a query to it. You get
your results back. Is that a proxy? I don't know. The lines start to get very blurred
between what is this persistent front end that clients connect to, submit queries to, and then
they get routed to some sort of obfuscated cloud backend. And I think we're going to see more and
more of this sort of stuff. Actually, it seemed like there was an interesting resurgence.
Poly stores are back in a big way. That was sort of a big discussion, I feel
like, at VLDB this year: there are all these heterogeneous cloud database systems
that companies are relying on to solve their problems. And people are like, well, how do we
consolidate this into a front end that figures out, well, which back end would actually be the
most efficient way to run this query? So if anything, I think we're going to see,
I think we're going to call them proxies less and less. I think proxy implies like a very simple application that's not applying a lot of logic. And it's just sort of this front end that is sort
of the persistent, you know, to steal the phrase gateway that the users submit their queries to,
to interact with the database system. And sort of what happens behind the scenes is sort of obscured from the user.
So yeah, to your point, proxies probably were brought into the industry to solve sort of
scaling problems for systems like MySQL and Postgres.
And they still solve that problem in a lot of ways.
And it still lets a lot of people get off the ground
with sort of less elaborate database system solutions,
and then you can start with a proxy to start to scale things out.
But we're going to see the capabilities continue to grow
as people use them for things like automatic sharding
and putting maybe query caches in there, stuff like that.
Yeah.
Interesting how they've become kind of more, I guess, general purpose, having a lot of functionality kind
of absorbed into them.
What are a few of the, just to kind of drop some names in, some of the most common proxies
at the moment out there that people use?
So on the Postgres side, pgBouncer is the one that people always start with.
It's been around for ages.
There was Pgpool-II as well. And Pgpool-II
made its name because, you know, it was around before Postgres had native replication. So if
you wanted to do that, Pgpool-II was sort of solving the problem of bringing sort of
distributed durability to Postgres systems before Postgres had a lot of those
native capabilities. On the MySQL side, you've got things like ProxySQL. I think I mentioned, again, Vitess markets
itself as more of a middleware. But again, it was a proxy to help them scale out MySQL.
And then there's a number of folks that have sort of looked at PgBouncer and started to either
rewrite it or fork it or do different things with it. So the folks at Yandex actually
wrote their own Postgres proxy they call Odyssey, which is meant to be like a much higher performance
version of PG Bouncer because they found it a little too restrictive. And then the folks behind
Postgres ML have a Rust-based proxy that they've been working on for a year or two called PGCAT
that they mostly use for sharding,
but it also does connection pooling as well.
So there's a number of these out there because ultimately the application logic is pretty simple,
depending on how many features you want to put in there.
They're pretty simple applications.
Yeah, nice.
Cool.
So I guess some of these proxies,
some of the current proxy implementations, you identified there are some pitfalls in them and they're not perfect, hence the paper. So what are the problems with the sort of current family or the current iteration of these proxies?
You know, in particular, someone like Yandex, they were looking at something like PgBouncer and saying, the performance just isn't there. We need to write a multi-threaded implementation.
PG Bouncer is a single threaded application. It's written in C. It's a very simple application that
just kind of looks at the Postgres protocol and manages connection pooling. But like,
if you want to get better performance out of it, particularly parallel performance,
you have to rely on things like running multiple instances of PgBouncer on
the same box, either put them on different ports with a sort of L3, L4, like HA proxy in front to
sort of load balance across multiple PG bouncers. You see people with that sort of setup. Other
folks can use, you know, I think the SO_REUSEPORT option in Linux now lets you run
multiple applications off the same port and it'll do load balancing. So you can run multiple PG
bouncers on the same box, listening on the same port and sort of Linux manages that port reuse
for you. But in general, like PG bouncer setups got very complicated as people wanted to get
performance out of them. And there was one blog post, I want to say it was from Figma,
where they were looking at where's PGBouncer spending its CPU cycles when this thing's maxed
out. And they actually found it was burning all of its CPU cycles effectively on system calls,
mostly on mem copies, when it's doing SSL read and SSL write. That's where it's burning all its
CPU cycles. And that's sort of a new
challenge for us as we look at building high performance applications, whether they're
database systems, whether they're network applications, or something kind of in the
middle, like a database proxy, of what do you do when your application logic is so lightweight
in something like PgBouncer, that sort of getting access to the data, you know, through the
operating system, through the OS services, becomes your bottleneck. And how do you
design applications in that sort of problem space?
Awesome, cool. So when you were thinking about this, you came up with this alternative architecture for solving this problem,
which you call user-bypass. So can you tell us more about user-bypass and how it helps maybe solve some of this kind of bottleneck that the operating
system can be in this scenario? Right. I mean, so for a little context on user bypass, I want to
talk about kernel bypass for a second, right? So like I'm describing this problem of, okay,
your application logic is very lightweight. You want access to your IO data and you're paying
this huge cost to use the OS services.
So, you know, something that's been around for decades now at this point was this notion
of, okay, do kernel bypass.
So, like, for networking, if you have an Intel NIC, you can get Intel's
DPDK framework and you can design an application around kernel bypass; the DPDK framework lets you sort of
read data directly from the network device, bypass the OS stack, get it to user space as
fast as possible. It's sort of the idea behind kernel bypass. Get your data up to the user
space application logic as quickly as possible and then send it back out. And you can do that.
There are applications in production that use
kernel bypass. In practice, it's been, you know, we've looked at it. It's been tough to use
in general because the API has generally not been stable. So putting something into production has
its pitfalls of like your application release is going to be tied to a specific driver release.
If a new driver comes out, you need a new application release and vice versa.
Getting good performance out is really tough, depending on how you design the work queue and multi-threading in your application layer.
Are you pulling on a single core?
Are you event driven?
Things like that.
So we sort of argued kernel bypass is maybe not the solution here, which is where we said, okay, well, instead of bringing your application data up to user space as quickly as possible, what if you could sort of push your application logic down to the data as closely as possible?
And so we said, okay, well, that's user-bypass: effectively, bypass user space. And this isn't a new idea. This notion of kernel offload,
extensible operating systems, SPIN OS in the 90s, the exokernel, people have been talking about
operating system extensibility and embedding more application logic in the operating system for a
long time. If you look at the extreme end, you have unikernels. So where we argue, we're reasoning
about things a little bit differently is we're relying on these new extensibility features in the Linux operating system called eBPF.
And people debate on whether it still even stands for the extended Berkeley packet filter.
People just say it's eBPF now.
That's the technology.
It's not an acronym anymore.
Whatever, let them argue about it. But the idea is we can use this extensibility feature in the
Linux kernel that allows you to write these sort of safe event-driven programs that aren't quite as
invasive as loading something like a kernel module, which has security concerns, or even
compiling your own kernel. It sits in this nice middle ground that actually allows us to extend
the operating system in a way we really couldn't before with these earlier efforts, either due to lack of a standard API, lack of adoption with
things like, again, like spin OS, exokernel, like they were academic efforts, but we never really
saw anything in the real world that we could use to sort of achieve this idea of user bypass and
designing applications that either sort of take a hybrid approach
to some operations happening in user space, some operations happening in kernel space,
or just entirely pushing all of our application logic into the kernel and never having to
go to user space at all.
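For a sense of what these safe, event-driven programs look like, here is a minimal eBPF sketch in the libbpf style. It is illustrative only and not from the paper: it attaches a kprobe to the kernel's tcp_sendmsg function and counts calls in a map that user space can read, without a kernel module or a custom kernel.

```c
// Minimal eBPF sketch (illustrative, not from the paper): count calls to
// tcp_sendmsg in a map shared with user space.
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

struct {
    __uint(type, BPF_MAP_TYPE_ARRAY);
    __uint(max_entries, 1);
    __type(key, __u32);
    __type(value, __u64);
} send_count SEC(".maps");

SEC("kprobe/tcp_sendmsg")
int count_tcp_sends(void *ctx)
{
    __u32 key = 0;
    __u64 *val = bpf_map_lookup_elem(&send_count, &key);
    if (val)
        __sync_fetch_and_add(val, 1); /* verifier-friendly atomic add */
    return 0;
}

char LICENSE[] SEC("license") = "GPL";
```

The verifier checks a program like this for bounded execution and safe memory access before the kernel will run it, which is what makes the approach less invasive than loading a kernel module.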
Yeah, nice.
I just want to ask you a quick question about eBPF.
And you said before some of the old sort of attempts in this space
weren't very stable and the API wasn't stable. Is that a different story with eBPF? That is
something that you can use in production with a lot more sort of guaranteed and that you aren't
going to be tied to a particular version, like you said, or anything like that.
Yeah. I mean, I'll be honest. It's a little bit of a cop-out on my part. eBPF has absolutely
been a moving target the last 10 years. I think eBPF was officially
branded that in 2013. So it's been a decade now. In the six years I've been a grad student,
the ecosystem has changed wildly. Now it's getting better, and strictly for the better:
the way I had to write eBPF programs five years ago is very different than how I write them now.
And the tooling and everything has gotten better.
But that doesn't mean it hasn't been a moving target and APIs weren't stable for a few of those years.
I think we've reached a point where I could actually, you know, endorse it in a production environment.
And like for what little that carries as an academic graduate student saying you should use eBPF.
But like, I mean, it is in use now.
Like, you know, Meta is running hundreds of eBPF programs on every server. Netflix uses it, Google uses it. It's seen pretty
wide adoption now, and the people pushing this technology have really thrown their
weight behind it. Yeah, there's a little bit of a cop-out in saying eBPF, you know, implying that eBPF
would have had a stable API over the last five years, because it hasn't. But I think it's the direction forward when we look at
how to do extensible operating systems in production right now.
Nice, cool. So you've, you've used this and you've designed a DBMS proxy, which I guess is the star
of the paper, Tigger. And I think it's a great name as well. I like the names of things, and I guess I can guess why it's called Tigger, right? But yeah, I'd like to know
more about the actual naming discussion and who came up with that and whose idea it was. But yeah,
tell us about the design and implementation of Tigger. There we go, the floor's yours.
Well, we can do a quick derail on the name. That was because of my wife; she's a big Winnie the
Pooh fan. And I think, I mean, we've seen these weird Winnie the Pooh movies. Like, I read a couple of years ago that Winnie
the Pooh went into the public domain, so that's why we see these weird Winnie the Pooh horror
movies now. They're just making weird stuff with that license now. And I think when I
first submitted the paper, I was just like, oh, Winnie the Pooh's in the public domain, I can use the name
Tigger, they won't sue me, they being Disney, I guess. And I realized later that, I guess, Tigger was
in a later book, like a year or two after, so Tigger the character wasn't actually in the public domain
yet. So it's not actually clear to me where that stands right now. So we're just
not going to tell Disney about it. But hopefully they're not watching the database research community.
You never know.
Wait until that email from the lawyer, right?
I hope not.
I can't afford those legal bills.
So, I mean, when we designed the proxy, the basic idea here was let's use this idea of user bypass,
of avoiding going to user space to sort of fast path the common
operations that we see out of these database proxies. Let's just take PgBouncer. Like I said,
it's actually a pretty simple code base to read. It's single-threaded, it's written in C,
and I was pretty familiar with the Postgres code base, so C was
comfortable to me. And so the idea was to take that and sort of surgically insert some
short little eBPF hooks and, you know, the events that need to fire in order
to get these eBPF programs to run. But the idea was, okay, for these common operations
like connection pooling, where effectively what the proxy is doing is it's reading a socket,
it's applying Postgres protocol-specific knowledge, seeing, like, what's the transaction status? What is
the message that actually showed up? Is it a query? Are we changing the settings? What is it?
Figure out which backend to send it to, send it to the correct backend.
And then on the server connection, when a response comes back, it's figuring out who does that belong
to? What is this message? It's usually a query result. Send it back to the correct client. That's not that much logic, so you can do that in the kernel and hook it into the networking
stack of the operating system. So for those sorts of messages, hopefully the common ones, queries,
query results, you never have to slip those to user space. You never have to wake up a user thread.
They don't have to call a system call and do extra buffer copies and sort of all the trouble that
goes along with running a user space
application. You can just handle that in the network layer of the kernel when it runs on a
software interrupt, when the message arrived, and send it back out to where it's supposed to go.
And then for the less common messages, like changing settings or something like that,
you just slip that to user space and you let the user space component sort of handle that stuff. So Tigger is sort of a hybrid user bypass application where you take pgBouncer and then
you embed all this application logic in the kernel with eBPF specifically to handle things
like connection pooling. And then we also implemented another feature I think was workload
mirroring to be able to replicate a workload between multiple backends.
But so, yeah, we started from PGBouncer and that worked out really nicely for us.
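As a rough illustration of the fast path Matt describes, here is a minimal sk_skb stream-verdict eBPF program of the kind you can attach to a sockmap/sockhash so that data arriving on one socket is redirected straight to another socket inside the kernel. This is a hedged sketch, not Tigger's actual source: the map layout and key scheme are assumptions, and all of the Postgres wire-protocol parsing and pool bookkeeping a real proxy needs is omitted.

```c
// Sketch of in-kernel socket-to-socket forwarding with sockmap (illustrative).
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

/* client connection -> the backend socket it is currently pinned to.
 * The user-space slow path inserts and removes entries. */
struct {
    __uint(type, BPF_MAP_TYPE_SOCKHASH);
    __uint(max_entries, 65536);
    __type(key, __u64);
    __type(value, __u64);
} backend_socks SEC(".maps");

SEC("sk_skb/stream_verdict")
int pool_verdict(struct __sk_buff *skb)
{
    /* Illustrative lookup key derived from the client connection. */
    __u64 key = ((__u64)skb->remote_ip4 << 32) | skb->remote_port;

    /* A real proxy would inspect the Postgres message type and the
     * transaction status here before deciding where to forward. */
    if (bpf_sk_redirect_hash(skb, &backend_socks, &key, 0) == SK_PASS)
        return SK_PASS; /* forwarded to the backend, entirely in-kernel */

    /* Slow path: no backend pinned, let the data reach user space. */
    return SK_PASS;
}

char LICENSE[] SEC("license") = "GPL";
```

A payload that takes this fast path is handled in the software-interrupt context where it arrived and never wakes the user-space proxy, which is where the system-call and buffer-copy savings come from.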
Nice. Yeah. And a quick question on the hybrid aspects of it there.
And is there any way, again, I don't know if this is actually a valid question,
but say you implemented, for all of the operations, the opportunity to either go straight through the OS or to use the user-bypass path.
Is there a way that you can sort of like, okay, after a certain threshold, switch to using user bypass because it's now more, you're going to get more like savings in some way than this.
Just send it to the OS, right?
Or in an ideal world, would you just skip the OS altogether and everything would run through the user bypass?
You just chose not to for some of the less common operations
because there was just no point for the workload.
Or is it some sort of dynamism?
What's the word?
I can't think of the word I'm trying to say.
But yeah, maybe we'll have my questions come across.
Yeah, I mean, so you could certainly figure out
if there's like a sweet spot for how much of the work to do
in letting it slip to user space. And in fact, like, the resources are finite in the kernel,
in the sense that like, you sort of have to say, when you start the system, like,
how many sockets are we actually going to let, like, run these sort of user-bypass programs.
And at some point, that will become saturated. And even queries and the responses
could slip to user space if like we couldn't handle it in kernel space. Because like the
kernel adds these constraints about like, you can't wait, you can't yield, like everything
needs to like happen right now. So if the resource isn't available to do something when that event is
firing, the only thing you can do is slip it
to user space and sort of let it actually sit in a queue or a buffer and wait for it to actually
be pulled off that buffer. So user space is sort of this fallback slow path when you can't actually
service it in time. If every software interrupt handler in the kernel is already busy running one
of these sort of user bypass programs to handle
the connection pooling, you have no choice but to sort of slip it to user space. So in that sense,
you sort of get dynamism for free. But we also, you know, you sort of alluded to this,
we sort of went after queries and the results, because we're like, well, that's where we're
going to get the most bang for our buck. There are certain other message types that the protocol will drop in, but we're less
interested in those.
And then there are some actual limitations where certain message types you just can't
handle with user bypass in kernel space.
You just couldn't do it.
So certain forms of authentication messages, like if you're using something more complicated
that requires hashing with a specific hashing algorithm for passwords or something like that, a lot of those algorithms you can't express in a program that will satisfy the verifier.
There's this BPF verifier, which is sort of how the kernel makes sure your program doesn't run for too long.
So either the hashing algorithm isn't going to run in time to satisfy the verifier, or it's too complicated, and you can't link in external libraries into a BPF
program. So I couldn't just grab an off the shelf hashing library or authentication library and run
that in BPF. So we really just focused on queries and the results and slip everything else to user
space. One, from an engineering standpoint; two, for, uh, you know, bang for your buck on most of these
workloads.
Yeah, yeah, nice. On the
engineering effort there, how long did this sort of take you to implement, and how easy was it to sort
of work with eBPF? And the Postgres protocol you were familiar with anyway, and PgBouncer, so maybe that
kind of eased the job a little bit. But yeah, how difficult was it?
So it wasn't too bad. It was probably only a couple of months of engineering.
I don't know what that means to different people when you say a couple of months of
engineering, because some people are working 100-hour weeks, other people are working 35-hour
weeks. It's hard to say what that means. The hardest part is, um, when your BPF program
fails, particularly with the verifier, it's very opaque why it failed. So the verifier,
when you load an eBPF program, and I'm going to go back and forth between eBPF
and BPF just because they're the same thing, but one's easier to say than the other, I guess,
But like what the verifier does when you load your program
is it's trying to verify
that your program is going to terminate.
It's not going to access memory in a bad way.
It's not using APIs.
It shouldn't.
And when it thinks you've done something wrong,
it just yells at you in assembly, basically.
It just throws a bunch of BPF bytecode,
which just looks like assembly, basically.
And you have to try to figure out like, oh, no, what did I change?
And what went wrong?
And why is it angry at me?
So that's where you lose most of your time with writing BPF programs.
The logic itself, like, yeah, like I said, the Postgres protocol I was pretty familiar
with already. There are
okay blog posts and, like, YouTube videos where people discuss BPF, but the documentation
is actually quite poor. And that's part of the moving-target thing: even the official, like,
man pages and Linux kernel documentation are pretty rough. Your best hope is to actually look at
samples in the Linux kernel code repo. Like,
the samples are pretty decent and because they have to compile and still run every time they
do a new kernel release, they should still work. So you spend a lot of time reading kernel samples
to understand like how the APIs actually work because the documentation is bad.
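For context on what that debugging loop looks like, here is a small user-space loader sketch using libbpf. The object file name is a hypothetical placeholder, and this is not Tigger's actual tooling: the point is that loading is the moment the verifier runs, and its log, surfaced through the print callback, is usually the only clue about why a program was rejected.

```c
// Minimal libbpf loader sketch (file name "proxy.bpf.o" is a placeholder).
#include <stdarg.h>
#include <stdio.h>
#include <bpf/libbpf.h>

/* Forward everything libbpf prints, including the verifier log, to stderr. */
static int print_all(enum libbpf_print_level level, const char *fmt, va_list args)
{
    return vfprintf(stderr, fmt, args);
}

int main(void)
{
    libbpf_set_print(print_all);

    struct bpf_object *obj = bpf_object__open_file("proxy.bpf.o", NULL);
    if (!obj) { /* recent libbpf returns NULL on error and sets errno */
        perror("bpf_object__open_file");
        return 1;
    }

    /* Loading triggers the in-kernel verifier: bounded loops, safe memory
     * accesses, and permitted helper calls are all checked here. */
    if (bpf_object__load(obj)) {
        fprintf(stderr, "verifier rejected the program; see the log above\n");
        bpf_object__close(obj);
        return 1;
    }

    fprintf(stderr, "program loaded and verified\n");
    bpf_object__close(obj);
    return 0;
}
```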
Right. Cool. Yeah. So, right. Awesome.
Let's talk some numbers.
And so you've dropped a few numbers in throughout the chat so far,
but can you tell us more about your experiments,
how you went evaluating Tigger and yeah, what your setup was,
what you compared it against and yeah, some of the results.
Right. So, so for the evaluation that ended up in the paper,
we ran, I think everything in AWS EC2. For us, putting it in
sort of a cloud-native environment, I guess, helped strengthen the argument that this is
really a problem that it's a cloud-scale issue, right? And even in our case, when we're talking
like 10,000 connections, some people would argue like that's not a cloud-scale problem.
But for us, we put it in EC2 so we could reason about, you know, data center latencies and tens of thousands of connections. We used, you know, at the time, whatever their modern compute instances were, like the C6i's, I think. Everything was run in Ubuntu Linux, but really any modern Linux kernel should support the BPF features that we needed. And we compared against, so, I mean, I've
discussed PgBouncer; that's the standard option when you're talking about a Postgres proxy.
We also compared against, I also alluded to, Yandex's Odyssey. And again, Odyssey is
a complete rewrite of PgBouncer. They kind of throw every trick in the book at
getting a fast user space
application that they could reason about, in terms of how some people argue you
should design modern parallel applications. So they rely on tricks like user space coroutines
to get a lot of parallelism in user space. And the code is impressive, but tough to read.
Like, they wrote their own assembly to handle it. They're not using, like, C++ coroutines, because I don't
think that coroutine stuff went into C++ until C++20. So I think they wrote their own coroutine
stuff; you know, they have all the assembly instructions there to save the stack
and swap that stuff in and out. It's a complicated piece of software, but they argued it was necessary to get the performance that they needed.
And then today, I think we'll only talk about YCSB, which is the Yahoo! Cloud Serving Benchmark.
It's effectively exercising a key value store over the Postgres wire protocol. This wasn't
running on localhost via stored procedures. So we actually used three servers in every experiment, with an asterisk.
The one experiment where we don't run three servers, we run two servers because we wanted
to see how it compared to Unix domain sockets.
So we wanted to actually put the proxy and the database system on the same box, which
is actually what Yandex does.
So people deploy these proxies in a lot of different ways in the real
world. Some people are, yeah, putting them on the same box and talking over Unix sockets with the
database system. But there's a lot of design options there. But for us, we typically ran
three boxes. One beefy box to be the workload generator running bench base to run things like
YCSB and TPCC. The proxy was generally on a more constrained two-core box because people typically aren't throwing a ton of resources at their database system proxies.
And then we had Postgres running on a separate dedicated server, again, with a bunch of CPU cores and memory to hopefully make it such that Postgres wasn't often the bottleneck.
Because that was a bit of a challenge with sort of, if you want to benchmark proxies, you got to make sure your database system isn't the bottleneck, right? Like you want to try to actually look at the
performance characteristics of these proxies. So that was a little bit of a challenge.
Yeah, nice. So what were the results then? Give us the highlights, the headlines.
Yeah. So I mean, one of the results that stood out to us was, so if we just run the YCSB benchmark at like
max throughput and just say, okay, how, what can these proxies actually push for like a two core
box in EC2, PG Bouncer and Odyssey could do about 32, 33,000 transactions per second.
And Tigger could do 45,000. So over 40% increase in throughput.
And the reason I highlight that one is this was a severely CPU constrained scenario. There's only
two CPU cores. So we're sort of seeing the benefits of the CPU efficiency of Tigger there.
And so one thing we did in that experiment is we actually scaled the size of the proxy box. We threw more compute resources. And in particular, this would benefit Odyssey
more than anything because Odyssey expresses parallelism. So we were just kind of curious,
like how much CPU do you have to throw at this problem for Odyssey to catch up? I mean,
it turned out you needed eight times the number of CPU cores at eight times the cost to actually get equivalent performance
as Tigger's sort of baseline. So it was a huge compute cost for Odyssey in order to be able to
catch up to what Tigger was able to do with a much, much less expensive box. So we sort of
wanted to try to understand that, like what's actually going on here. So one of the things we
did is we sort of profiled all
these applications. And it's difficult to sort of describe charts and figures
in an audio setting, so I'll try to keep it simple. So, like, okay, let's run YCSB, and
what we want to do is fix the throughput, so fix the throughput at the proxy. Sort of nothing's really bottlenecked in this scenario.
The workload application isn't bottlenecked.
The proxy isn't bottlenecked and the database isn't bottlenecked.
Everyone's able to hold like this 2,000 transactions per second sort of throughput with like 10,000
clients on the front end.
And then we did some CPU profiling and said, okay, where are these
proxies actually spending their CPU cycles? And as expected, PGBouncer and Odyssey had like the same
amount of time spent in the kernel and in software interrupts, which is just sort of another way of
classifying like doing network work in the kernel. They're handled by these sort of software
interrupt handlers. And Odyssey actually used more time in user space than PG
Bouncer did. It's sort of this complicated user space co-routine setup actually ended
up using more CPU cycles in order to accomplish the same amount of work as PG Bouncer.
Now, if you throw a bunch of compute at it in parallel, obviously it can pull ahead. Like
that's worth the cost for some folks. But then when you compare against Tigger,
there was no user space CPU time, unsurprisingly, for this sort of workload, all the work's done in
kernel space. And there's actually even less time spent in the kernel and software interrupts,
which is a little counterintuitive, but it's able to handle all of its work in a single software
interrupt. It's able to handle it all on the receive end. Something arrives at Tigger,
it immediately figures out what to do with it and sends it right back out. You don't actually have
to handle another sort of software interrupt, or more kernel
code on the transmission side. So you're actually reducing the total number of CPU cycles even spent
in the kernel to sort of apply something like user bypass. Awesome. So it sounds like a win
on all fronts then. So I mean, I always have to ask this question, and I got told off once for
asking, are there any problems with it? And who was it? I can't remember who it was.
I'll have to edit this bit out.
But I asked them, are there any, what do they say,
scenarios in which your tool, whatever it was,
doesn't perform well?
And they said, I should ask,
are there any scenarios in which Tigger is suboptimal? That's
the way I should ask these questions.
So, yeah.
Yeah, so, I mean, I alluded to some of the problems
with Tigger and maybe user bypass.
I'm just like, eBPF is still not the friendliest environment.
Like someone can't just say, I'm going to take my application, I'm going to apply user
bypass the way Tigger did.
Like, could someone use Tigger in production as like their proxy?
Absolutely.
Would it make sense for everyone?
I don't know.
There's a lot of reasons people are still, like I mentioned, sort of these big companies that are all in on eBPF. It's not clear to me if you need those
sorts of resources in order to adopt eBPF though. Like could these small companies who are just
deploying pgBouncer to sort of scale out Postgres, like would they feel comfortable deploying an
eBPF backed application? I don't know. Like, did they have the compute resource or the engineering
resources to vet these BPF programs and make the judgment call of like, should this be put into
production for us? Because like, these programs have to run as root or you effectively need like
sysadmin capabilities in order to load eBPF programs. So the engineering decision of whether
you would actually want to take this approach, I'm not sure what that looks like in the real world in terms of like what companies are
making the value call.
Obviously, the big companies are making the value call saying we're in on eBPF.
It's worth the engineering complexity to get these performance wins that I kind of
describe here of like you can solve the same problem using far fewer CPU cycles if you
can get the BPF program written and are comfortable putting
it into production. The other thing is like you're trading off, you're trading off like
complexity of understanding your user space application for understanding the Linux kernel.
Now, to be fair, to get really good performance out of something like Odyssey or any other
network application or even database system, you do have to tune things like IRQ affinities in the Linux kernel
and understanding how soft IRQs are handled and how it handles top half and bottom half of interrupts
and deferred aspects of interrupts. There's all this complicated inner workings of how
things like interrupts are handled in the Linux kernel. And if you really want to build a fast
application like Tigger to apply user bypass for network IOs or even storage IOs, you're going to end up reading a lot of kernel code to understand like how to, one, where to sort of surgically insert your solution into the OS stack and how to get the best performance out of it.
Because if you don't tune the kernel right, you're still going to get quite poor performance out of this, I think.
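As one small, concrete example of the kind of kernel tuning mentioned here, interrupt affinity can be set by writing a CPU bitmask to /proc/irq/<N>/smp_affinity. The sketch below is hedged: the IRQ number and mask are hypothetical, and real values would come from /proc/interrupts on the machine being tuned.

```c
// Sketch: pin a (hypothetical) IRQ to CPU 2 by writing a hex CPU mask.
#include <stdio.h>

int main(void)
{
    const char *path = "/proc/irq/42/smp_affinity"; /* hypothetical IRQ */
    FILE *f = fopen(path, "w");
    if (!f) {
        perror("fopen (requires root and a valid IRQ number)");
        return 1;
    }
    /* Mask 0x4 = CPU 2; softirq work raised by this IRQ then tends to
     * run on that core, which matters for a proxy like this. */
    if (fprintf(f, "4\n") < 0)
        perror("write");
    fclose(f);
    return 0;
}
```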
Cool, I guess, yeah.
So where, you said that you could probably use Tigger now
in production if you wanted to.
So where do you go next with Tigger
and what's next on the research agenda?
Is it productization of it
or is there other features you want to work with?
I know obviously you're coming to the end of your PhD,
so you've probably got other sort of pressing matters,
but where would you go next with it?
I think there's kind of two directions
you could go off of sort of this paper. One is just sort of,
I think we alluded to this in sort of future work is like, and I alluded to this earlier
in our conversation, there's a lot of features that the proxies can support things like sharding,
things like query rewriting, query result caching. Does it make sense to push some of those sorts of things, more features? How far can you go with eBPF to satisfy the verifier and storing things in the
kernel? You can sort of think of proxies as sort of this middle box sitting
between clients and the database system front end. We started thinking as well about like,
well, in these distributed cloud data warehouse systems,
you have these shuffle nodes that are sort of ephemerally holding onto data as well, right?
They start crunching on some results and they do one query operator and then they forward it to a
shuffle node, which sort of redistributes it to another set of workers. It's sort of the old
MapReduce thing we've had around for ages now, it seems like. And these shuffle nodes
are sort of ephemerally holding onto data as well. Like, does that make sense to be a sort of
application class that you push into the kernel? You don't have to go to user space for things like
shuffle nodes. The problem is like, in the same way that what is a proxy isn't standardized,
what is a shuffle node isn't standardized either. So like, I think BigQuery has dedicated shuffle
nodes, Redshift, it's just another operator
to them.
So like special casing shuffle node logic may or may not make sense depending on the
cloud system you're talking about.
And the other thing that I think would be interesting that came out of this work, it
would, I think, benefit the academic community to sort of just have a survey paper that looks
at like, how do you deploy these
database system proxies? And what are the trade-offs? You know, we tried to do the best
we could with the limited background section, the space that we had to sort of introduce this
concept of database proxies to the community. But it raised a lot of questions of just people like,
well, I have this workload, would a proxy help me here? I have this many front-end connections,
and I have this many backend servers, and would a proxy help, you know, my performance here? And it's like,
I don't know, you just kind of got to try it. But it would be interesting to see, like,
a sensitivity analysis along a lot of different dimensions to understand, like,
how much compute should you throw at your proxy? How many, how many persistent connections should
it have for a given workload? And you start getting into like,
you could get into queuing theory stuff of just like, okay, well, you could describe exactly,
you could probably prescriptively say exactly how many persistent backend connections your proxy should have given X number of front end clients submitting work at X rate.
Yeah, you could solve that problem, you know, apply some queuing theory, but like
something prescriptive to say like these
things exist and they have all these interesting properties and capabilities, but there are all
these different trade-offs. Like, an experiments-and-analysis paper would be really interesting to see
there.
Yeah, for sure. We need that sort of canonical paper, right, that we can reference, rather than,
I mean, obviously, you kind of say, oh yeah, just go and look and see and find out whatever
works for you. But yeah, the community would definitely benefit from having that sort of work done. Because, I
mean, you've obviously done a lot of extensive background work, so how much work has there
been in the database community, from the research side, in this sort of area? It feels
to me sort of very untouched still. I mean, this feels like an initial sort of foray into it, but
it feels very much like a nice wide-open area of research, possibly.
Yeah, I mean, effectively none.
So if we can, I think you were going to ask a question,
I think coming up that's just sort of like,
what's the most interesting lesson I learned while working on Tigger?
I'll get into that now.
So like there's been very little academic discussion,
particularly on database proxies.
The network community talks about proxies, and you can go look at NSDI.
People are doing proxies for different things, oftentimes application layer proxies, but
sometimes not.
Sometimes people are just talking about like, how do you do faster TCP proxy?
But you alluded to like, there's not a lot of discussion in the community.
And that's a bit of a double-edged sword from a researcher standpoint.
Like, we were really drawn to this problem because we were like, it seems like everybody's
deploying proxies in production when they need to scale out, particularly MySQL or Postgres.
But, like, you know, you start with a database system as you're doing your startup, and your
business grows.
And the first thing people grab is a proxy to help scale out their backends.
But no one's talking about it in the academic community, like completely oblivious to it.
So on the one hand, as a researcher, you're like, this is super attractive. I can't wait to work in
this area that like no one else is thinking about. The downside to that is no one knows what you're
talking about. Like the researchers, excuse me, the reviewers, have no context, and it's on you
to write that context and make them care about the problem and get them to understand this is
a real problem in industry. People are using these things, they're solving a problem, and
it can be good to plant a flag on a problem, but it's your challenge to make the community
care about it. So like,
I alluded to, like, I think we would be served by this sort of survey paper.
I wish I had written that first, because then you could have cited it. Here, we had to do everything in the background section, and making the community care about it was
really challenging with the limited sort of page count that you usually have in a
background section of a paper. So, like, it's a very double-edged sword to sort of see this
verdant empty pasture of research and go running towards it and realize that the responsibility is
on you to make the community care and get them to actually understand that it's
sort of a real problem.
Yeah, that's definitely a fun challenge,
I guess. But I do feel like there often is that kind of disconnect between what's happening
in the real world and what's happening in research, right? It's really hard to sort of
bridge that gap and get people in either space to care about the thing the other person
cares about, right? But yeah, no, I definitely agree with you there. Kind of
going on from that a little bit: can you tell us more about the backstory of this paper? So
how did you come across this being a problem? You said you observed it, people were doing
it, but no one in the research world was really thinking about it. How did you get to that
point?
So it was sort of a bunch of different interests coming together. So I had already been
exposed to BPF
for some previous work I had done where we really relied on BPF for observability.
And so I had sort of dipped my toes into that community, learning about BPF. Another was sort
of just reading what people were saying about scaling out their database systems, where they're
talking about like, hey, here's how I had to deploy pgBouncer to solve this problem. And what are
the trade-offs of pgBouncer, MySQL? Some of it was sort of like, you know, off the record
conversations of just, you know, Andy is working, he's got his startup on the side. And he's just
like, man, a lot of people are using proxies these days. Just sort of this anecdotal observation that
seeing these things a lot in production and an aspect of database systems that we hadn't really
reasoned about as an academic community. And increasingly just thinking like, these are
interesting applications. There are interesting problems you could solve by putting a box in the
middle. And because suddenly you have observability of the entire workload, you can permute the workload if you want. There are all these tricks you can do if you have this sort of
middle box. So that got us really excited about this idea that proxies are sort of accepted
by the industry community as a way to solve database system problems. And then there's my background
in just sort of performance engineering, of trying to figure out why is code slow and how could you
make it faster. And it's like, oh wait, code is slow just because we're interacting with
the operating system and because we're going to user space to do the work. You know, do proxies
sit in this perfect, like, you know, Venn diagram of: the application logic isn't
that complicated, and it sits in the network layer? So, you know, we could use BPF to sort of try to
solve this problem. And there's enough, you know, people in industry,
they're not papers we can cite,
but people have written enough online that they can at least say like they're
used and that there's a little bit of credibility there of just like,
this is an actual real problem space.
Yeah. Nice.
When you're kind of working on Tigger then,
I'm not sure if we've covered this so far or not,
but were there sort of any,
because I like to ask this because obviously research is nonlinear, right?
There's ups and downs, you hit dead ends.
Were there some things that you tried along the way that you kind of,
that failed?
And yeah, some of the war stories with Tigger.
Yeah, I mean, I think part of the experiment process
and part of the design process was trying to figure out, for these BPF programs, there's a lot of different layers you can hook in, in the Linux networking stack. Like, how far down do you want to push your application logic? At an extreme end, you can push it all the way down, if your NIC supports it, down to the hardware, so that you're basically intercepting network communications.
They're called XDP BPF programs. And depending on what your OS and what your NIC supports, you can push your logic really far down. So we started actually looking all the way
down there. Like, it's like, could you handle this at the XDP layer, but you lose some of the
abstractions that you, that we rely on. I'm just like, okay, well,
suddenly you're below TCP. So we looked at trying to figure out like, could you make this work
without doing TCP? No, you really got to maintain your own TCP state machine. There have been other
efforts to use BPF to sort of fast path network communications and people just say, oh, we use
UDP because it's a much easier protocol
to reason about and sort of maintain our own state for but like we spent a little bit of time trying
to figure like could we do our own tcp state machine down there would you get the performance
benefits like no move up a layer let the os handle tcp for you um it also gets you access to things
like you could use um kernel tls so you you know you could still use s things like you could use kernel TLS. So you, you know, you could still
use SSL, like people could still use encryption to talk to the database system. Like if you push
all the way down to the XDP layer, you sort of lose those capabilities. So we spend a lot of
time trying to figure out like, and some false starts of like writing code at the XDP layer,
different, different layers in the Linux kernel. And this is where I said, like the,
one of the challenges of BPF is like, you spend a lot of time like the internals of the linux kernel in the software stack itself of like
just being proficient figuring out how to write an application that gets you the performance you
want you actually have to understand a lot of the linux kernel which is kind of a pain um so
yeah like like some some was sort of experimental like trying to make it work at xtp others was
just spending a lot of time understanding the ins and outs of the of the kernel stack and uh arguably losing time on that
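For a rough sense of what hooking in at that lowest layer looks like, here is a minimal sketch of an XDP program. The program name and the pass-through logic are illustrative, not anything from Tigger; it only parses the Ethernet and IPv4 headers before passing the packet on, which is exactly the point Matt makes: at this layer you sit below TCP, with nothing but raw packets to work with.

// Minimal XDP sketch: runs in the NIC driver path, below the kernel's TCP stack.
// Any proxy logic here would have to parse raw headers and track connection
// state itself, which is the trade-off discussed above.
#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <linux/ip.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>

SEC("xdp")
int xdp_peek(struct xdp_md *ctx)
{
    void *data     = (void *)(long)ctx->data;
    void *data_end = (void *)(long)ctx->data_end;

    struct ethhdr *eth = data;
    if ((void *)(eth + 1) > data_end)
        return XDP_PASS;                  // short frame: let the normal stack handle it

    if (eth->h_proto != bpf_htons(ETH_P_IP))
        return XDP_PASS;                  // only look at IPv4 here

    struct iphdr *ip = (void *)(eth + 1);
    if ((void *)(ip + 1) > data_end)
        return XDP_PASS;

    // At this layer there are no sockets, no TLS, no TCP reassembly.
    // Anything beyond pass/drop/redirect means re-implementing protocol
    // state in the BPF program.
    return XDP_PASS;
}

char LICENSE[] SEC("license") = "GPL";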
Yeah, for sure. And you never know at what point in the future the knowledge you've gained by going down that path is going to help you out with something else, right? So it's never really lost.
It's a positive, yeah. You've learned something, right, which is always good.
Cool, yeah. So, I mean, obviously Tigger is very pioneering in the sense of the technology it's using to solve this problem. But what kind of impact do you think Tigger can have, maybe on someone's day-to-day life as a software engineer or a data engineer? And what sort of wider ramifications do you think this work can have for the database community as well?
Yeah, so I like to think that Tigger is almost a case study or an example for the loftier goal, this idea of user-bypass. That's actually what my dissertation will end up being about: this notion that we want to design software that contrasts with the kernel-bypass approach and instead pushes application logic into the kernel. We can do this now because of things like eBPF; we don't have to load kernel modules, we don't have to compile our own kernel. For the average person, a lot of people don't need a database proxy. But are people writing or relying on applications, particularly network or I/O-heavy ones, where you're bringing data all the way up to user space, doing very little with it, and either discarding it or sending it back out? Those may be opportunities for something like user-bypass to have an impact: you don't have to go all the way to user space.
At a high level, there's this quote from Mike Stonebraker from over 40 years ago, where he says the bottom line is that operating system services in many existing systems are too slow or inappropriate. And that created this mindset in the database community that the OS is not our friend; it's very adversarial between database system designers and operating system designers. I've always taken issue with that in my time as a graduate student, and I will argue with Andy that we can coexist and have this symbiotic relationship, particularly because of things like eBPF. Before that, the syscall interface was a little too rigid; you couldn't quite get the kernel to do what you wanted it to do. But now we can tweak the kernel just enough to get what we want from it as database system designers, or general application designers, network application designers, whatever it is. I'm not going to argue with a Turing Award winner, you know, with Mike. But the landscape is changing enough now in the OS community that high-performance application developers, if they want to, can push their logic down into the kernel and apply user-bypass. I'll keep saying that phrase because I want to coin it. I think Tigger is more of the case study for something like that working, and for showing the benefits of it. And absolutely, there are trade-offs, but I think in the right places it can be pretty impactful.
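To give user-bypass a concrete flavor, here is a minimal sketch of one mechanism eBPF offers for this kind of thing: a BPF sockmap with an SK_MSG program that forwards payloads between sockets inside the kernel, so the data never has to be copied up to user space on the hot path. The map layout, program name, and the fixed peer key are illustrative assumptions rather than Tigger's actual design; user space would still have to register the paired client and server sockets in the map.

// Sketch of in-kernel socket-to-socket forwarding via BPF sockmap.
// User space adds the relevant socket fds to sock_map with
// bpf_map_update_elem; after that, payloads are redirected in kernel space.
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

struct {
    __uint(type, BPF_MAP_TYPE_SOCKMAP);
    __uint(max_entries, 2);
    __type(key, __u32);
    __type(value, __u32);
} sock_map SEC(".maps");

SEC("sk_msg")
int proxy_pass(struct sk_msg_md *msg)
{
    // Placeholder: a real proxy would derive the key of the paired socket
    // for this connection rather than using a fixed index.
    __u32 peer_key = 0;

    // Redirect the message to the peer socket; returns SK_PASS on success.
    return bpf_msg_redirect_map(msg, &sock_map, peer_key, 0);
}

char LICENSE[] SEC("license") = "GPL";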
Yeah, well, I think the flag has definitely been planted, so hopefully it can have a lot of impact going forward.
Cool. And I know you've worked on lots of other cool things across your PhD, so maybe you can give the listener a rundown of some of the other things you've worked on over your time as a grad student?
Yeah, so when I first started doing research at CMU, we were very heavily invested, and we still have students working on these problems, in self-driving or autonomous database systems, this machine-learning-for-systems work. When I alluded briefly earlier to having done BPF for observability, that was actually to collect training data for some of the autonomous, self-driving database system research we were doing. So I've spent a lot of time thinking about these ML-for-systems problems: how do you efficiently collect training data for these sorts of systems? That actually required me to help build an entire system from scratch. We have an in-memory database system called NoisePage that took a lot of inspiration from HyPer in how we do MVCC, concurrency control, logging, and things like that. Going forward, we continue to do this self-driving database system research, looking at things like, again, the huge overhead of training data collection, and how you create a standardized API for this problem, something like Gym from the OpenAI community, as it relates to the problems we have in database systems, and trying to bring ML to bear on some of them. And then personally, going forward, I'm looking at how we do user-bypass in more contexts. And maybe the crazy question of: what would an embedded database system for eBPF programs look like? What sort of rich applications would that open up? Those are the problems we're thinking about now. If you can imagine something like a RocksDB or a BerkeleyDB, if eBPF had access to something like that, what sort of problems could you solve?
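As a hint of the in-kernel state eBPF programs can already keep today, here is a minimal sketch that uses a BPF hash map as a crude key-value store from inside a tracepoint program. The map name, key scheme, and attach point are assumptions for illustration; the embedded database Matt is speculating about would need much richer structure (ranges, transactions, persistence) on top of primitives like this.

// Sketch: a BPF hash map as a tiny in-kernel key-value store.
// This example counts write() syscalls per process.
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

struct {
    __uint(type, BPF_MAP_TYPE_HASH);
    __uint(max_entries, 4096);
    __type(key, __u64);
    __type(value, __u64);
} kv_store SEC(".maps");

SEC("tracepoint/syscalls/sys_enter_write")
int count_writes(void *ctx)
{
    __u64 key = bpf_get_current_pid_tgid();   // per-process "row" key
    __u64 one = 1, *val;

    val = bpf_map_lookup_elem(&kv_store, &key);
    if (val)
        __sync_fetch_and_add(val, 1);          // update an existing entry
    else
        bpf_map_update_elem(&kv_store, &key, &one, BPF_ANY);  // insert

    return 0;
}

char LICENSE[] SEC("license") = "GPL";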
this next question is my favorite question and um it's all about the creative process so i'd like to
know more about how you approach idea generation matt and how you then go about selecting the
projects from the ideas you've generated yeah so your creative process what is it what's the secret i i'm
I'm lucky enough... I mean, a lot of this comes down to, and it's sort of a cop-out answer in graduate school, your relationship with your advisor, right? For me, I have a lot of freedom to think about the problems that are interesting to me. A lot of it is obviously reading, but it's not just papers. Like I said, a lot of it was reading what people are saying about database systems in industry, wanting to stay grounded in what I think are real problems that people are trying to solve with database systems, what they're facing on the industry side, and balancing the research problems with the industry problems. And thankfully, I'm surrounded by a lot of smart people, whether that's at CMU, in the broader database systems research community, or in the even broader computer science research community. There are so many people we can bounce ideas off of. So I get to write toy implementations, think about a problem for a month or two, and try to prototype a solution for it. It's very data-driven: trying to figure out what workloads make sense, whether this actually solves a problem there, and what the trade-offs are. And then the paper, and usually the research project, just grows from there. Okay, this looks promising, and then you usually end up rewriting the whole thing and going all in; you throw away your toy implementation and start doing it properly. So it's very iterative. It's a lot of toy code, experimental code, early prototyping, and always making sure that everything is backed up by numbers early on, not just going with your hunch. Again, looking at it from a performance engineering standpoint: make sure you can measure the problem, because if you can't, you're going to have a hard time justifying its existence. That's true in the research community and on the industry side. If you can't measure the problem, you're going to have a hard time justifying that it's a problem at all, and that your solution deserves merit.
Yeah, that's a really nice answer to that question; it's another one for my collection, thank you. Great stuff. So now you've got another opportunity to coin your phrase: what's the one takeaway you want the listener to get from this podcast today?
I'll say that the operating system is not your enemy.
As much as Andy will side-eye me every time I say that, because he says these operating system services are not written for you, and there are comments from Linus Torvalds denigrating database system developers as well, when we ask for things in the kernel. It's a contentious relationship. But I do think that we, as high-performance application writers, and they, as the operating system community, can coexist, and we're getting the tools and the common language to solve problems together, without reaching for such a blunt instrument as, well, I'm going to go full kernel bypass, take everything away from the operating system, and do it all myself in user space, or going the opposite direction and writing a unikernel that isn't really able to be put into production either. There is a happy middle ground that actually solves a lot of problems, I think.
Yeah, awesome. That's funny: while you've been talking about this, I can't get the image of Anchorman out of my head, where the rival news teams come together for a fight. That's the image that's been playing in my head over and over again. But anyway, I think that could be a fun prompt for Midjourney.
I think we could find some way to do something fun with that, actually.
Cool, great stuff. Well, let's end it there. Thanks so much, Matt, for coming on the show. It's been a fantastic, fantastic chat. If the listener is interested in knowing more about Matt's work, we'll put links to everything in the show notes, so you can go and check that out. And Matt, what are your socials? Where can we find you on X, Twitter, whatever it's called these days?
I think I still have an X slash Twitter account, but I don't use it. I just got Bluesky. I'm not a social media person. I'm on LinkedIn; people can find me there if they want to talk about my work, or shoot me an email. I'm happy to chat anytime, and it doesn't have to be about my work. It can be about grad school, it can be about anything. I'm always happy to hear from folks.
Awesome stuff, great stuff. And yeah, they will. I guess with that, we'll see you all next time for some more awesome computer science research. Thank you.