Postgres FM - Connection poolers

Episode Date: July 14, 2023

Nikolay and Michael discuss Postgres connection poolers: when and why we need them, the king that is PgBouncer, and the many new pretenders to the throne.

Here are links to a few things they mentioned:

max_connections: https://www.postgresql.org/docs/current/runtime-config-connection.html#GUC-MAX-CONNECTIONS
Improving Postgres Connection Scalability: Snapshots (blog post by Andres Freund): https://techcommunity.microsoft.com/t5/azure-database-for-postgresql/improving-postgres-connection-scalability-snapshots/ba-p/1806462
PgBouncer: https://github.com/pgbouncer/pgbouncer
Odyssey: https://github.com/yandex/odyssey
PgCat: https://github.com/postgresml/pgcat
Adopting PgCat: A Nextgen Postgres Proxy: https://www.instacart.com/company/how-its-made/adopting-pgcat-a-nextgen-postgres-proxy/
Supavisor: https://github.com/supabase/supavisor
pgagroal: https://github.com/agroal/pgagroal
PgBouncer is useful, important, and fraught with peril (blog post by JP Camara): https://jpcamara.com/2023/04/12/pgbouncer-is-useful.html

What did you like or not like? What should we discuss next time? Let us know via a YouTube comment, on social media, or by commenting on our Google doc!

Postgres FM is brought to you by:
Nikolay Samokhvalov, founder of Postgres.ai
Michael Christofides, founder of pgMustard

With special thanks to:
Jessie Draws for the amazing artwork

Transcript
[00:00:00] Hello, hello, this is Postgres FM, episode number 54, and today it's my turn to announce the title, although we mixed everything up after vacation, so Michael chose this topic. I wouldn't have chosen it, but I need to announce it: connection poolers. Yeah, well, I think this is a really important topic for Postgres in general, so I'm keen to talk about it, but I'm also aware that I need to get back to my reputation of picking topics that you find boring. So that was my main goal here. It's not as boring as others. Oh, damn. Yeah.
[00:00:38] And I think it's not only important for Postgres, it's important for any database system, and also for how applications work with a database system. Yeah, so should we talk a little bit about why first, before getting into some of the details? Well, I think today we don't need much of that, because we have application-side poolers. Yeah, so the question is why we need something on the Postgres side in addition.
[00:01:04] Let's start from the application side. Why do we need it there? Because creating a connection is very expensive. Yeah, there are a few different types of overhead, aren't there? In terms of latency, in terms of server resources. And in combination, I guess at the beginning you could argue you don't need one: if you only have a few users, maybe five users of your little application, or even just 100 users working simultaneously, then you don't need anything, no application pooler, you
[00:01:38] don't need a database pooler, continue happily with your simple setup and probably don't do anything. But if you have 100 users and 90 of them have a very slow connection, for example some home internet in California, working remotely, running the application on their home connection against your database somewhere, and you only have something like eight cores, then you probably already need PgBouncer or something. Good point. Actually, slow queries as well: even if it's not necessarily a slow connection, really long-running queries
[00:02:17] are an interesting case as well. But yeah, in fact, worth mentioning: a lot of application frameworks come with poolers by default, so even if you don't do anything, there's a chance that you're using one. Because we know Postgres is slow, right? I mean, connection creation is slow, not Postgres. Let me apologize: Postgres is very fast, but connection creation is slow. Yeah, and until recently, the overhead of each connection was relatively high in terms of...
[00:02:52] Until Postgres 14, right? Yeah, so what's the advice: is there one standard before that version and a different standard afterwards? Does it just change the threshold you need? Yeah, there is some kind of rule: take your number of cores and multiply it by 2, 3, 4, or 5, and that should be your max_connections.
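As a rough sketch of that rule of thumb (the numbers here are illustrative, not a recommendation for any particular workload):

```sql
SHOW max_connections;                      -- the default is 100
-- e.g. on a 96-core box, 96 * ~5 is roughly 500:
ALTER SYSTEM SET max_connections = 500;    -- written to postgresql.auto.conf
-- changing max_connections requires a Postgres restart to take effect
```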
[00:03:15] Don't go above that. For example, if you have an Intel server with 96 or 128 cores, you probably shouldn't go above 500. But that was before Postgres 14; Postgres 14 improved how snapshots are handled, improved connection scalability, so probably you can go higher now. Well, before that I saw people go to 1,000 or 2,000 connections. I saw 3,000 on servers with around 96 cores. I told them no, it doesn't feel good. They said, but we are fine.
[00:03:52] I said, okay, let's just run pgbench in its default stress-testing behavior, when it tries to max out and consume all resources. And if you have an additional 1,000 idle connections, you will see how the overhead affects you. It's an easy test: pgbench reports TPS and latencies, and in this case TPS is usually the main metric.
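A sketch of that kind of test; the database name, scale and client counts are illustrative, and the extra idle connections are faked with a pgbench custom script that just sleeps:

```bash
# baseline: plain pgbench stress run
pgbench -i -s 100 bench
pgbench -c 32 -j 32 -T 60 -P 10 bench

# idle.sql contains a single line:   \sleep 600 s
# hold ~1,000 mostly-idle connections open (max_connections must allow it) ...
pgbench -n -c 1000 -j 4 -T 600 -f idle.sql bench &

# ... and rerun the same stress test to compare TPS and latency
pgbench -c 32 -j 32 -T 60 -P 10 bench
```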
[00:04:18] And then with the additional 1,000 connections, you see something like a 20-30% penalty. I don't remember the details, but it was something like that. So this is the price you're paying constantly: you make your server do additional work it could avoid. And interestingly, in that particular case, they resisted installing PgBouncer because they also said: we have a Java application.
[00:04:43] We have a connection pooler, one with some interesting name I don't remember. There is a pool on the Java side, on the application side, so we don't need it. But you know the problem, right? It's not only when your application runs far away and the connection is slow; that's just one of the cases.
[00:05:03] But usually in a good project this is not a problem; usually the application code runs at least in the same region as your database. The real problem is different. When they scale stateless nodes, they add more and more nodes and forget to adjust. They say: we can scale, let's multiply the number of application nodes by two. But they don't divide the pool sizes by two. So more idle connections are created, because the number of active connections doesn't change when they just add nodes, right?
[00:05:35] Not immediately, no. Yeah. Well, they can grow over time, or if there's some marketing campaign, load can spike, of course. But if they just add more application nodes with the same usage, the same users asking the database to do some work, these scaling efforts on the application side lead to a significant increase in the number of idle connections. And this is how you can end up with 2,000 connections on Postgres, most of them idle, and then you try to convince them to decrease pool sizes.
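A quick way to see how those connections actually split between active and idle (works on any reasonably recent Postgres; the backend_type filter needs version 10 or later):

```sql
SELECT state, count(*)
FROM pg_stat_activity
WHERE backend_type = 'client backend'
GROUP BY state
ORDER BY count(*) DESC;
-- group by application_name as well if you want to see which app fleet owns the idle ones
```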
[00:06:13] They also resist: it doesn't feel safe to them. So okay, this is the time when we probably need a connection pooler on the database side. Yeah, awesome. So you've already mentioned PgBouncer; from my experience that feels very much like the de facto standard, and it has been for a long time. I actually looked it up: do you know which year it was first released? Well, I suspect it was around 2006 or 2007 or so. Yeah, great guess: 2007, that's the date I saw. I remember Asko Oja and Marko Kreen, maybe I pronounced it wrong,
[00:06:50] I invited them to a conference in 2007 or 2008. The developers of SkyTools and so on, because Skype was hot in terms of Postgres usage at that time: it was a big company with goals like, we need scalability for billions of users. So it was impressive. Not only PgBouncer, but of course PgBouncer is probably the most successful product they created.
[00:07:17] Yeah. And still, to this day, pretty much the standard. It does seem, though, that in recent years we've had a proliferation, hard word to say, of other tools. There is a new generation of poolers. It's related to many reasons, and one of them is that PgBouncer became a true Postgres-style product: five years can pass after you propose some major functionality. Well, to defend it a tiny bit, I feel like PgBouncer is at a similar stability level to Postgres; I feel like issues are just as rare, and if you need something that is safe and has been battle-tested for years, PgBouncer for me is still the number one choice. Now, there are
[00:08:04] obviously newer ones that people have developed for themselves in extremely high-throughput environments, and that are clearly working for their case. But if I needed one tomorrow and didn't have the resources to go and test all the others properly, I would still pick PgBouncer myself. And which mode would you put it
[00:08:25] in? All right, so let's get into that, because it's not free, right? There are downsides to having a pooler. Maybe first, as we discussed, there's a new generation; let's mention some names. Yeah, okay, great. First in this generation, I think, was Odyssey from the Yandex team, several years ago, and I know exactly how they decided to create it. It's also written in C, and the idea was: our pull requests are not accepted fast enough in PgBouncer, and there are also some things we would do differently.
[00:08:58] I think they use multiple threads, whereas PgBouncer, similar to a Postgres backend, is a single process. And I personally bumped into the single-CPU limit more than once, and it's very painful; you don't expect it if you aren't monitoring saturation of that single process. Usually it hits after you pass something like 10,000, maybe 15,000 TPS: a single PgBouncer process is not enough. And at that time the SO_REUSEPORT feature wasn't supported by PgBouncer, so you needed to run PgBouncer on different ports, and then you needed to teach your applications to load-balance across them, basically.
[00:09:37] Or some people put HAProxy, for example, as an additional layer. None of it sounds good. SO_REUSEPORT is a very good feature; you know it, right? No, but it makes sense. SO_REUSEPORT is a Linux feature which allows multiple processes to listen on the same port.
[00:09:56] So a few years ago PgBouncer finally started to support it, and now you can just run multiple PgBouncer processes configured to listen on the same port, and Linux will decide how to balance them. So you can go beyond 10,000 or 20,000 TPS and utilize as many cores as you want.
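A minimal pgbouncer.ini sketch of that setup, assuming a PgBouncer version new enough to have the so_reuseport setting (roughly 1.12 and later); the names, ports and sizes are illustrative:

```ini
[databases]
appdb = host=127.0.0.1 port=5432 dbname=appdb

[pgbouncer]
listen_addr       = 0.0.0.0
listen_port       = 6432
so_reuseport      = 1      ; several pgbouncer processes can share port 6432
pool_mode         = transaction
default_pool_size = 20
max_client_conn   = 2000
```

Start several pgbouncer processes with this same config and the kernel spreads incoming TCP clients across them.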
[00:10:23] If you run PgBouncer on the same machine, it of course takes some resources from Postgres. This is also an interesting topic we should probably touch on: where to run it. Because when we say closer to Postgres, it might be on the same machine or on a different machine. A lot of what I hear is that putting it on a different machine is smart, as long as it's close to the database. Some people say we lack structure in our podcast; I now see why, right? Because we jump, jump, jump.
[00:10:46] Well, let's return to this topic as well. But mentioning new players: Odyssey, right? It's quite interesting, it has really interesting features. I remember they presented it at PGCon a few years ago, and it's also quite battle-tested on, I think, thousands of databases already. But I also saw complaints about some bugs. You are right.
[00:11:12] A pooler should be very reliable. It's like the network: if some issue happens, it affects everything and it's a global incident. But I think in many cases PgBouncer is, as I said already, battle-proven and polished, and this I cannot say about the new players. I think you need several years of active usage, and of course we have a chicken-and-egg problem, because if people don't trust it they don't use it, but they need
[00:11:37] to use it to start trusting it. But okay, so new players: PgCat? Yeah, I saw a really good blog post from the team at Instacart about adopting PgCat, so that's a huge deployment using it; I'll link that one up as well. And it's written in Rust, if I'm not mistaken, so it's interesting. And is that from the team at PostgresML? Right.
[00:12:00] Yeah, great. And I especially like it; I've never used it yet, but it's high on my to-do list to try when I finally have free time, or probably I will try to use it in some projects I have. Why is it on that list? Because in February I created an issue in their GitHub repository with an idea. I've noticed these guys implement features very fast, developing
[00:12:28] very quickly. Impossible for, for example, PgBouncer, absolutely impossible; it would take a few years. So I asked them: we, the Postgres community, would like a good feature, mirroring. From the connection pooler, from this middleware, we want it to receive requests,
[00:12:43] send them to the main server and pass back the result, but additionally send them to another server and ignore the responses. It's a very good thing for testing. And the key metric for me, besides reliability, for any connection pooling software is latency overhead. And it should be tested in a proper way. In my favorite way: instead of running pgbench in default mode, when it tries to maximize everything,
[00:13:15] you limit the TPS so that you have the same CPU and resource usage in each run, for example 25% to 50% CPU usage, a normal case for a loaded production system. And then you just compare the latencies you get with one pooler versus another. Ideally, a pooler should add very small latency overhead, well below one millisecond; one millisecond of added latency is already quite a lot.
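A sketch of that comparison using pgbench's rate limiting; the rate, duration and ports are illustrative (5432 straight to Postgres, 6432 through the pooler):

```bash
# same fixed transaction rate against both setups, so resource usage stays comparable
pgbench -c 16 -j 8 -T 300 -P 10 -R 2000 -p 5432 bench   # direct to Postgres
pgbench -c 16 -j 8 -T 300 -P 10 -R 2000 -p 6432 bench   # through the pooler
# compare the reported average latency; the difference is the pooler's overhead
```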
[00:13:45] Especially for OLTP, yeah, which is what we're mostly talking about. So, I wonder what will happen with mirroring. It's very interesting, and I just don't have the capacity right now, but I'm very curious, and I think this is a super interesting feature. So, they implemented it
[00:14:00] in a month. Already merged, already there, but I haven't tested it. If anyone needs similar functionality: it can give you something like blue-green deployments, or realistic testing
[00:14:12] next to production. You create a database clone of the primary, and you can just promote it at some point. And at the same time, you start mirroring;
[00:14:22] some queries will fail, but it doesn't matter; if you have a lot of queries, big numbers, this kind of testing will be much better than attempts to replay logs or something. Yeah. When you said it was super interesting, I thought you were trying to segue to the other one that popped up recently: Supavisor. It should be Superbouncer, but it's somehow called Supavisor. A supervisor is probably,
[00:14:47] I don't know, a name I would use for things like Patroni, for example. I think that's actually part of their roadmap. Maybe. Yeah, yeah. Well, it's very ambitious. And it's written in Elixir, right? Yeah. And there are also Crunchy Proxy and pgagroal. Yeah, I was going to ask you about that: it looked like Crunchy Proxy hadn't been worked on; last I saw it was a beta from 2017 or something, maybe abandoned. I never touched it, never tried it, I don't know. I tried Odyssey, I'm going to try PgCat definitely, and I use a lot of PgBouncer. Ah, also there is RDS Proxy, but we are not going to discuss proprietary software on this podcast, right?
[00:15:29] Unless it's a special event, and I'm not sure we'll have one. But RDS Proxy has an interesting feature; we need to look at proprietary things sometimes, because they can give you insights if you develop your own tool. It has a super interesting feature. When they started to develop it at AWS, they developed it for Aurora global database, the
[00:15:53] multi-region setup. The secondary region is read-only, right? And you put RDS Proxy there, and local connections are constantly just reading. But what if they want writes? Our primary is in a different region, what to do? So RDS Proxy can receive the write, go to the primary and perform the write there, wait until it is propagated to the local standby server, and then read from it. Maybe not exactly from it; I may be mixing up some details,
[00:16:27] but it's a very interesting concept of inverted load balancing. Instead of saying, oh, this is a write, let's go to that node: no, no, no, we always go to our local node, but there we have some magic to perform the write and wait until it propagates. Interesting concept. Yeah, indeed. So I was going to ask you a question around this. Do you know why there are so many of these projects? They all seem to have similar goals, right? We want something that PgBouncer doesn't support.
[00:16:55] We've got a couple of extra requirements. Why not work together? Because of cathedral versus bazaar: open source is usually the bazaar. It's normal for open source to have many, many competing attempts, because people have different views. If it were a single corporation, of course, it would converge immediately,
[00:17:14] although sometimes competition is provoked inside corporations as well, like two teams competing, but not ten teams, right? Usually in the case of a corporation, the cathedral model, we have a roadmap, a name already defined, approved by management and so on. Here we have many teams with different views,
[00:17:34] different needs, trying to fulfill those needs. I think it's similar to what we had with auto-failover and backup tools: a few leaders will survive and probably remain, I think. Yeah. I've not been around long enough to know that there were loads of different backup options.
[00:17:52] What about auto-failover? What about replication systems, before replication went into Postgres core in 9.0? The opinion of Postgres developers was that replication should always stay outside. Replication! Can you imagine? Yeah. So we had Slony, then the same Skype guys created Londiste. I used both, and
[00:18:13] it was kind of painful. Also Bucardo and many others. And then it went into core. Backup tools are slightly different, but we now have obvious leaders: WAL-G and pgBackRest. And still Barman. Barman, yes. Many others, yes.
[00:18:28] But the leaders are these two; Barman is much less popular, in my opinion, at least around me. Of course, I'm biased. Well, I see most people still using pgBackRest. Maybe pgBackRest is more popular, maybe, but I see a lot of WAL-G cases as well. And WAL-E, I think, is already
[00:18:46] out of consideration, and some others too. And about auto-failover: auto-failover, too, was supposed to live outside of Postgres. Okay, it's still outside, and we have an obvious leader, Patroni, and many attempts to challenge it, but
[00:19:01] in that case I think there is one big leader, and that's it. So here I also expect something similar: we have a long-term leader, PgBouncer, but many attempts to compete, and these attempts are from the last few years, and I'm not sure what the result will be. They also put pressure on PgBouncer; I observe it very closely. Well, yeah, you sent me a pull request that seems to be making progress, so there seem to be some signs that PgBouncer may speed up a little bit, or may get some of these improvements. Prepared statements for transaction mode, right? Yeah.
[00:19:46] Exactly, for transaction mode. Yeah, this is where we fucked up, right? Transaction mode's the default, right? And it's what most people use, as far as I've seen. Honestly, I don't remember the default. You know, let's start with session mode because it's easier. It's the simplest mode.
[00:20:03] You have your session, and you're always connected to the same backend; backend means a Postgres process, reached through this pooler, and it never changes. The context never changes, and so on. Good. When you say, I'm going to disconnect, okay.
[00:20:20] It's also beneficial, by the way. You know why, right? To fight idle connections, for example: we don't need to keep a lot of idle connections to the database, we can disconnect faster when they're not needed. Yeah, but the benefit of session mode, obviously it has more overhead, but I always thought the benefit was that you get the session-level features, like prepared statements. Right, but prepared statements can be implemented for transaction mode as well.
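To make the problem concrete, here's a sketch of why SQL-level prepared statements break under plain transaction pooling; the table and statement names are made up:

```sql
-- Transaction 1 may run on Postgres backend A:
PREPARE get_user (int) AS SELECT * FROM users WHERE id = $1;
EXECUTE get_user(1);

-- Transaction 2 may be routed to backend B, where nothing was prepared:
EXECUTE get_user(2);
-- ERROR:  prepared statement "get_user" does not exist
```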
[00:20:55] Historically they haven't been supported in pooled transaction mode, but there are already three pull requests in the PgBouncer repository, so hopefully it will be there soon. I don't remember, but I think Odyssey supports it; I might be mistaken, I remember discussions. Yeah, I think at least one of the new poolers was started partly to support that; it was one of the main features they wanted, but I can't remember which one. So, transaction mode is the best. Why? Because we can reuse a backend to serve some other requests,
[00:21:29] some other transactions, in between the transactions of the same session. So you're connected, one transaction happens on one Postgres backend, and then we have some inactivity, for example. It's not idle-in-transaction, it's an idle session; I mean, it's regular idle.
[00:21:46] We went off to do something in application code or something else, and during this time the backend can be used by other sessions, by other transactions. So it's, how is it called, multiplexing or something. The backends spend less time not busy; most of the time they are busy, so it's more efficient, higher utilization of our cores, of our resources. Right. And statement mode is kind of strange, because even a single transaction can be spread across different backends, and that doesn't sound good. Well, in some cases it probably will suit, but in general it's not safe. Yeah, I've not seen a project that's used it; I suspect there is a use case, but it's just there for completeness, I guess. Yes, it exists. Right, so transaction mode is what we
[00:22:43] usually want, for best efficiency. But in some cases session mode also makes sense. For example, PgBouncer is also responsible for dealing with slow clients: Postgres has already generated the result, and we shouldn't keep a Postgres backend busy just while we transfer the data. It's better to hand it off to PgBouncer so the backend can do something else, or at least be available.
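And if you need both behaviors at once, PgBouncer lets you set pool_mode globally and override it per database entry; a small sketch with made-up names:

```ini
[databases]
appdb     = host=127.0.0.1 dbname=appdb                    ; uses the global transaction mode
reporting = host=127.0.0.1 dbname=appdb pool_mode=session  ; jobs that rely on session state

[pgbouncer]
pool_mode = transaction
```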
[00:23:10] Right. So yeah, transaction mode plus prepared statements: it's the sweet spot many people want, because prepared statements help with performance as well. I saw a really good write-up recently from JP Camara. They felt that they'd often read: oh, if you get to a certain number of connections, you should just use PgBouncer, and the advice was quite limited, as if there were no downsides. And they've gone through a very thorough blog post
[00:23:44] of all of the downsides that they've come across, and I'll share that in the show notes. What do you remember among the downsides? What's interesting? Well, I think these were the biggest ones, let's have a quick look: lock timeouts, statement timeouts. The application name is usually challenging, because I remember from the old days that PgBouncer hides the application
[00:24:06] name and IP address, or something like that; you need special tricks to keep them. But in general the pros outweigh the cons if you have a lot of TPS. One of the good points they made is something I think you've often talked about: having aggressive timeouts in general, but overriding them from time to time for maintenance tasks. Once you're in transaction mode, you can't override them anymore unless you connect around the pooler: if you want to go through PgBouncer, you can no longer just set a longer timeout for that maintenance operation. So I think that was a really good point that I hadn't seen mentioned elsewhere.
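One partial workaround, for settings at least, is to scope the override to a single transaction, which stays safe even when transactions hop between backends; a sketch with made-up object names:

```sql
BEGIN;
SET LOCAL statement_timeout = '30min';  -- applies only inside this transaction
SET LOCAL lock_timeout = '100ms';
ALTER TABLE orders ADD COLUMN note text;
COMMIT;
```

It doesn't help for commands that can't run inside a transaction block, such as VACUUM or CREATE INDEX CONCURRENTLY; for those, connecting around the pooler (or via a session-mode pool) is still the answer.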
[00:24:59] Yeah. But there are also useful features. For example, a good feature is that PgBouncer can give you an understanding of how many QPS you have and the average latencies, because Postgres doesn't have that in its internal statistics. It's strange, but there's only TPS and that's it; even latencies are not recorded unless you deal with pg_stat_statements, which is limited because it tracks a maximum number of queries, right? But PgBouncer constantly writes this to its logs, TPS, QPS, latencies, and it also has internal statistics. It's implemented in an interesting way: you can connect to it with psql and say SHOW STATS, SHOW HELP, SHOW SERVERS, SHOW CLIENTS, and so on.
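For reference, that admin console is just a psql session against the special pgbouncer database; the port and user depend on your setup:

```sql
-- psql -h 127.0.0.1 -p 6432 -U pgbouncer pgbouncer
SHOW STATS;     -- per-database totals and averages: queries, transactions, timings
SHOW POOLS;     -- active / waiting clients and server connections per pool
SHOW CLIENTS;   -- one row per client connection
SHOW SERVERS;   -- one row per server (Postgres) connection
```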
[00:25:43] And when you want to join that with something else, there is no real SQL there; it only supports these SHOW commands. So usually you need to export it to CSV, import it into a normal Postgres, and work with it there. But there you can also find QPS, TPS, and latencies, and this is very good to monitor. Stupid question, but one thing that sprang to mind: does that also measure failed queries? You know, with pg_stat_statements you only get successful queries, right? Yeah, that's a good question, I don't know actually. From the bouncer's point of view, a
[00:26:15] query which was cancelled or failed still produced an error, and if it consumed, say, a second of time, it should still contribute to the averages. I think it should count it, but it's worth checking; I don't know. Very good question, actually. I just remember I was curious: okay, we have this TPS, we can see it from pg_stat_database.
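Roughly, what Postgres itself gives you is only the transaction side; a sketch:

```sql
-- transactions are counted here, but individual queries are not
SELECT datname, xact_commit + xact_rollback AS transactions
FROM pg_stat_database
WHERE datname = current_database();
-- sample this twice and divide the delta by the elapsed seconds to get TPS
```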
[00:26:38] But how many QPS? On average, how many queries are in one transaction? I usually found myself checking PgBouncer logs for this information to understand our workload. So this is a benefit. But also pause and resume: I think it's undervalued functionality, and I see other poolers also plan to implement it.
[00:27:00] So you can, for example, restart your server or perform a minor upgrade with no downtime at all: you can issue PAUSE. By the way, it's tricky to issue PAUSE, because when you issue PAUSE to PgBouncer, it first of all says that all new incoming requests to run a query should wait, and then it itself starts waiting for all ongoing queries to complete. And I feel it lacks some additional options, because I don't want to wait forever. What if a query lasts an hour? Well, of course we have a timeout, but maybe we don't. So I would like to wait, but not more than some number of seconds: give ongoing queries some chance to complete, but no more than, say, two or three seconds, because others are already waiting. Right. Yeah. Okay, this PgBouncer cannot do, but you can do it yourself: you can terminate all long-running queries in parallel, and in this case PAUSE will succeed and return control to you. And in this situation you can restart Postgres in the background and then say RESUME, and users notice only a spike in latency, and that's it. Kind of an almost-zero-downtime minor upgrade or restart.
[00:28:26] Of course, it also helps with the restart itself. We shouldn't forget that a Postgres restart can take a long time because of the checkpoint, the so-called shutdown checkpoint. If you tuned your checkpointer, for example increased max_wal_size significantly, the shutdown checkpoint might take a lot of time. In that case you should issue an explicit CHECKPOINT before you attempt to restart Postgres or shut it down, and then it will be much faster, because the shutdown checkpoint will have almost nothing to do: your explicit checkpoint has already done most of the work.
[00:29:09] So you need to engineer this anyway, right? It's not easy to use in the general case under load. But then after you start Postgres again, you say RESUME, and that's great. But you can also substitute the Postgres node, if your PgBouncer is running on a different node, or reroute it or something; it can even be a different Postgres major version. Yeah, that's a good point. So, I mean, I think I saw this
[00:29:25] as one of the main goals of the Supavisor project: to be able to do that. Although I think for them it's not so much about major version upgrades, or even minor version upgrades; it was about changing resources. So if you wanted to increase CPU, for example. I think a lot of these providers have done some clever stuff behind the scenes to be able to resize disk or resize memory, different things like that. But yeah, it's middleware, isn't it? It could almost be a little queue for a while, even if it's only for a second or two. So yeah, very clever.
[00:30:02] Right. So, summary: PgBouncer is the king, but who knows what will happen next, because we see some candidates around with interesting ideas and implementations. Let's see; it should be an interesting competition. Last question, because I know it's a quick one.
[00:30:20] Should this be in core? Of course. Konstantin Knizhnik is a guy who I know tried to implement it. I remember reading very long threads, but it's obviously very hard to convince people
[00:30:36] on the details. And also, well, I think Postgres will have it right after threads. Okay. So that would be an internal pooler, that's it. But remember, if you implement it inside,
[00:30:53] you lose the benefits of running it outside, because rerouting, for example, is a good reason to keep it external. There are pros and cons to running it on the same node, very close to Postgres, versus on a different node. Yeah. Yeah, it makes sense. Awesome. Good.
[00:31:11] Thank you. Thank you so much, Nikolay. I hope it was requested by users, and I hope it is interesting to someone. It was requested. Great. Next time I choose, I will work hard on choosing a new topic. Nice. Thank you.
Starting point is 00:31:27 Take care, everyone. Bye.
