Postgres FM - Connections
Episode Date: September 1, 2023

Nikolay and Michael discuss connections — the options, security and performance tradeoffs, and a few other things to be aware of.

Here are some links to some things they mentioned:

Episode ...on connection poolers https://postgres.fm/episodes/connection-poolers
listen_addresses https://www.postgresql.org/docs/current/runtime-config-connection.html#GUC-LISTEN-ADDRESSES
SSLMODE Explained (blog post by Andrew Kane) https://ankane.org/postgres-sslmode-explained
pg_hba.conf https://www.postgresql.org/docs/current/auth-pg-hba-conf.html
Timing a query (blog post by Bruce Momjian) https://momjian.us/main/blogs/pgblog/2012.html#June_6_2012
How to connect (blog post by Lætitia Avrot) https://mydbanotebook.org/post/cant-connect/
Improving Postgres Connection Scalability: Snapshots (blog post by Andres Freund) https://techcommunity.microsoft.com/t5/azure-database-for-postgresql/improving-postgres-connection-scalability-snapshots/ba-p/1806462
idle_session_timeout https://www.postgresql.org/docs/current/runtime-config-client.html#GUC-IDLE-SESSION-TIMEOUT
Do you vacuum everyday? (talk by Hannu Krosing) https://www.youtube.com/watch?v=JcRi8Z7rkPg
Don't use now() with pg_stat_activity (tweet by Nikolay) https://twitter.com/samokhvalov/status/1664981076014690304

~~~

What did you like or not like? What should we discuss next time? Let us know via a YouTube comment, on social media, or by commenting on our Google doc!

~~~

Postgres FM is brought to you by:
Nikolay Samokhvalov, founder of Postgres.ai
Michael Christofides, founder of pgMustard

With special thanks to:
Jessie Draws for the amazing artwork
Transcript
Hello and welcome to Postgres FM, a weekly show about all things PostgreSQL.
I'm Michael, this is my co-host Nikolay. Hey Nikolay, what are we talking about today?
Hi, connections.
Yeah, so this was your choice. We've done an episode already on connection pooling,
but this time we're thinking of covering them more generally, the basics,
a few interesting things around connections. I'm looking forward to this.
Yeah, a few interesting things and a few boring things.
And this topic was my choice.
And I just see how many people struggle
with trying to connect to freshly installed Postgres, for example.
I also see how many people make mistakes,
how many managed Postgres providers
continue opening the port to the world,
and sometimes not enabling encryption in the beginning and then enabling it later.
But still, I don't like the port being open to the world,
though many do it for the sake of simplicity and UX, user experience.
Let's discuss this maybe first.
Why shouldn't the port be open to the world?
Heroku started doing this very long ago,
obviously because this is the easiest way.
You know, security by obscurity is not security.
So if you use some randomly generated hostname,
a randomly generated username, and a very long, strong password,
it's hard to find this password with brute force, right?
Yeah. But that's the argument, isn't it?
It depends what you're protecting.
One of them protects against fewer things, right?
Of course, there's the brute force vector,
but there's also leaked credentials.
You know, if somebody gets hold of the credentials
and your database is exposed to the public internet,
then they're in.
But if it's not open to connections from
anywhere... Yeah, so I know there are additional things you can do, like limiting
the allowed IP addresses, but it's nowhere near as good. So restrict it if you can. I guess that's
the same advice as with most security things, right? The trade-offs of security are often UX, but they're often worth it.
There are also performance considerations.
You can also change the port from 5432 to something unexpected.
Well, by the way, it helps a little bit because I'm sure a lot of bad bots are scanning well-known ports first,
but still it's security by obscurity.
It doesn't work well.
Also, encryption should be in place.
If the port is open to the world without encryption,
it's a huge, very bad mistake.
And we know services which started with this approach,
not having encryption, meaning that anyone could
steal your password and then access the database from anywhere. It's insane. But the best thing is to not open
the port on public IP addresses, on public interfaces, not to open the port to the world. I
mean, just the listen_addresses setting in postgresql.conf.
It should be limited to only localhost
and/or private networks.
Right?
Yeah.
This is the best security.
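For example, a minimal postgresql.conf sketch (the addresses here are illustrative; adjust to your network):

    # postgresql.conf: restrict which interfaces Postgres listens on.
    # Changing listen_addresses requires a restart, not just a reload.
    listen_addresses = 'localhost, 10.0.0.12'   # loopback plus one private address, never '*'
    port = 5432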
Of course, it makes it much harder
to reach Postgres.
And of course, you need something
like SSH port forwarding or some jump host or something
to reach it. But many clients support it, or without client support you can just, in a separate
terminal, say ssh -NTL, I don't remember exactly, and have this SSH port forwarding. But of course it destroys
the user experience. I know this. Many, many developers will not be able to reach Postgres
if you require this.
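A minimal sketch of that port forwarding (host names, ports, and user names here are illustrative):

    # forward local port 15432 to the remote Postgres port 5432
    # -N: no remote command, -T: no TTY, -L: local port forwarding
    ssh -N -T -L 15432:localhost:5432 user@db-host.example.com

    # then, in another terminal, connect through the tunnel
    psql "host=localhost port=15432 user=app dbname=app"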
But if you open the port, like all modern, how to say,
RDS by default doesn't do it, right?
Am I right?
I didn't check.
They have a checkbox, but it's disabled.
It's not checked by default.
Interesting.
But Neon, Supabase, all these guys, they open the port to the world. Is it okay?
In my opinion, no, it's not okay. And some of these guys
started with no encryption, but they have it now, right?
Or... Yes, in a weird way, but yes. Okay. I mean, when I connect, I usually check it,
selecting from pg_stat_ssl and seeing if my connection is encrypted.
But sometimes it's shown as not encrypted,
although I think there is still encryption, it's just up to the proxy.
After the proxy, between the proxy and Postgres, there is no encryption.
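That check is a one-liner; pg_stat_ssl has one row per backend:

    -- is my own session encrypted, and with what?
    SELECT ssl, version, cipher
    FROM pg_stat_ssl
    WHERE pid = pg_backend_pid();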
Anyway, I don't like the port open to the world. Even if
everything else is okay, still, I don't like it.
Going back to regular Postgres: we have sslmode, and I was wondering what you thought. So the default is prefer,
but there are stronger settings, I wondered... Yeah, well, so there's require, there's verify-ca, and there's verify-full.
Don't ask me.
Yeah, fair enough.
I always forget the details.
I always check the documentation.
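For reference (worth double-checking against the docs), the levels from weakest to strongest, plus a connection-string sketch with illustrative host and certificate names:

    # disable < allow < prefer < require < verify-ca < verify-full
    psql "host=db.example.com dbname=app user=app sslmode=verify-full sslrootcert=root.crt"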
I just wanted to point out: remember, Mongo, by default, opened the port to the world, and for many years it was nothing, and then a huge wave of ransomware, stealing data and blackmailing
people, started to be a huge problem, because by default it was open to the world, and many people
used some weak password. Of course, in the case of managed services, they dictate that the password
should be strong, I mean, they generate it for you. But still, I don't like it, the port opened to the world.
Well, we could say, you know, we live in a world where secrets are public; for full transparency,
we should publish all the data, you know. But I don't know. I don't like it. That's enough.
Let's switch.
I have a quick thing to add there. The documentation is
great on sslmode, in terms of which levels allow you to do what.
And there's a really good blog post that I came across today while looking into this by Andrew Kane called Postgres SSL Mode Explained.
So I'll link that up for anybody that wants to go into it.
Good.
Yeah.
Well, at least if you have SSL required, nobody can connect without it.
If you have a strong password and nobody can set a weak password, it's already something good,
even if you don't have weird hostnames trying to obscure things.
You said something at the beginning. You said
you see a lot of people struggling to connect.
Right. When do you see that?
Because I don't see it as much.
Well,
maybe you just don't
deal with new people.
New people to Postgres. For example, if you just
install it on
on macOS or Ubuntu. On macOS, if you use Homebrew, maybe it's easier to connect,
but then it will be harder to find how to, for example, start it and so on. But you can Google
that easily. On Ubuntu, for example, by default, you need to memorize sudo -u postgres psql. And
I wanted to explain why it is so, because in some less common situations,
people have issues connecting to local Postgres,
but you should always be able to understand
how to connect to local Postgres,
even if it's running on weird port
and it's installed at weird location,
using weird username and password and everything.
I think it's also a good practical exercise
for people who work with Postgres.
Like, here is the box.
Postgres is installed.
I won't tell you password,
but you have sudo rights, for example.
You can scan everything.
You can see all directories, files, everything.
Try to connect as soon as possible,
as fast as possible. This is a good
exercise. I think people master this ability over time anyway, but as an
exercise you can just learn it faster. So, my usual approach: I check with the ps command how
Postgres is running and from where, and also maybe pg_config, and I try to
understand which configuration files it is using. Then we check, maybe going straight to pg_hba.conf, which
is still a separate configuration file. Many people argue that it should be part of the main configuration.
Remember, we had recovery.conf, and it was a huge improvement when it was merged into the main configuration.
pg_hba.conf relates to which connections are allowed,
and using which connection methods. It's still
a separate part. It cannot be configured using SQL,
and it's a pity. Maybe someday it will be merged as well.
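A sketch of that discovery process (exact commands and privileges vary by system):

    # how is Postgres running, and from where?
    ps auxww | grep '[p]ostgres'

    # once you can connect as a superuser, ask Postgres itself
    psql -c "SELECT name, setting FROM pg_settings
             WHERE name IN ('config_file', 'hba_file', 'data_directory');"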
So once you've found the proper configuration file,
using ps and some navigation, and maybe pg_config,
you just check what is allowed there.
And if you see trust,
you are lucky, but it's also bad practice.
Trust means these connections don't require a password at all. You just specify the user and the way
of connecting and you're in. If you see peer,
on Ubuntu it will probably be there by default, right? For
the postgres Linux user, the Linux user named postgres.
In this case, if you're connecting as that user,
you also don't need a password,
because the gates open for this user automatically.
That's why you say sudo -u postgres psql.
In this case, you don't need to specify a password at all.
You're switching your Linux username to postgres and then you just run psql.
And in this case, Postgres needs to have the same user, the names should match in the operating
system and in Postgres. If they match, no password is needed. This is the main rule. You can use your own
Linux user, you just need to create the user in Postgres. For example, Michael, right? You have the Linux user
michael; if Postgres also has a database user named michael, and pg_hba.conf has peer
for michael on local connections, you can connect without a
password. Note that peer works only for Unix domain socket connections, not over TCP. And,
by the way, when you say sudo -u postgres psql, this is how you
connect: through this Unix socket, not via TCP.
And in this case, no password is needed, and it's
a good way for local connections.
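A sketch of what that looks like, assuming a Debian/Ubuntu-style default:

    # pg_hba.conf: the "local" type covers Unix domain socket connections
    # TYPE   DATABASE   USER       METHOD
    local    all        postgres   peer

    # the OS user "postgres" then connects over the socket with no password
    sudo -u postgres psql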
Yep, makes a lot of sense. Otherwise, you need passwords.
Yeah.
Are you talking about development databases?
Or are you even including production things in that?
Well, this is not about managed Postgres databases because they care about password generation
and restricting the ways, limiting the ways you can connect to Postgres.
But development databases or self-managed Postgres,
and when you install it on fresh Ubuntu or CentOS or something and you need to connect,
in this case you can be lost a little bit, and you need to reach your pg_hba.conf first and
see what's happening there, and maybe adjust it. If you adjust it, you can reload the configuration even without a connection: you just need to send the SIGHUP signal to the postmaster to reload the configuration.
You just say kill -HUP and the process ID.
And that's it.
So you're reloading the configuration like that.
Or you can restart via systemctl or something, depending on your system.
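A sketch of that reload (the data directory path here is illustrative):

    # the first line of postmaster.pid is the postmaster's process ID
    sudo head -1 /var/lib/postgresql/16/main/postmaster.pid
    sudo kill -HUP <postmaster_pid>

    # equivalents, once you can connect, or if you use systemd
    sudo -u postgres psql -c "SELECT pg_reload_conf();"
    sudo systemctl reload postgresql   # service name varies by distro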
This is a way to connect locally. By the way, Unix domain sockets are an interesting thing
to understand for new users. Postgres can work through TCP/IP connections or through Unix socket connections.
And Unix socket connections are local only, because that's the mechanism in Linux for processes to communicate locally, only locally.
And they are faster, by the way.
It's easier to benchmark with PgBench.
Yeah, in researching this I saw a blog post by Bruce Momjian looking at exactly this, and his numbers were something like 35% faster than TCP without SSL, and 175% faster once you turned SSL on.
I was expecting some difference, but I wasn't expecting that much difference.
Yeah, well, I benchmarked it myself multiple times. It's really easy to benchmark: you just connect pgbench through the socket. And with the socket, it's strange, but you say -h, which means host, and you need to specify the directory where the socket is located. Inside the directory, you will see a file named .s.PGSQL followed by the port number.
And you specify -p like with TCP connections.
-h is a directory, -p is this port number.
It looks like a TCP/IP connection, but it's not TCP/IP.
It's a Unix domain socket connection.
And if you just specify it for pgbench,
you can compare -h localhost
and -h with the directory where the socket is located.
The Unix socket directory, unix_socket_directories,
is a setting in postgresql.conf.
And you'll see different numbers, definitely,
in terms of TPS and latencies.
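A sketch of that comparison (the socket directory here is the Debian/Ubuntu default; check unix_socket_directories on your system):

    # initialize a test database once
    createdb bench && pgbench -i bench

    # TCP over loopback vs. the Unix domain socket
    pgbench -h localhost           -S -c 10 -T 30 bench
    pgbench -h /var/run/postgresql -S -c 10 -T 30 bench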
And what does this mean?
This means that if we have a pooler
on the same host as Postgres, running locally,
PgBouncer, for example,
we probably should make it connect to Postgres
using this type, Unix domain sockets. And also, if you have a not
huge application, and you somehow combine the application and the database on the same machine,
sometimes we have a small project, right? We should probably also make it connect
through Unix domain sockets, not through TCP/IP. And this will be beneficial for latencies.
The numbers you provided sound very good, right?
Yeah, have you seen similar?
Yeah, similar. Yes, exactly. I was expecting a small difference, but
the numbers I saw in my benchmarks, I don't remember them exactly,
made me think, oh, Unix socket connections are much better in terms of performance.
Less overhead, obviously.
Yeah, that makes sense.
Did you have anything else you wanted to share on the topic of what to do if you can't connect,
let's say, for people that you mentioned struggling?
Yeah, well, just once again: for some people, it's very strange to see that they need to provide the Unix
socket directory as -h, meaning host. This is strange. Maybe it's not perfect
option naming, right? Maybe it should be, I don't know, some other
parameter. So it's confusing, 100% so. And the port, too: this file name includes the port number, which is also strange. But what else?
I think that's it. Understanding this, checking listen_addresses and port in postgresql.conf,
also checking ps, where everything is located, and pg_config, which gives you
an understanding of where the configuration files are located.
It should be enough.
So you start understanding where PGDATA is located and how it's running.
Sometimes, for example on Ubuntu, unlike the RHEL approach, the Red Hat and CentOS approach,
the configuration is not inside PGDATA.
It's moved to /etc, a more canonical place to put files. In this case,
you also need to figure that out and understand how to find the configs. But if you have root access to
the host, you should be able to work out how to connect to a running Postgres. Otherwise,
you need to study it a little bit. One more thing to add, and I'm stealing this from a really good
flowchart from a blog post by Lætitia Avrot a few years ago. She mentioned that if you're not
connecting with psql, try that as well. So if you're trying to use a different tool and
you've got a connection issue, there's a chance that it's the tool being buggy, for example.
So I think that's a good tip.
And then the final one that she mentions is if you hit an error message, read the error message.
Like Postgres's errors in this area are really well worded.
So, I mean, this is good advice in general.
I do remember the first time I sat with a developer
and they read an error message and I was like,
what, you can read these and they're actually useful?
It was eye-opening.
And they're translated into many languages.
I see translations of them sometimes.
So yeah, I agree.
I use it all the time.
If I have issues connecting,
I downgrade myself to psql very quickly
because I trust it 100%.
I know that it's reliable.
And if I cannot
connect using psql, that's a problem somewhere, maybe in between, or on the server side, it
depends. But yeah, this is probably the way to study it: you need to be able to connect using psql first,
and then, once you know how to connect with psql, you bring this understanding
to any client application. By the way, just a reminder: if we discuss
connections, we should discuss that the overhead of connections is high in Postgres, because
it's not thread-based, it's process-based. Each connection means a backend on the server side, and
to create a new connection,
you need to create a backend.
That's why pooling is needed.
We discussed it.
And in Postgres 14, it was improved.
So now we can have a lot of idle connections.
Also, some managed service providers,
and probably you, if you manage Postgres yourself,
need to configure Postgres to close a connection
if it's not used for a longer period of time.
Previously it was possible only if you had PgBouncer or something,
because Postgres itself didn't have a setting to disconnect idle connections.
Now it has one: idle connection timeout or something...
idle_session_timeout, maybe.
I always forget setting names.
But it's definitely possible in newer versions of Postgres.
And it's worth doing, because who needs all those idle connections, right?
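A sketch of that setting (idle_session_timeout arrived in Postgres 14; the value here is arbitrary):

    -- close sessions that sit idle, outside a transaction, for more than 10 minutes
    ALTER SYSTEM SET idle_session_timeout = '10min';
    SELECT pg_reload_conf();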
And troubleshooting connections from the server side
is sometimes challenging as well,
because, of course, we have pg_stat_activity with a lot of data in it.
So we have client_addr and so on.
But if connections go through poolers,
which are installed on different servers or, for example, locally with Postgres,
then in client_addr we will usually see the wrong IP address, right?
Sometimes you can see the real client address and application name.
There is such a possibility in PgBouncer, as I remember.
But sometimes you just don't understand who is connected,
especially if you don't follow good practice
and don't distinguish usernames.
Sometimes people, like, for example,
hundreds of engineers in one organization,
they all use some super user
or some user with elevated privileges,
but it's a single username.
It's bad practice, right? Because it's hard to
trace them: you need to understand the process ID, then you
need to deal with who is connected to what, who
started what, with parent process IDs, then who is running the shell.
It's slow, and it's bad. So in my opinion, every
person should have a personal username. Of course, managing this
is challenging; that's why sometimes people involve LDAP or something. Well, as well as users, right?
Applications too: per application, or per part of an application, different applications
should use different usernames. It's the traditional divide and conquer approach.
You need to divide all of them, and then you understand who is who,
and then you can adjust settings.
Because at user level, you can adjust settings.
It's also possible.
And then you see everything in logs properly.
It's great.
But I see sometimes people don't implement this, unfortunately.
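A sketch of that practice, with hypothetical role names:

    -- one role per application (or application part), each with its own settings
    CREATE ROLE app_checkout LOGIN PASSWORD '...';
    ALTER ROLE app_checkout SET statement_timeout = '30s';
    ALTER ROLE app_checkout SET log_statement = 'ddl';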
Yeah, I think it makes tons of sense, but there are also interesting side effects from,
like, for example, I think it's not necessarily obvious about object ownership kind of consequences,
that kind of thing. But yeah, definitely good advice.
You can always say SET ROLE to something else if you have permissions,
and then you can switch,
pretend you're somebody else, for example,
to run some DDL if needed.
Yeah, and speaking of side effects,
one of the biggest mistakes in my career
was related to the different settings users had.
I've explained it maybe once already.
We had a situation where log_statement was set to all,
and I thought it was global,
but the application user had it set to ddl.
Yeah, yeah, yeah.
And I made a conclusion, a very wrong conclusion,
that we were logging all queries.
It was so stupid.
So we had downtime and it was not good at all.
It was a big company.
So it was maybe one of the biggest mistakes in my career related to Postgres.
So if you have multiple users,
you always need to check if they have user-level settings,
selecting from pg_roles.
That's great advice, yeah.
Or maybe \du+ returns settings.
I don't remember exactly, but I think so.
It should be in psql: you connect and see everything with the plus.
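Both work; a sketch of the catalog query:

    -- role-level overrides live in rolconfig (psql's \drds shows them too)
    SELECT rolname, rolconfig
    FROM pg_roles
    WHERE rolconfig IS NOT NULL;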
Yeah.
Nice one.
What else about connections?
So you had a nice thing on your notes, actually, how to connect if you don't know the password,
which is something I didn't know and had to Google.
So I don't know if you wanted to cover that.
Well, first of all, you need to inspect pg_hba.conf.
If you have peer there, or trust, you know, you don't need a password at all.
And you can just... Yeah, you mentioned trust earlier.
That is the advice on how to temporarily fix this, right?
Temporarily, yeah.
Well, all you can...
Yeah, this is the way, exactly.
So you just switch it, then you can already change passwords.
And then you can already revert it
and connect normally
with password in a secure manner.
Right. I would do this
probably, just why not.
Yeah, well, top voted answer on
Stack Overflow agrees with you.
Ah, okay.
So, yeah. Of course, it's a lot of actions,
unfortunately. I mean, you need to edit pg_hba.conf,
then you need to reload the configuration to apply it,
or just restart Postgres if that's not a problem for you.
And then don't forget to revert
once you have a normal user with a normal password.
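The whole dance, sketched (paths and names here are illustrative; don't leave trust in place):

    # 1. in pg_hba.conf, temporarily change the local line's METHOD to: trust
    # 2. reload the configuration
    sudo -u postgres pg_ctl reload -D /var/lib/postgresql/16/main
    # 3. set a new password
    psql -U postgres -c "ALTER USER postgres PASSWORD 'new-strong-password';"
    # 4. revert pg_hba.conf to the original method and reload again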
Sounds good.
The only last thing I had was:
you mentioned overhead from having long-running
connections. So other than the memory footprint, is there anything else? I know long-running
transactions, for example, have big side effects, but I couldn't think of any myself
for long-running connections, other than that memory overhead.
Yeah, well, there are a couple
of articles from Andres Freund about connection
overhead and memory. He claimed that memory overhead was not the biggest problem at all, but rather
snapshots, how Postgres works with snapshots. So if we have a lot of purely idle connections, not idle-in-transaction, but idle, not doing anything, then before Postgres 14, it was a problem.
And it's easy to test.
PgBench, you check TPS, and then you add 1,000 or 2,000 idle connections.
Of course, you need to increase the max_connections setting.
And you test again, and you see TPS drops like 20%, 30%, or something like that.
In my experiment, it was so with Postgres maybe 11 or 12, I don't remember exactly.
And this is how I demonstrated to some guys
that you don't want to set max_connections to 2,000
and then allow a lot of idle connections.
With newer Postgres versions,
the overhead is much smaller.
I honestly didn't check myself.
I trust claims from blog posts and so on.
And it's not directly about memory. And memory also is a difficult thing to measure properly.
In newer Postgres versions, maybe if you have thousands of idle connections extra, maybe it's not a huge problem at all if you have a lot of memory also.
Who knows?
But I prefer limiting this: if you have a pooler, you just don't allow it to happen.
Yeah, it makes sense.
Wonderful.
I suspect we can call it there.
Yeah, I think that's it.
So, connections:
working with Postgres is not possible without
connecting to it.
There is single-user mode, of course,
but it's not pleasant to work with.
And even if you have
a transaction ID wraparound situation,
as Hannu Krosing
from Google Cloud Platform
teaches us, there is a recipe to avoid the use of single-user mode.
He explained it in his excellent talk, Do You Vacuum Every Day?
So, connections are necessary. Connections mean backends, processes.
And also, by the way, terminating connections is not a huge problem.
pg_terminate_backend is not a huge problem, normally.
Of course, kill -9 is not a good idea,
because it causes a Postgres crash and recovery immediately;
the postmaster watches for it.
But if you need to disconnect someone:
pg_terminate_backend with the process ID.
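For example:

    -- politely disconnect one backend (the PID here is illustrative);
    -- kill -9 on the process would instead force cluster-wide crash recovery
    SELECT pg_terminate_backend(12345);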
Yeah, that's it. Working with pg_stat_activity is essential.
What else... don't use now() with pg_stat_activity. now() is a
mistake: you will see negative values for connection duration,
or statement duration, or transaction duration. You need
to use clock_timestamp() instead, because pg_stat_activity is not a snapshot.
It's not a table.
It's read row by row.
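A sketch of the safe version:

    -- now() is frozen at the start of the transaction; pg_stat_activity rows
    -- are read live, so subtract from clock_timestamp() instead
    SELECT pid, state, clock_timestamp() - query_start AS query_duration
    FROM pg_stat_activity
    WHERE state <> 'idle';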
Took me a while to get my head around that.
I think you've got a good tweet
or something we can link to on that.
Yeah.
But yeah, I think we kept this episode quite simple.
I hope it was useful for some folks.
Maybe not for experts.
It was trivial information,
but I just felt it was needed, because there are a lot of new
users.
Yeah, absolutely. Well, thank you, Nikolay. Thanks,
everybody who's been sharing and commenting, liking and that kind of
thing. Appreciate it. Cheers.
Yeah, bye bye. Thank you.