Postgres FM - Anniversary mailbag
Episode Date: July 7, 2023

Nikolay and Michael celebrate the podcast's 1 year anniversary by going through several questions and suggestions received over the year. Here are the questions and some links to things we mentioned:

Question 1: Effect of wal_log_hints=on after bulk DELETEs: why does the next SELECT run slow and generate tons of WAL?
- https://twitter.com/dmx551/status/1598253188926570496
- wal_log_hints: https://www.postgresql.org/docs/current/runtime-config-wal.html#GUC-WAL-LOG-HINTS
- Exploring how SELECT queries can produce disk writes: https://blog.okmeter.io/postgresql-exploring-how-select-queries-can-produce-disk-writes-f36c8bee6b6f

Question 2: How to get started reading the PostgreSQL source code — maybe a PostgreSQL-style C reference guide to consult for non-C programmers
- https://twitter.com/andatki/status/1578088843940593678
- So, you want to be a developer? https://wiki.postgresql.org/wiki/So,_you_want_to_be_a_developer%3F
- GitHub search: https://github.com/search?q=repo%3Apostgres%2Fpostgres+wal_log_hints&type=code
- The Internals of PostgreSQL (by Hironobu Suzuki): https://www.interdb.jp/pg/
- PostgreSQL 14 Internals (by Egor Rogov): https://postgrespro.com/community/books/internals

Question 3: Isolation levels primer/strategies — their uses in different scenarios, battle-tested strategies and insights, performance tradeoffs, edge cases to consider at scale (with replication and sharding, etc.). I remember reading some interesting stuff in the Jepsen analysis (https://jepsen.io/analyses/postgresql-12.3) about Postgres's serializable isolation level behaving more like snapshot isolation. Has this type of behavior, or another one similar to it, affected you or your clients in any significant way?
- Transaction isolation: https://www.postgresql.org/docs/current/transaction-iso.html
- What developers find surprising about Postgres transactions: https://blog.lawrencejones.dev/isolation-levels/

Question 4: Data encryption in Postgres
- Cybertec Transparent Data Encryption: https://www.cybertec-postgresql.com/en/products/postgresql-transparent-data-encryption/
- EDB Transparent Data Encryption: https://www.enterprisedb.com/docs/tde/latest/

Question 5: Migration from other DBMSs
- PostgreSQL transition/migration guide: https://github.com/postgresql-transition-guide/guide

Question 6: Latest failover best practices
- Patroni: https://github.com/zalando/patroni

~~~

What did you like or not like? What should we discuss next time? Let us know via a YouTube comment, on social media, or by commenting on our Google doc!

~~~

Postgres FM is brought to you by:
- Nikolay Samokhvalov, founder of Postgres.ai
- Michael Christofides, founder of pgMustard

With special thanks to:
- Jessie Draws for the amazing artwork
Transcript
Hello and welcome to Postgres FM. I'm Michael from pgMustard. I'm back after a couple of weeks off.
Thank you so much for holding the fort, Nikolay. This is Nikolay from Postgres.AI. How are you doing?
Much better than the last two weeks. Thank you for returning. I was thinking maybe that's it.
You decided to stop this after one year of doing it.
Yeah, unlike Americans, us Europeans do take holidays and two weeks off must be almost all of your annual leave, I guess, over there.
But yeah, we like our summer holidays here.
I don't know what you're talking about.
I have three spots in my calendar today.
I've noticed so people still can schedule talks, even if it's holiday.
Today is July 4th, but I still have some work to do, probably.
So this is episode number 53. Today
is July 4th. And our first episode was published last year, July 5th.
Well, yeah, so this is our one year anniversary. And we decided we're going to do a mailbag
episode. We've got a lot of requests and suggestions and things from people over the year, probably more than either of us were expecting.
So thank you, everybody. But it does mean we have struggled to keep up with the suggestions.
And we've got a few that probably aren't full episodes or that we can give a little kind of quick couple of opinions or maybe even it's just a quick answer to.
So I think the idea was for us to go through some of those this week. So it's going to be a bit of a mixed bag. Hopefully a few of you that have asked those questions will get something out of it, and we can probably follow up with a few of them in more detail at some point.
All right. It's like a photo dump on Instagram. By the way, on the other side: the software we are using just told me that the subscription has expired, meaning that exactly one year ago you subscribed, right? Congratulations. You need to pay again. Okay, let's start.
Wonderful. So the first question I had on the list was: what is the effect of wal_log_hints = on after bulk deletes? Why does the next select run slow and generate tons of WAL? So this one came in via Twitter.
Yeah, that's strange. I saw this question, and it's strange that wal_log_hints leads to that — I don't think there's a connection here. Maybe I'm wrong, but wal_log_hints is needed to propagate hint bits to WAL and to replicas. So one of the key purposes of it is
failovers. For example, Patroni supports it: if a failover happens and there is some deviation — the old primary can be slightly in the future compared to the new primary — then with the regular approach we need to rebuild it. But if hint bits are WAL-logged, Patroni can apply pg_rewind, or we can apply pg_rewind, and make a standby out of the former primary much faster. But speaking of bulk deletes, I think the problem is not in WAL-logged hints. Maybe I'm wrong again. But I think the problem is in bulk deletes themselves. If you delete a lot of tuples and then read some live tuples from the same pages, or revisit them... Indexes don't have version information, so an index scan checking heap records — records in the heap, in the table — might find that these records are already dead, just deleted recently.
And there is also a mechanism like in-page vacuum: when you select something, Postgres might decide to vacuum a little bit in this page right now. And when vacuum happens, it leads to additional WAL writes, of course. So I think the problem is — again, maybe I'm wrong... wal_log_hints, if it's off, then on replicas we might see selects leading to writes, because we don't have hint bits. There's an old article on the Okmeter blog — we can attach it. If you don't have wal_log_hints, on replicas you just select something, and Postgres decides to update hint bits, and this causes some writes on replicas. But this is
different. So I think the author of this question deals with bulk deletes, and the problem is bulk deletes and lack of control of autovacuum behavior. So what we need: we need to split deletes into batches. We need to control it. We need to make sure that dead tuples are cleaned up more aggressively, more actively, by autovacuum workers. Or we can vacuum manually ourselves, right?
So this is what I think in this case, but maybe I'm wrong.
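The batching Nikolay describes can be sketched roughly like this — the table and column names (events, created_at) are invented for illustration, and the batch size and vacuum cadence depend entirely on the workload:

```sql
-- Hypothetical sketch: delete old rows in batches instead of one huge DELETE.
DELETE FROM events
WHERE id IN (
  SELECT id
  FROM events
  WHERE created_at < now() - interval '90 days'
  LIMIT 10000
);
-- Repeat (e.g. in a loop in application code) until 0 rows are affected.
-- Optionally run VACUUM events; between batches so dead tuples are
-- cleaned up before the next pass, instead of waiting for autovacuum.
```
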
Yeah. Well, I'll respond to the person on Twitter, so hopefully they can let us know more detail if they've got it. For anybody else wondering: this is off by default, so not something you need to worry about unless you've turned it on.
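For anybody who wants to check, the current value can be inspected and changed like this — a standard sketch; note that changing it requires a server restart:

```sql
SHOW wal_log_hints;                      -- 'off' by default
ALTER SYSTEM SET wal_log_hints = on;
-- wal_log_hints can only be changed with a restart, e.g.:
-- pg_ctl restart -D $PGDATA
```
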
Yeah.
If I'm wrong, it would be great to have some reproduction and explore it.
It would be interesting to explore with additional tooling.
Could it be that the select is setting hint bits and therefore causing a load of full-page writes, whereas it wouldn't have done? If you're just doing a select afterwards, you wouldn't have caused any WAL without this setting on. So you're causing a bunch of full-page writes that you wouldn't have done in the previous setup?
Maybe, actually.
Yeah, interesting.
Well, yeah, it's interesting.
We can explore this with pageinspect, for example, right? We can delete, inspect the pages before our select and after our select, and hint bits will be visible there. So pageinspect could help to understand the behavior.
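A rough sketch of that pageinspect approach — the table name 't' is hypothetical, and the infomask bits have to be decoded against src/include/access/htup_details.h:

```sql
CREATE EXTENSION IF NOT EXISTS pageinspect;

-- Inspect page 0 of a hypothetical table 't' before and after the SELECT;
-- hint bits show up in t_infomask (e.g. HEAP_XMIN_COMMITTED = 0x0100).
SELECT lp, t_xmin, t_xmax, t_infomask
FROM heap_page_items(get_raw_page('t', 0));
```

Running it twice, around the SELECT in question, should show whether the infomask values changed.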
Yeah. Great suggestion. Should we move on?
Yeah, let's do it.
Right. So: how to get started reading the PostgreSQL source code — maybe a PostgreSQL-style C reference guide to consult for non-C programmers. So yeah, any advice for people first getting started? I saw on the wiki, on the Postgres wiki, there's a really good page: "So, you want to be a developer?"
The major challenge is different. It's, like, dealing with discussions with other people and convincing them that your vision is better.
Yeah.
So defending your thoughts, opinions.
But reading source code requires no...
In the beginning, it doesn't require
C coding experience.
It requires just understanding English.
Because there are good comments
and there are good readme files.
It's enough to understand
what's happening. Code is
covered with good comments, definitely.
And I think this is the main thing.
I usually just read it. And maybe some challenge is to find proper places in the source code, because it's huge. You need to find proper places, and I usually use git grep. Sometimes I use GitLab or GitHub search, but it's not super convenient. Also, there is Sourcegraph or something, which helps you to navigate. You have a function, you can quickly see... I have a browser extension for that. I can quickly jump to the definition of functions, to see all calls. So it's basic tooling for navigation in code, regardless of language. It helps as well to navigate a little bit.
But I also think there is opportunity here
and actually some small secret to be revealed.
There is opportunity to have a good tool
to talk to source code, right?
To, like, explain to me... I mean, ChatGPT, right? Or some LLM or something. Just explain to me how this is working. And based on comments — and not only comments, on the code itself — this tool can explain, show you, mention particular files and even line numbers, depending on the version, of course. Oh, also, a very good thing is to read two books, internals books: one from Suzuki, one from Rogov, Egor Rogov — and particularly Egor Rogov's book about Postgres internals. It has links to source code, like: check this file, we described something here in terms of how Postgres works, and you can find it in this file. It has these links on the side. It's very convenient. This is my overview of the problem.
Nice. I agree with you, by the way. And I've found GitHub search to be surprisingly good for getting to places where the comments are good. They improved it recently, a year or two ago.
Nice. Next one. So this is quite a long one, but it's basically about isolation levels: their uses in different scenarios, battle-tested strategies and insights, performance trade-offs, edge cases to consider at scale with replication, sharding, etc. There was some interesting Jepsen analysis that they've linked to.
And has this type of behavior
or another one similar to this
affected you or your clients
in any significant way?
Well, I'm a very OLTP guy, with a focus on mobile and web apps. And we usually used to work at the default level, read committed, in which you might see inconsistency, for example, within a single transaction. Oh, by the way, we also usually tend to have small transactions, sometimes single-statement transactions, right?
And actually, being a DBA, I hate when people use an explicit BEGIN/COMMIT block for a single statement. It doesn't make sense. It increases round-trip time and affects performance — sometimes not significantly, but sometimes a lot.
So first of all,
isolation levels matter a lot when you have multiple statements in a single transaction.
And there you can have anomalies.
But it's interesting to understand. Usually we have the asynchronous-replica problem, when writes are not yet propagated: if someone, for example, added a comment and refreshes the page, and we go to a replica in the second request, we don't see the comment which was just added by this user. So we implement something like stick-to-primary and so on. And a similar effect can be seen with just a single node, the primary node. Because if a different transaction commits inside your transaction — you read something, then you read something else — under read committed you see the newly committed data. So these anomalies... Or some concurrent session deleted something: in your transaction, you just read something, you read it again, and you don't see it, right? And this is okay. I mean, we got used to it. We just keep it in mind. And when we design our transactions consisting of multiple statements, we just understand that this might happen.
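The anomaly he describes can be demonstrated with two concurrent sessions; the table name here is invented for illustration:

```sql
-- Session 1 (default READ COMMITTED):
BEGIN;
SELECT count(*) FROM accounts;   -- say it returns 10

-- Session 2, meanwhile:
DELETE FROM accounts WHERE id = 1;   -- commits (autocommit)

-- Session 1, same transaction:
SELECT count(*) FROM accounts;   -- now returns 9: each statement
COMMIT;                          -- sees the latest committed data

-- Under REPEATABLE READ, both SELECTs in session 1 would have
-- returned the same count, from one snapshot.
```
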
So I was going to ask you, actually, what proportion of your clients,
your customers, sorry, that's a confusing word here, even change the default here?
I haven't seen many or I haven't heard of many.
Well, first of all, we all change it when we use pg_dump. For example, pg_dump works in repeatable read, because we need a snapshot: we want tables to be in a consistent state. We want to deal with a single snapshot. We don't want, like, reading of different tables breaking a foreign key, for example. And if we run pg_dump in multiple sessions — dash J, the number of jobs, is four, for example, to move faster, right? — in this case we have a snapshot with which they are synchronized. All workers of pg_dump will work with a single snapshot. We can do it in our application. It's not difficult, actually, to specify a snapshot in a repeatable read transaction. So a repeatable read transaction is needed, right? It's good.
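The snapshot synchronization he mentions looks roughly like this — the exported snapshot ID is just an example value:

```sql
-- What pg_dump -j 4 does internally, and what an application can do too:
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SELECT pg_export_snapshot();   -- e.g. '00000003-0000001B-1'

-- In another session, to see exactly the same data
-- (valid while the exporting transaction stays open):
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SET TRANSACTION SNAPSHOT '00000003-0000001B-1';
```
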
And we all implicitly use it when we run pg_dump. But as for explicit use, I saw only two big use cases. One use case: when people understand very well what they do, and they move very carefully to a higher level, understanding that when you move to repeatable read, and especially to serializable, you are going to start getting some deadlocks occasionally, and some slowness. But the second case is more interesting and probably happens more often: some new developers who don't yet understand the problems of moving to an upper level. I mean, sometimes we need it — pg_dump is one of the cases, or some billing system; probably sometimes we do need a higher level to avoid these anomalies. But another approach is, like, people just decide: oh, we want to be in a very consistent state, let's start with the serializable level right away. I saw it more than once. And in this case, the second thing they do is start complaining about Postgres performance.
Yeah, so that's the big trade-off, right?
Yeah. Even with few users.
Yes, even with few users. Of course, for mobile and web apps in Postgres, the lowest level, read committed, should be the default. But with understanding all...
This is the big thing I've seen.
I think when people first learn about transactions, it leads them to assume at least a repeatable read behavior.
So I think it does catch developers out when people are first learning it, which is understandable. There's a good blog post by a former colleague of mine on this that I'll link up — Lawrence Jones, who I know from GoCardless. He writes about that in detail, and I think it's a good one to share with juniors on your team if they're learning about this.
Right. So, my comparison to replicas — synchronous and asynchronous, or semi-synchronous — it's similar. It's different, but in terms of anomalies it's quite similar. I mean, users see anomalies. And we've seen asynchronous replicas: they also have anomalies, and we need to deal with it. But why — like, if you think performance doesn't matter: we have five replicas, let's make all of them synchronous, right? And no anomalies anymore — they all have the same data. And let's go to serializable, synchronous replicas.
Yeah. Oh, and in multiple regions around the world as well. Of course not. Yeah. Of course, with big latencies.
What could possibly go wrong?
With big network complexity between them, right?
Yeah, yeah, yeah.
And serializable.
Good luck.
By the way, this is probably one of the topics we should dive into deeper and explain many, many cases. And I'm refreshing my memory from time to time — I'm sitting in read committed most of my life, so sometimes I go there and find things that are new to me. So I would like to explore this and discuss it deeper, maybe.
Yeah, wonderful.
Data encryption in Postgres.
That's all we got on this one.
Data encryption can be different,
like at rest and in transit, right? Two big areas. And I think, again, security is something we need to deal with, but I'm not a big fan of exploring it. Usually I prefer to just check what's the best approach and so on.
And encryption, of course, is a good thing,
but compression is also a good thing, right?
And sometimes they go together, but not always.
And encryption should be enabled.
For example, especially if you work in cloud,
but it should be enabled at protocol level.
Yeah, so, okay, I see where you're coming from now.
I assume they were actually talking about... The time I see this come up most in Postgres is because we don't have, you know, encryption at rest at the database level.
And compression also.
Yeah, so the common argument is that you can encrypt the disks.
Using which key? Provided by you or by cloud itself?
If provided by cloud, then how good is it?
Well, I also had, well, I tend to trust cloud,
at least the major cloud providers more than I trust myself.
But you may not.
The other thing that somebody brought up to me recently
that makes a lot of sense is that also you've got to worry about your backups.
So if you've just encrypted at the disk level
and you've got a Postgres backup and you're storing that somewhere,
that can be restored and it's not encrypted.
So there are other things to consider.
And usually people store backups in object storage, which makes total sense because of its durability — in those terms, not accessibility, not high availability. High availability of S3, for example, is lower compared to EBS volumes in AWS.
But durability in terms of data loss, it's insane.
And the data won't be lost.
But they also, usually they are not surrounded by some network solution.
So, I mean, these buckets are available from anywhere.
And you just need proper keys and you can access them from different regions, from different customer accounts and so on.
It's interesting.
But of course, they usually provide a lot of things for encryption.
And sometimes it's challenging to use your own keys.
For example, it might break some processes.
Like, if you want retries for chunk uploads in some clouds: if you use their keys, it's fine, but if you use customer-managed keys, it might not work, and you need to retry the whole file, which might be one gigabyte.
In Postgres, everything is stored — like, all data and indexes are split into one-gigabyte files.
But if you compress it, it will be probably three, four hundred megabytes
depending on data and so on.
In this case,
it means that
if you want to use your own keys,
probably you need to perform
retries of whole file
and it's not super efficient.
So there are many challenges there.
And also if you encrypt
and then you want also to...
This is a security topic, so it's very big.
And, for example, one of the problems, it's not...
Like, encryption can work in both ways.
And sometimes encryption is used against the data owner.
You know, like ransomware, which encrypts all your data,
and then they say they will give you a key only if you pay.
So how to protect against that?
And sometimes people see bigger danger in this area,
like how to avoid encryption.
I mean, not to avoid,
but you probably need to store backups in two clouds, for example.
Because one cloud can be stolen, your account is stolen,
everything — like, you lost access. So there are two big risks. Of course, one risk is leaking data, a very big risk. But losing data is a different risk, and encryption can be used against you in that second case.
So, yeah. Is it worth mentioning a couple of third-party tools?
Yeah, let's... Well, not tools necessarily, but I think both Cybertec and EDB have non-core solutions here for that, if you...?
Like transparent data encryption from Cybertec?
No, it's different. It's at an upper level. It's not encrypting the disk. It's encrypting data inside Postgres.
Which then results in even your backups being encrypted, right?
To me, as a Postgres user, I think it's a very good feature we should all have.
But I remember some discussions about it and some dead end in development.
I mean, somehow it's not in core.
It cannot be brought to core yet and so on.
I don't remember details, unfortunately.
But in my opinion, it should be in Postgres as a feature.
It would be great.
I'll link to the ones that I'm aware of in the show notes, just in case anybody does want or need this and wasn't aware of those. I think we've probably got time for one more. I'm scared that the one I'm going to pick is a bit big, but do you have any on the list that you wanted to make sure we talked about?
No, but I can choose a couple. Let's... Well, migration from other databases, like Oracle to Postgres, and also from non-relational ones, like Couchbase or Cassandra, to Postgres.
It's a big topic, but fortunately in this case, I'm not a big expert in migration.
The main migration I did was from MySQL to Postgres many, many years ago in my project.
But my colleagues did a lot of work migrating from Oracle and so on, and I understand the
process.
Of course, it depends usually on how much code
you have on database side.
And by that, you mean like procedures and functions?
Right, right, right.
Migrating the schema is quite easy, but if you have a lot of PL/SQL or T-SQL code, you need to rewrite it, and this will probably take a lot of time.
And second biggest challenge,
I mean, converting schema is also a challenge,
but it's solvable.
There are automation tools.
You will deal with some issues
which probably is easy to fix.
The key is to test a lot,
as usual, experiments, right?
And the second biggest challenge after code — this server-side code — is probably if you want a zero-downtime migration. In this case, "migration" is used in its proper meaning, I think, not in the weird meaning where we change the schema. Well, we also change the schema, but we change the engine under the schema, right?
Yeah.
Yeah, in this case, you need some replication solution, logical replication solution.
Like change data capture type thing.
Yeah.
I actually,
this is something I have got a little bit of,
at least secondhand experience of,
with,
there's a cross company working group in France
that are doing a lot of migrations from Oracle,
from SQL Server,
from Sybase.
Postgres has become really popular in France amongst huge organizations.
And government organizations as well.
Yeah, exactly. Lots of large organizations
including government agencies.
And they wrote a,
they collaborated on a guide, but it
was all in French. And I
spent a while with one of
them. So I ended up translating
a few of the chapters into English.
So a few of them are thanks to me.
A few of them are thanks to other people.
So I'll link that up as well
because there's basically a tried and tested formula
for doing these.
A lot of consultancies make a lot of their money
helping people do this.
So you can get help from others.
But yeah, it's not a small project
because a lot of these databases do encourage using, sort of, procedures and functions. So yeah, if you have a lot of those, don't expect it to be a small project.
What about NoSQL database systems to Postgres?
I haven't seen much of it, to be honest, but hopefully we will in the coming years.
I saw cases, but I don't remember any big questions about it. Just do it, that's it. I guess it's quite simple, right?
I guess, by its definition, that doesn't have much schema. If it's MongoDB, there is a new project called FerretDB, which, like, speaks Mongo. But yeah, this is one of the ways: just use some extension or some project on top of Postgres, which will help. But if it's Cassandra or Couchbase, I don't know.
You can use JSON as usual, right?
Yeah. There will be difficulties, definitely. There will be difficulties.
Yeah. Well, if anybody does have experience of that, let us know. But I imagine those are much simpler projects overall.
With complex queries to JSON documents in some NoSQL JSON-like storage, it can be easy to convert them as is, but you will probably have poor performance and also consistency issues. Why do you migrate to Postgres then? You probably won't benefit from its strong sides: the strong relational data model, ACID, and so on. Probably you should move some parts of your data to a relational model, not just JSON.
Maybe even normalize it, yeah.
Yes, and in this case,
it's kind of like building something from scratch, almost.
Okay, good point.
Actually, performance is a big part of some of these migrations.
So once you've done the schema, once you've done the code, sometimes they get stuck, and you see it come up on the mailing list. Quite a lot of people saying: this used to be one second in Oracle,
and now it's 20 seconds in Postgres.
What can I do about it?
You know, you only get the squeaky wheels.
You only hear about the ones that are slower, of course,
because they're the ones holding the project up.
But it's quite common to have, you know,
if you've got thousands of different queries going through your system,
a few of them are probably going to be slower in Postgres than Oracle
without some tuning.
So it's quite common for that to be a big part.
To be fully fair, SQL Server and Oracle optimization code is much more complex, sophisticated, and bigger than Postgres's. But on the other side, Oracle has hints and a tendency to use them quite a lot — manual control of plans. But also, like, SQL Server: if you compare performance with Postgres — sometimes, I saw some comparisons — SQL Server is winning quite obviously, quite often. But it doesn't mean that in Postgres we cannot make things work well.
And it's improving also, right?
So I don't know.
Like I understand that it's not that simple,
that like it's easy to improve.
Well, the code base is huge. Some methods for query optimization used in big commercial database systems are very interesting, and many of them Postgres doesn't have yet. So it's a good point, actually. Like, sometimes it doesn't have them natively — an index skip scan, for example. Loose index scan — you need to master it yourself, manually. Other systems have it.
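The manual loose index scan he refers to is usually written as a recursive CTE — a sketch, with a hypothetical table t(col) that has a btree index on col:

```sql
-- Emulating an index skip scan (fast DISTINCT col) with a recursive CTE:
-- each step does one small index probe instead of a full scan.
WITH RECURSIVE skip AS (
  (SELECT col FROM t ORDER BY col LIMIT 1)
  UNION ALL
  SELECT (SELECT t.col FROM t WHERE t.col > skip.col
          ORDER BY t.col LIMIT 1)
  FROM skip
  WHERE skip.col IS NOT NULL
)
SELECT col FROM skip WHERE col IS NOT NULL;
```
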
We've got transaction control in stored procedures — stored procedures, not functions — and Postgres got it only in version 12, or when? A few years ago only.
Yeah, exactly.
Well, those guys have had it for many years.
So, I mean, we should not be like we are the best.
We are the best in many areas, but not in all, right?
I mean, Postgres.
All I'm saying is, pros and cons — it's important not to be like "all good, all good." No, sometimes it's not that good, and improvements are needed and so on. Workarounds are needed, like loose index scan.
Yeah, all I meant is it can be a big part
of those migrations.
And they can get stuck at that point, right?
And what I can say: Postgres has so many things. Usually we have, like — okay, maybe not in the way you got used to, but in a different way we can achieve very good performance, definitely. And you can put very mission-critical systems under big TPS — a lot of data, a lot of TPS, and so on — so it's possible to build big, reliable systems using Postgres.
Nice. And I think this is another one of those ones where we could do a whole...
Whole... yeah. Migration is a huge topic. Again, I'm not an expert, but I can explore and say some things, definitely. Well, maybe some small last thing, and that's it? Or what do you think?
Sounds good. Yeah, go ahead. One more.
Okay. Latest failover best practices.
Oh, you picked a nice small one then. Failover best practices.
Failover is
when things go wrong, right?
What do you do when failover happens?
Answer is, if
well prepared, you shouldn't do anything.
And this is the key, right? I remember
times when
no good
failover systems existed. And also there was a consensus that, in my opinion, it should have been in Postgres, inside Postgres core. But so far, it doesn't seem to be happening at all. But we have Patroni and others. The recommendation is to use Patroni, or another system which follows well-developed consensus algorithms, right?
Like Raft.
But if you use RepManager — repmgr — I have bad news for you: split-brain happens very often in large systems, or in cases when you have many clusters it's very likely. So migrate from it.
This is the key.
And just use Patroni, for example.
Patroni is an obvious winner right now.
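For orientation, a minimal sketch of what a Patroni setup involves — the names and addresses here are placeholders, and real deployments need more (DCS endpoints, authentication, etc.):

```yaml
# Hypothetical fragment of a patroni.yml
scope: mycluster
name: node1
etcd3:
  hosts: 10.0.0.1:2379
bootstrap:
  dcs:
    ttl: 30
    maximum_lag_on_failover: 1048576   # bytes of WAL a candidate may lag
    synchronous_mode: true             # trade latency for less data loss
postgresql:
  data_dir: /var/lib/postgresql/data
```
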
Are there any particularly good guides or books on it
that you recommend?
Well, just documentation.
Well, there are some tricks there, but it's worth a separate episode, probably. But one of the things, for example, you should understand: we mentioned asynchronous replicas — if you have an asynchronous replica, during a failover you might have data loss. And Patroni defines by default — if I remember correctly, it's 10 megabytes — that 10 megabytes of data might be lost in case of failover. So this should be understood. Or you need to start using quorum commit, so a commit goes to at least two nodes, and Patroni will choose the best one, and probably you won't lose any data during failover. Actually, you know, I suspect maybe the author of this question meant something different — not failover, but switchover.
Maybe high availability.
I think it might be the same one as the one above,
which is also a huge topic about high availability.
Yeah, because there are also best practices on how to perform a switchover, to avoid long downtime and not lose data. It's not rocket science, but there are some tricks there. Let's promise to explore both of these areas in the future.
Sounds good.
Thanks, everybody, for these
questions and suggestions. Thank you, Nikolay. Thank you for being back and continuing this, because I was in fear you wanted to stop it.
Yeah, you're crazy.
I said I'll be back.
Okay.
And by the way, don't watch all those draft recordings
in this Riverside.
Just delete them.
They are all bad.
That's so funny.
All right, take care.
Thank you, everyone.
Thank you, Michael.
Bye.
Cheers, bye.