Postgres FM - Postgres Gotchas
Episode Date: October 18, 2024

Nikolay and Michael discuss some Postgres gotchas: things you might expect to work one way that in fact work another way. Here are some links to things they mentioned:

Our episode on NULLs https://postgres.fm/episodes/nulls-the-good-the-bad-the-ugly-and-the-unknown
Postgres Gotchas (list by Ian Barwick) https://sql-info.de/postgresql/postgres-gotchas.html
Our episode on slow count https://postgres.fm/episodes/slow-count
Discussion on X about major version upgrades and statistics https://x.com/samokhvalov/status/1844593601638260850
Our episode on upgrades https://postgres.fm/episodes/upgrades
Statistics Import and Export (commitfest entry which links to the email thread) https://commitfest.postgresql.org/50/4538/
vacuumdb https://www.postgresql.org/docs/current/app-vacuumdb.html
vacuum_cost_delay https://www.postgresql.org/docs/current/runtime-config-resource.html#GUC-VACUUM-COST-DELAY
Zero-downtime Postgres schema migrations need lock_timeout and retries (blog post by Nikolay) https://postgres.ai/blog/20210923-zero-downtime-postgres-schema-migrations-lock-timeout-and-retries
Postgres Hacking 101: adding params, prevent_unqualified_deletes and prevent_unqualified_updates https://www.youtube.com/watch?v=samLkrC5xQA
PostgREST suggestion to use pg-safeupdate https://docs.postgrest.org/en/v12/integrations/pg-safeupdate.html#pg-safeupdate
DBOS (new company co-founded by Michael Stonebraker) https://www.dbos.dev

~~~

What did you like or not like? What should we discuss next time? Let us know via a YouTube comment, on social media, or by commenting on our Google doc!

~~~

Postgres FM is produced by:
Michael Christofides, founder of pgMustard
Nikolay Samokhvalov, founder of Postgres.ai

With special thanks to:
Jessie Draws for the elephant artwork
Transcript
Hello and welcome to Postgres FM, a weekly show about all things PostgreSQL.
I am Michael, founder of pgMustard.
This is Nikolay, founder of Postgres.ai.
Hello, Nikolay.
Hi, Michael.
How are you?
I'm good.
How are you?
Very good.
Let's talk about some crazy stuff.
I chose the topic this week and it was a listener suggestion.
It's Postgres gotchas.
And we've had a short discussion before this call about what we should even count as gotchas. So how do you define it? What do you think of when you hear Postgres gotcha?
Well, if you, for example, run, I don't know, SELECT 5/2, what will you get?
Is this like data type casting?
Right, right. Well, it's not Postgres related.
I think it's SQL related.
But still, what do you expect to get?
A normal person would expect 2.5, right?
Yeah.
But you will get, and I just double checked
because I'm always like thinking, am I right?
So you're getting two.
So, 5/2 gives 2.
Oh, wow. I actually thought it was going to be 3. Whoops. So it's floor?
Okay, we probably have two things to discuss here. First is, of course: integer and integer, right, the result should be integer, so there's no decimal, there's no point here, there's no dot. But, 2 or 3? Rounding or flooring, right?
Yeah, because, like, rounding conventions, at least where I grew up, we generally round up from 0.5, but clearly not here. I like this, almost as a definition: something where you may expect one outcome, but a different outcome happens. That's unexpected behavior, basically. Now they're worse.
Yeah, especially unexpected if you haven't read the documentation.
Sometimes the documentation is also wrong or misleading, or sometimes it's just confusing.
Yeah, fair enough.
I just wanted an excuse to list a few things that are made very clear in the documentation,
but I know people get confused by it because often they don't read that.
You know, I'm curious.
I actually found a good example, right?
It's probably a gotcha of SQL, not Postgres,
because I think it's maybe defined by standard.
And this is super basic, but I'm wondering why indeed it's 2, not 3.
Because if you, for example, what I did, to double check,
of course, I checked floor and round functions.
And of course, as expected, I converted 5 to numeric.
Like cast with the double colon, right? So the result is also numeric, and then we have, of course, 2.5000..., like, okay.
So, but then I checked: floor gives me 2, round gives me 3, as expected, because 2.5 means, like, we need... if you're rounding according to... there are names for these rules, I don't remember, right?
But usually, 5, you go up, 4, you go down, right?
And it gives 3. Round gives 3.
But I was thinking, okay, why 5 slash 2 integer and integer gives 2, not 3?
Maybe because it uses floor.
We could check source code, of course.
But the next thing I checked is the final thing I checked.
5 converted to numeric slash 2 gives you 2.5.
And then you convert back to int, right?
Again, double colon int.
It gives three.
Amazing.
So how come we have two
if we don't use type conversion,
explicit type conversion at all?
Any ideas?
It was a good gotcha to find.
I always knew that it's 2, because I demonstrated it.
I just didn't think, like, why? Now I think: why? I have questions. Maybe there is some good explanation, simple, but I can see easily that it can be a gotcha, right? It can be considered a gotcha.
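For reference, the queries being walked through here look roughly like this in psql (a sketch; results are what current Postgres versions return, but worth double-checking on your own version):

  SELECT 5 / 2;                  -- 2: integer / integer truncates toward zero
  SELECT floor(5::numeric / 2);  -- 2
  SELECT round(5::numeric / 2);  -- 3: numeric rounding goes half away from zero
  SELECT 5::numeric / 2;         -- 2.5000000000000000
  SELECT (5::numeric / 2)::int;  -- 3: casting numeric to int rounds, it does not truncate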
And let's agree on one more thing. We don't talk about nulls today.
I actually did have them on my list
but not to go into detail
we had a whole episode
and this whole episode is telling us
nulls are surrounded by gotchas,
both standard-defined,
standard-produced,
global SQL gotchas,
and local Postgres-related gotchas as well.
For example, when you go to arrays.
Let's just exclude it because it's a whole massive area.
I think that's very sensible.
And if you haven't listened to that episode, I'll link it up in the show notes.
Consider it to be fully included here.
Yeah.
Right.
Okay, let's maybe switch to your ideas.
I know you have a list, pull something from it.
Yeah, I've got a few that I wanted to make sure we mentioned. I'm going to start with some ones
I consider more beginner level gotchas, and then maybe we can work our way up a little bit.
The first one I had is one I see all the time in forums, every now and again on Reddit, quite a few times in, like, the Postgres Slack, a few on mailing lists: people getting confused or annoyed when they name objects in Postgres and they use mixed case. So maybe camel case, or some way of naming a table, like, for example, a two-worded name using capital letters for the two words and no space in between them, that kind of thing. And then realizing the table is not called that behind the scenes, and it's all lowercase. And if they've used an ORM or something to name these, the ORM often puts quotes around those names, so then they become cased in the database. So I think this whole area is, like, one big gotcha for beginners. And the people who set up the schema in the first place, if they do something like this, if they don't just use lowercase object names, can really make life a bit painful for everybody that comes after them, having to double quote everything.
Yeah, especially Java people like it.
Oh, really?
Yeah.
Yeah, I noticed. Like, usually, if we have a new customer and we start discussing their database, we see some camel-case style table names.
And I think, like, I'm already thinking, are you Java guys?
Yeah, they are.
Usually, not always, but usually so.
And my previous startup was using it, CamelStyle.
And we just decided to keep it as is because it was inherited from some...
I think it was MySQL originally. Like, we started very lean, so we borrowed some schema from some open source project, and it had it. And I, knowing this very well, still decided to keep it.
But the problem with double quotes is... it's not a big deal, but you know what usually comes with quotes, right? Backslashes.
Oh, having to escape them.
Yeah, yeah. And you cannot escape from this, you need to use it to escape, right? I mean...
yeah and backslashes uh if you have automation in many languages, like sometimes it's a nightmare.
Right now we have an ongoing problem with backslashes
and escaping of quotes and double quotes in our AI system.
It's a combination of Python and Bash actually.
And yeah, it drives us crazy sometimes.
We solve some problems, but not all of them still.
And when AI needs to talk about your table names
and you give the schema with double quotes
and then you want to experiment in dbLab
to find some plans or something,
sometimes it's broken just because of problems with backslashes
and passing these pieces between different components.
Yeah, I would prefer keeping everything lowercase in Postgres, right? Snake case, right?
Yeah, and that's it, and just forget about it.
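A small sketch of the identifier-folding behavior being described (the table name is made up):

  -- unquoted identifiers are folded to lowercase:
  CREATE TABLE UserAccounts (id int);
  SELECT * FROM useraccounts;      -- works: the table is really named useraccounts
  DROP TABLE useraccounts;

  -- quoted identifiers keep their case, and everyone afterwards has to keep quoting:
  CREATE TABLE "UserAccounts" (id int);
  SELECT * FROM UserAccounts;      -- ERROR: relation "useraccounts" does not exist
  SELECT * FROM "UserAccounts";    -- works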
Right. Yeah. Please, people, please.
I wanted to mention the list of gotchas I saw. Actually, as we discussed right before this recording, I didn't know the word gotcha until 2006 or 2007,
and this is exactly when this list was published.
I remember we had two lists, one for Postgres and one for MySQL.
For MySQL, it's outside of scope, again, even more than nulls.
But this list, I'm looking at it right now, and I'm very... first of all, it survived how many years? Like, it's insane, 17 or 18 years. Wow. And I'm looking at this, it's quite a short list, at sql-info.de, right? And almost everything is crossed out.
Good.
Yeah, I've never seen this page before.
This is really cool.
And yeah, maybe more than half of them have been crossed out.
Yeah, "count(*) is very slow".
Let's just laugh at it.
Well, we had an episode on that, right?
I think we should.
"Unicode means UTF-8" is left as well.
Well, yeah.
Yeah, constraint checking is interesting,
but maybe let's not dive into it.
And finally, I wanted to highlight what else is left.
Only four items.
Five, okay, five.
But one of them is upgrades, upgrade-related stuff. And I was discussing this topic on Twitter, on X, over the last few days. I just checked the documentation of three major managed Postgres providers, so RDS, Cloud SQL, and Azure Postgres. And I saw that all of them say: okay, the upgrade is done, the major upgrade. Now it's time, don't forget, now it's time to run analyze. And they say analyze, like, they give you a snippet, analyze semicolon, which means it's single-threaded as well, right? First of all, they don't include it, like, none of them included it into automation. But we don't talk about managed Postgres providers today, we talk about Postgres. My question is, why does pg_upgrade not include it? It prints only some text in the end, but who reads the text? Many people will say, for those who don't read, it's on their shoulders, it's their problem.
But I cannot agree with it.
I could even make a case that the gotcha here is that statistics are not preserved on a major upgrade.
I think if you asked somebody that didn't know whether they would expect the statistics to be wiped or to be preserved, somebody who maybe doesn't understand how difficult it might be, or the catalog, or the reasons why they're not preserved, you'd be forgiven for thinking that, in the most advanced open source database, that might happen, even though it's complicated. And I think there has been some work.
It should happen, everyone agrees on it, just not solved.
Yeah, so I think that's the main gotcha.
And then, obviously, the solution at the moment
is we need to run analyze.
Yeah, let's unwrap a little bit.
So we talk about statistics, which are required for the planner to make right decisions.
If the database is not trivial, like containing 100 rows in each table, that's it...
If it's not trivial, we need statistics.
Without statistics, the database is slow.
And what happens,
I think in a zillion cases,
it happens.
It's hard to estimate,
but when I posted this tweet,
I received several replies immediately
that, oh, we got bitten by this
badly. Someone, some person, wrote: we learned this the hard way, obviously. And this is what happens all the time. And recently, during summer, we also failed to help a customer with it.
We discussed major upgrades, but from now on, I'm pointing out,
don't forget, don't forget, don't forget.
But we discussed upgrades, we discussed some complicated stuff,
and then they went to upgrade, and next Monday they say,
oh, today they are in Europe.
So in the morning, it was already by the evening in Europe.
And they say we had nasty database incidents today
because guess what?
We forgot to run analyze.
And this is a huge gotcha, not only of managed services.
They could do it, but I don't care too much about them.
This is a gotcha of pg_upgrade.
I think it's very well understood.
There is work in progress for export and import of statistics,
like a quite long thread in hackers already, and a commitfest entry.
It's still not done, but there is hope.
There is one more thing here.
I think pg_upgrade definitely should have it, and also pg_restore, but let's return to that in a few minutes.
So when someone says you need to recalculate statistics yourself after the upgrade, after pg_upgrade...
Definitely, I would expect pg_upgrade to do this for sure.
But okay, it just prints: do it.
pg_upgrade prints vacuumdb, I think, now, with analyze-in-stages. Vacuumdb has these two options, analyze-only and analyze-in-stages. Analyze-only means no vacuum, just analyze. In-stages means first it gets only one bucket for each table, for each column, actually.
And then some midway, until your... and then default_statistics_target, or the individual column-based setting for each column.
Default statistics target by default is 100, 100 buckets.
This is what defines how much Postgres planner will know about distribution in each column,
right? And it could be adjusted globally or for individual columns.
It's like a sample size, like the sample size increases of what it samples.
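For reference, a hedged sketch of how that target is adjusted (the GUC and the ALTER TABLE form are standard; the table and column names are made up):

  SHOW default_statistics_target;       -- 100 by default
  SET default_statistics_target = 500;  -- session level; can also be set cluster-wide
  ALTER TABLE orders ALTER COLUMN customer_id SET STATISTICS 1000;  -- per-column override
  ANALYZE orders;                       -- re-sample using the new targets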
Yeah. And I quickly realized this is not what you want to do in OLTP, because in OLTP, you either include analyze inside the maintenance window, and in this case there is no sense in running it three times, or you just... That's it, there is only one option, actually. Like, I don't trust in opening the gates with weak statistics, it will be, like, unpredictable, again an incident.
yeah
Yeah, risky. But this feels like a topic that might be worth a deeper dive.
Right, right, right. I just wanted to point out that when people say analyze, it's just single-threaded, right? And you expect... Postgres has a lot of parallelism implemented, but when you run analyze, it's always single-threaded. This is unexpected as well, right?
Good one. Yeah, I like it. It's kind of a gotcha inside a gotcha. We have a matryoshka-style gotcha here, right? Russian doll style. What do you recommend doing? Like, just kicking off a bunch of queries, analyze per table?
I recommend parallelization, I recommend vacuumdb.
Nice.
Okay.
And go full speed inside maintenance window.
It's a different story.
But this vacuumdb, it has hyphen...
-j?
Hyphen jobs, or -j.
Yeah.
-j.
You can define it.
And usually we take as many cores as we have on the server and go full speed inside the maintenance window, to achieve...
Like, analysis should be quite fast
if you don't do vacuum.
There is also a gotcha, a little bit, in the vacuumdb options, because vacuumdb's hyphen-hyphen analyze will also do vacuum.
So there is also...
So can you do analyze only, or...
There is analyze-only.
Yeah.
But it's easy to overlook.
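A sketch of the vacuumdb variants being discussed (flags as documented; run against the whole cluster here):

  # rebuild planner statistics only, in parallel, without vacuuming
  vacuumdb --all --analyze-only --jobs=16

  # what pg_upgrade suggests: several analyze passes with increasing statistics targets
  vacuumdb --all --analyze-in-stages

  # careful: plain --analyze also runs a full VACUUM of every table
  vacuumdb --all --analyze --jobs=16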
There is one more gotcha here, inside all this.
When people go with manual single-threaded analyze,
or maybe multi-threaded, but just explicit analyze.
I, like, quickly, it was interesting.
I was thinking, someone said analyze can be super expensive.
I agree.
But even if it's like, if default statistics target is 1000, it's not
that expensive. But,
back to this. Can we
throttle it?
For example, you want it to go
very slow. Because auto-vacuum
has auto-vacuum,
it doesn't have actually.
Cost delay? Yes, there is
vacuum cost delay and vacuum cost limit.
And by default it's off because I think cost limit is zero, meaning that it's not checked at all.
Not cost limit, cost delay is zero, like no check.
But AutoVacuum has like mirrored pair of these settings.
It's throttled.
Long story short, auto-vacuum is throttled, usually. And we actually usually fight
with our customers a little bit to increase this, even make it even more aggressive. And
managed providers already did part of that job, which is good. But if we run it manually,
it's not throttled. It will go full speed. I doubt you will need to throttle it, but some people might want to.
Okay, let's throttle it.
Question, how?
We have vacuum limit,
vacuum cost limit,
vacuum cost delay.
We can set vacuum cost delay
from default zero to some point,
but is it about analyze
or it's only about vacuum?
Based on just the naming,
it's not about analyze.
It's about vacuum only, right?
Yeah, that would be my guess.
But by the line of your questioning, I'm guessing it does affect analyze.
Right.
I go to documentation and I see vacuum cost limit, vacuum cost delay, I think.
And I see description.
They talk only about vacuum.
They don't talk about analyze.
So I make the conclusion
it's only about vacuum
but some part of my
like very long term memory
tells me is it really so
and then
then Sadeq Dousti, who
helped me a lot, and is still helping sometimes,
when I write these how-tos, and, like, reviews
the how-tos. I'm very
thankful for that.
I have already 94 how-tos.
So I wrote how-to, how to analyze.
And right now I need to add this.
Vacuum cost limit, vacuum cost delay,
they affect analyze.
If you scroll up to the beginning of section,
you will see that it talks about both vacuum and analyze.
And we also, using AI,
I checked source code
and quickly found that there is
a function called ExecVacuum,
which works with both vacuum and analyze.
And this confusion comes from source code,
propagates to documentation
and then to our heads, right?
This is gotcha as well.
So you can use vacuum_cost_limit and vacuum_cost_delay to throttle analyze, although I usually don't do it, because I think we need to go full speed.
Yeah, and in the one example we're talking about here, we want to go faster, not slower.
Yeah, right, right. But it's interesting, right? So there is some small confusion. I think there are many such confusions, but they are slowly being fixed.
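A hedged sketch of that throttling behavior (defaults as documented; the table name is made up):

  SHOW vacuum_cost_delay;        -- 0 by default, so manual VACUUM and ANALYZE are not throttled
  SHOW vacuum_cost_limit;        -- 200 by default

  -- per the cost-based vacuum delay section of the docs, these apply to ANALYZE too:
  SET vacuum_cost_delay = '2ms';
  ANALYZE bigtable;              -- now throttled; after an upgrade you usually want full speed instead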
Great example of one where the documentation wasn't helpful in avoiding that one. So yeah, nice. I've got a few more, like, beginner-friendly ones that maybe we could rattle through. I'm not even sure you'd consider them all gotchas, but one relates to this a little bit, and that's conservative defaults in a lot of places. I think that not only catches...
Well, you don't think it's a gotcha? Or...
I think we should exclude this, like, using the same approach as with nulls, because it's a huge bag of...
Yeah, cool. And then another one that I see catch out, well, a couple that I see catch out beginners,
again, maybe this is another huge topic, but access exclusive locks on, like, just basically DDL blocking selects, blocking inserts, and, like, you know, all sorts of people getting...
Yeah, people don't realize, because if they're working on small databases at first, they start to grow in confidence of how to do things, and then it only catches them out, you know, once the startup's got really successful and the table's much bigger, that kind of thing.
Yeah. I think, you know, there are some memes with, like, the Gauss distribution, or how it's called, right? Gaussian distribution.
And first, I think schema changes are painful.
Schema changes are super painful.
They require expertise.
It's like beginner, right?
Then in the middle, you can imagine,
oh, Postgres has transactional DDL.
Everything is fine.
We just need to wrap everything into one transaction and rely on Postgres' great capabilities of transactional DDL. Or just "concurrently", and think, yeah, do everything concurrently. Or, MVCC works well, it doesn't block selects, selects are always working.
You can have a lot of
exciting stuff in the
middle. And then on the
right, you say, Postgres doesn't
Postgres has transactional DDL, but you cannot use it.
Well, on the right, it has to be the same as the left, doesn't it?
Schema changes are painful.
Yes.
They are painful.
Transactional DDL cannot be used in many, many cases.
And you can shoot your feet very well many times.
Yeah.
So this is it, I think.
And there are a lot of gotchas there.
Another one that's not necessarily...
I'm not sure if this counts as a gotcha.
I'd be interested in your opinion.
But I have seen it catch people out.
And I guess it's a little bit of a foot gun.
And that's...
You know, you see the memes about this as well, actually: delete or update without a WHERE clause, and then just suddenly seeing, you know, instead of five rows updated, it's... not.
Yeah, but... you think it's not Postgres?
Why do you say it's not Postgres, out of interest?
Because the SQL standard defines this, and any relational database which follows the SQL standard to some extent has it.
Yeah.
But, like, the clients... I've seen clients catch these, but the one that ships, well, psql doesn't.
So I don't think it's unreasonable that Postgres does it this way, but I've definitely seen it catch people out.
I agree.
Blame SQL standard.
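A sketch of the foot-gun and one common defensive habit (the table name is made up):

  -- intended: deactivate one user
  UPDATE users SET active = false WHERE id = 42;

  -- the gotcha: forget the WHERE and every row is updated, silently
  UPDATE users SET active = false;

  -- one protection: wrap it, check the reported row count, then decide
  BEGIN;
  UPDATE users SET active = false;   -- UPDATE 1048576  <- not what we wanted
  ROLLBACK;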
And there is extension to forbid it. And also it was
maybe one of the very first
hacking sessions with Andrei. We implemented
some setting,
GUC setting
in Postgres, which
would allow
administrators to... I think error, warning, like, completely forbid or just warn.
I don't remember.
I think error should be here.
But it was not accepted, of course, because I think it was discussed many times.
I mean, it's possible.
And right now, as I see, I think PostgREST does it, right? PostgREST.
Oh, does it?
So usually, right now, it's solved if there is some middleware, it's solved there. So if you develop an API using PostgREST, for example... I think, maybe I'm mistaken, but I think there is a way. If there is no way, it's a good idea to implement, actually. Yeah, if in the URL of your HTTP request, of the RESTful API, there is a command to run delete without any filters, I think it will not do it. It will not go to Postgres with it. But inside the database, I don't think it's going to change.
We tried, actually.
Yes, we did.
Well, you tried? What do you mean, you tried?
We had a session, yeah. We had a patch, Andrei submitted it, but it was a small discussion, and chances are close to zero that it will be successful.
Or somewhere...
Like, no, not in psql. We wanted to have a Postgres setting, a GUC setting, to just allow administrators to forbid unspecified... I don't remember how we called it, maybe unspecified delete and update, when you don't have a WHERE clause at all. Just like, cool, don't do it. There's TRUNCATE if you want to delete everything, right? An update of the whole table...
I don't know. Yeah, but again, like, right now it's in middleware, usually solved in middleware. You can have triggers or something to protect yourself if you want, or this extension, if you're on self-managed Postgres, there is some extension for it.
I wanted to mention one thing, stepping back to DDL. My favorite example is:
do you think just adding a column cannot put your traffic down, or your database down? It can. If it cannot acquire a lock, it starts blocking all queries to this table, even selects. Like: I cannot acquire the lock, so everyone else can wait, right? Even selects, which come with an access share lock,
like, don't touch this.
That's it.
And that's not good, right?
And this is super unexpected
and a super big surprise.
Yeah, especially because
you could even test that, right?
Like, you could even make a fork of production or clone it,
run that on its own in isolation where there is no lock,
and it runs super fast.
So you think, this is safe.
But even though it was super fast, it does require that heavy lock,
and that blocks everything else.
Definitely a gotcha. That's a great one.
Yeah, yeah.
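A hedged sketch of the usual mitigation, along the lines of Nikolay's zero-downtime migrations post (lock_timeout plus retries; the table and column are made up):

  -- fail fast instead of queueing up behind a long-running transaction
  SET lock_timeout = '2s';

  -- this needs a brief ACCESS EXCLUSIVE lock; if it can't get it within 2s it errors out
  -- instead of blocking every other query on the table, and the migration tool retries
  ALTER TABLE orders ADD COLUMN notes text;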
And there are many others in the area of DDL,
but this I find affecting every single project
which had at least some growth, like, I don't know,
like to 1,000 TPS, and it's already noticeable.
It's not noticeable in tiny projects.
Yeah, which I think is kind of part of the problem, right?
Or at least makes it worse, because anybody that gains experience on smaller projects as they grow,
gets some confidence that these things work or they're fine.
And then it changes at some point.
Exactly.
So you've got a great article on this, actually, I think that I can share about DDL.
It's all about DDL changes, I think.
Yeah, yeah, yeah.
This one is definitely included.
I think this one has a different blog post about it.
I have a few about DDLs.
Okay, what else?
Well, in that area of making big changes,
partly by mistake,
you know, like the delete or update without where,
do you think people would expect
to be able to undo something?
I'm not thinking just from, like,
people with experience of relational databases,
but the fact that if you do something like that
by mistake, that you have to go to your backups
to get that data back.
If I was in my early 20s, or, you know, if I was in my teens, all other software I use has the ability to undo something if I do it by mistake. So it does feel slightly unexpected, potentially, if I've grown up in that world, that I can make one mistake and not be able to undo it.
And what database can you? I mean, of course, there are like... Yeah, I don't know. Time travel or... If branching is supported, you can...
But it's still like not super simple, right?
Well, yeah, I don't know.
I mean, there are several projects
which try to solve this problem.
And I mean, specific databases
supporting branching, for example,
or even other.
I think... Have you heard about this new company where Michael
Stonebraker is participating?
I heard of it.
I don't know much about it.
DBOPS or how it's called?
I don't remember.
Not DBOPS.
DBOS.
DBOS.
Yes.
So I wonder if this feature will be inside it.
I don't know.
For me, it's not expected... I cannot call this a gotcha, because it's a hard problem, and we didn't get used to having it, because it's rare. But maybe, like, if we go to young people's minds, yeah, there are many gotchas for them, right? Oh,
this is working not as I expected. So naming is hard, just because it should avoid gotchas. If you chose some name, you should think about other people, and think, like, to be very predictable. This is the art of naming. And we just discussed this: vacuum_cost_limit affecting analyze. So yeah. And in young minds, gotchas... because the gotcha world is much bigger, because they don't understand the logic in many places, how it's, like, common practice in many places, right?
And that's why many more unexpected situations.
Yeah, fair.
Yeah, by the way, if you run PgDump on a very large database,
like one table after another, and then we deal with snapshot, right?
With snapshot, it's a repeatable read and snapshot.
Even if you use multiple workers for PgDump,
they are synchronized reading from the same snapshot.
But what if some table is dropped?
Will it be present in this dump?
I'm not sure, actually.
I just think there is a gotcha here.
If during dump, there are some schema changes
can we have some
inconsistency in backups
not in backups, in dumps
by the way this
confusion was resolved recently
we didn't
mention it in the previous
discussions of confusions but
the pg_dump page is not saying that it's for backups anymore. So dumps are not backups.
So that's coming, yeah, in 18. Okay, okay, okay, we will discuss this additionally. Okay, but this is a big confusion, when the documentation says pg_dump is a backup tool. It's still so, even in 16. And now we have all rights to say... like, all people who said pg_dump is not a backup tool, we are right, because finally, in 18, it's already changed. I think it won't be reverted, I hope so, this patch. But back to our topic: pg_dump can be expected to be a tool for backups just because of the documentation. Then you have quite a painful process of realizing it's very bad for backups. So it's a gotcha which is already fixed in the future,
in Postgres 18. And in this area, pg_restore doesn't run analyze, which hurts every second attempt to restore a big database, even with experienced people like I am. Because again, it's like with pg_upgrade, we discussed it. pg_restore: you restore a huge dump, you start working with it, but it's slow, and in plans, where you expect a simple index scan, you see bitmap scans, bitmap index scan, bitmap heap scan, because there is no vacuum, there is no analyze. It also should run vacuum to build the visibility maps.
Or
it might not
be unreasonable to think that
maybe backups would include the statistics
and then restore would restore the
statistics. Physical backups
I agree. Logical backups
I disagree.
You don't include indexes, which are derivative from the data itself.
Statistics are also derivative of the data.
So you include a description, like CREATE INDEX.
Maybe there should be some word, like ANALYZE, in the end.
I wanted to say CREATE STATISTICS, but... ANALYZE, there's already such a word.
Why, restore? And restore could run it: if we use pg_restore, hyphen j, 8, for example, eight parallel jobs, in this case, why are we not calculating statistics?
Right now
we deal with like everyone who deals with some automation,
for example, for DBLab engine,
we need it when provisioning is logical.
So it's dump restore,
and we need to move as fast as possible
multiple jobs, multiple workers for dump,
multiple workers for pg_restore.
And then, of course, vacuumdb, in this case with hyphen-hyphen analyze, so to have vacuum as well, and also hyphen j. But people forget about this, and forgetting means, like, you expect it will be fine. But if you don't run analyze yourself, ideally using multiple workers, performance will be very bad
and people start to think, oh, Postgres is slow.
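A sketch of that restore flow (real pg_restore and vacuumdb flags; database name and dump path are made up):

  # restore a directory-format dump with 8 parallel workers
  pg_restore -j 8 -d mydb /path/to/dump_dir

  # statistics are not restored and no vacuum has run, so do it yourself, in parallel
  vacuumdb -d mydb --analyze --jobs=8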
Have you heard about this?
Like I restore from dump and it's slow.
I have to admit, I haven't heard that complaint, no.
Okay, I'm hearing it
and I'm doing it myself for almost 20 years.
I know, but, no, I'm not saying it doesn't exist, I just haven't, you know... Maybe it's just not the category of problem. Maybe what happens when people experience that is either they don't notice, or they don't post, they just go try something else.
Yeah, it's not a big deal if you restore and it's working
and then you want to deal with it one day later
because probably, likely, auto-vacuum will do it.
But if you expect to run fast queries immediately after restoration from a dump,
I'm talking about something which is 100% very well known to everyone
who is at least a little bit experienced.
But it doesn't make it a non-gotcha.
It's still a gotcha.
Yeah.
That pg_restore doesn't calculate statistics and doesn't run vacuum.
Yeah.
Yeah.
And, I mean, it's quite remarkable.
I know we could go on with this list for quite a while
and there's, like, varying levels of gotchas, but I like the list you sent me, how short and brief it is. And I really think for many other databases it would be a much, much longer list. I listened to... well, you were talking about your experience of a couple of days of trying to use MySQL at the beginning.
And recently I was listening to an interview with Tom Lane.
And he mentioned checking out MySQL at the beginning and not liking a few things about it.
And that feels like a really common experience. And having worked with commercial databases like SQL Server and Oracle
that often move much faster in terms of adding features
that enterprise clients really want,
there's just so many more gotchas that result from processes like that.
So I imagine I could be wrong.
I think Postgres has lots of gotchas, lots of them.
But compared to other relational databases?
Yeah. It's a complex system.
All complex systems have a lot of gotchas.
Some of them are just very
niche, very
narrow and deep and so on.
Some of them, nobody knows about them. For example,
select into.
This is what you can write
inside PL/pgSQL, right?
So you write a function
or trigger
and you write select into some variable,
blah, blah, blah.
And then you think, okay, this select is
maybe it's complex select actually.
Subselect or, I don't know,
joins and so on, but in the end you have a scalar or something, and you SELECT it INTO a variable inside PL/pgSQL. And then you think: I want to understand the plan, or I want to just run it and try to troubleshoot. And you put it into psql. What will happen if you put SELECT INTO blah, blah, blah, and then some regular SELECT... FROM... WHERE... ORDER BY?
You mean just an error?
No.
What happens?
It will create a table.
Oh, okay.
Because select into is one.
Sure, like it's create table.
This, I think, just should be removed,
like completely removed, deprecated. Select into creating create table. This, I think, just should be removed, like completely removed, deprecated.
Select into creating a table.
It's a DDL.
This is definitely a gotcha for everyone.
Yes.
Imagine all those parsers that just,
I know this is flawed for multiple reasons,
but look at any query
that starts with select and like, oh, it's a read query.
Well, yeah, usually we say SELECT FOR UPDATE is blocking some, like, acquiring a heavy lock,
blocking all writes. Yes, blocking all writes on these rows, particular rows, not the whole
table. But select can create a table. And we have
create table as select already, and this is what all people use, I think, who need it.
So select into probably just some very old stuff that could be potentially removed, I
think.
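A quick sketch of the trap (variable and table names are made up):

  -- inside PL/pgSQL this just assigns a value to a variable:
  --   SELECT count(*) INTO my_var FROM orders;

  -- but pasted into psql as plain SQL, SELECT ... INTO is DDL:
  SELECT count(*) INTO my_var FROM orders;   -- creates a table named my_var

  -- the modern equivalent most people actually use:
  CREATE TABLE my_copy AS SELECT count(*) FROM orders;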
If it's not in the SQL standard... Oh, it's a good point. If it is, no chances, right?
Yeah. Well, probably not. No.
Cool. Anything else you wanted to make sure we covered?
Maybe it's enough. I think we covered less than one percent of what we could, this is my feeling. What do you think?
Well, it depends what you count as gotchas and how niche you're willing to go. It turns out the more you know, the more gotchas you know about, I guess.
So, I could not resist, and I checked the documentation: the SQL standard uses SELECT INTO to represent selecting values into a scalar variable of a host program, rather than creating a new table. So it's definitely a Postgres-ism, and
I think it should be removed.
Okay, cool. Good one.
Good, okay, good chat.
Maybe more entertaining than useful.
But I enjoyed it.
I enjoyed it. Yeah, absolutely.
And feel free to
send us any more that you've got
or that you've seen. Yeah, maybe we should
think about specific areas like we
did just now, and maybe we should explore some areas from this perspective: what can be unexpected.
Yeah, for sure. Thanks so much then, Nikolay. Catch you next week.
You too. Bye.