Postgres FM - Timestamps
Episode Date: December 9, 2022Here are links to a few things we mentioned: Date/Time Types (docs) Don’t use timestamp without time zone (wiki)  Date/Time functions and operators (docs)  Postgres AT TIME ZONE explain...ed (blog post by Bruce Momjian) Our episode on BRIN indexes  RUM indexes allballs (mailing list thread) ------------------------What did you like or not like? What should we discuss next time? Let us know by tweeting us on @samokhvalov / @michristofides / @PostgresFM, or by commenting on our Google doc.If you would like to share this episode, here's a good link (and thank you!)Postgres FM is brought to you by:Nikolay Samokhvalov, founder of Postgres.aiMichael Christofides, founder of pgMustardWith special thanks to:Jessie Draws for the amazing artworkÂ
Transcript
Discussion (0)
Hello and welcome to PostgresFM, a weekly show about all things PostgresQL. I'm Michael,
founder of PGMustard. This is my co-host Nikolai, founder of PostgresAI. Hey Nikolai,
what are we talking about today? Hi Michael, let's talk about something priceless, time,
timestamps, time zones, and so on. I like the priceless reference, that's great. Yeah,
so this is another listener request from quite a while ago. Let's keep this episode short. It's about time.
Oh, nice. Yeah, it's a great suggestion. Great request. It's one of our earliest requests,
actually. So it was specifically about timestamps in general and time math or arithmetic, maybe.
So yeah, it's quite broad, but also I guess we can take it in a few different directions.
Where did you want to start?
Let's start with the idea that timestamp with timezone should be always preferred over regular timestamp without timezone.
What do you think about this idea?
Yeah, so I guess we're jumping into the specific data types here on how we're storing these in the database.
And there's a couple Postgres lets you use
timestamp and timestamp tz so timestamp without time zone and timestamp with time zone I think
it's quite a confusing name and the reference I normally point people to is the Postgres wiki
they've got an excellent list that's quite aggressively titled called don't do this but one of the they it is
it's aggressively titled but it also includes information on when you when it might make sense
you know what are the exceptions but one of the one of the ones that has very few exceptions is
you probably always want timestamp tz when you're storing these right and uh since it's a couple of
more letters it's easier to write just timestamp and then realize that maybe it was not the best choice. But I would like to mention that usually, it's my perception, usually you think about timestamps as something bigger than the regular numbers, integers, especially if it's with something else like timezone, right? If you think about storage, you maybe think, okay, I maybe need to use regular numbers.
And of course, I don't want to bump into the limit of a four-byte integer, so I will choose
eight-byte integer, right?
So int8 or bigint.
And then you realize that regular timestamp is also eight bytes.
And then you realize that timestamp with timezone
is also 8 bytes. They are all
8 bytes. So the choice
should not be based on storage here.
BigInt,
timestamp, or timestamp.tz.
Timestamp with timezone.
Timestamp.tz is
post-visism because it's
standard way, SQL standard way, or data type is timestamp TZ is post-guisism because it's a standard way, SQL standard way,
or data type is timestamp with time zone for words.
So I personally try to encourage all engineers usually to choose, by default,
choose timestamp TZ, explaining that there is no storage benefits of regular timestamp.
Right?
Yeah, exactly. And there's loads of benefits of regular timestamp, right? Yeah, exactly.
And there's loads of benefits of the timestamp TZ.
So for example, if you're doing,
I guess we'll go on to arithmetic later,
but if we're calculating the difference
between a timestamp, at least in the UK,
I think in the US,
a lot of other countries observe daylight savings.
If we're doing arithmetic between two timestamps
and we're crossing a boundary like a daylight savings if we're doing arithmetic between two time stamps and what and we're
crossing a boundary like a daylight savings boundary we won't get the right answer if we
or we're more likely to have issues if we're using time stamp without time zone so postgres can
handle all of that stuff that complicated stuff the reasons people talk about times being difficult
that's that can all be handled by postgres if we're telling it that this is um
and i guess should we should we mention the name it's not the with time zone i think is it puts
people off in this confusing from the wiki i stole their description that timestamp is a bit like
a photograph of a calendar and a clock but you have have no context. And the timestamp with time zone, it knows,
firstly, it can convert that, it does convert that to UTC,
and it has the context of a point in time.
So it knows when that is rather than what it is.
Right, and the resulting value will depend on the context
which you can change in any session.
So one session sees one value, final value.
Another session sees another value, just because they are sitting in different time zones.
So that's why it's more flexible and convenient and so on.
Yeah.
So you've already touched on the storage efficiency.
Are there any other kind of performance related things that you want to talk
about? Actually, good question. I don't know. I never thought about timestamps that are less
performant. I never tried to measure it. It's a good question, but I think, I doubt there is a
big difference here. Based on my understanding of how they're stored, I don't see how they could be.
But yeah, let us know if you know differently. Of course, you need to do some arithmetic operation in runtime, if you have with timezone.
But I don't know.
I think it's very small.
Of course, it's an interesting exercise to compare and see some differences.
Maybe there are already these exercises in some blog posts.
I don't know but i wanted to mention a dangerous part of
sql standard which is operator called at time zone it's very dangerous it's the ugliest parts
the ugliest corners of this huge building called sql standard so it can be very tricky if you don't
know that is dangerous it's like mouse It's like nulls, right?
So nulls are dangerous and addTimeZone is dangerous.
If you check how it works and what is resulting data type for each value,
for timestamp, it will produce timestamp at timezone.
For timestamp at timezone, it will produce timestamp.
If you take timestamp without timezone and say addTimeZone,
you will have timestamp add some time zone.
If you take a timestamp add some time zone and say add time zone, it will produce timestamp without time zone.
And this is from standard, as I remember. So if you go there, just open the reference and do any moments with reference.
No moments without it.
Just a warning for you.
So performance, I didn't think we will discuss a lot of performance questions.
Maybe just the idea that, of course, it's perfectly indexable, both timestamp and timestamp.tz.
If you have such columns, you can index them.
But I had several times I had a case in my engineering practice when I needed to
have something like age. By the way, there is such function age, which you can give either
two timestamps to it and it will just calculate interval. There is interval data type,
it's a whole new world. Time ranges, intervals and so on. So you can give two points in time and
it will calculate a difference. Or you can just give one timestamp and it will compare it with
current time. And it will be just a regular age. So you say like my birthday is this and it gives you age as interval. But I wanted to store it this interval and then I wanted to index it several times. And always it was a question how to do it because you cannot have an index on something which involves time zone because it's volatile
right it depends on on the context on your session if you if you change time zones in session saying
site time zone to i don't know like europe berlin or something like this in this case you will or
new time zone by the way uh europe kiev there time zone. Yeah, but it's not a benefit from Postgres,
coming from Postgres itself, it's lower level. It's, sorry, I don't know libraries, but it's
coming from underlying software. So if you set it in session, the value will be different and
probably interval will be different. So you cannot index it easily. If you, for example, you want to
index expression, you can store time zone, but you want to index expression you store times but you want to index
to support fast lookups based on age and my workaround was i switched to different approach
to store time i based on unix timestamps and so on i just said okay but unix timestamps are limited on one end, on the beginning. 1970, maybe January 1st, I don't remember, but some point in time in 1970.
And then it's just a number of seconds past. Unix timestamp is a number of seconds past since then.
By the way, I wish in Postgres it would be easier to convert from timestamp to Unix timestamp back and forth. It's quite an expression.
You need to always to Google it,
Stack Overflow or something will help you, right?
It's memorize it, memorize it.
It takes like five years of practicing constantly to write it blindly.
So Unix timestamps are limited on one end.
They are technically limited because of capacity of integers on another end.
But my idea was, okay, I say our software will exist only until 2050.
So we have some line and we can see progress and we can have ranges.
And this is very stable already.
Of course, I lose time of benefits and flexibility, convenience, but I'm on perfect range of numbers.
This is our universe, right?
And we deal with only with points on this range and anything you do there, you can index
any expression, anything. so this is what my approach
is working around maybe there are other approaches to to deal with age and and still build an index
on some expression involving involving age but usually it's a trick and people bump into it and
think oh what to do so uh what else what else do you have in mind related to timestamps?
So, well, quickly, because I don't think we'll come back to performance, I guess this is a good
time to mention that obviously B-tree indexes work nicely for things that are sorted like this,
but we've also done an episode on brin indexes. So timestamps are probably the most common use of BRIN indexes that I've seen.
I think that's what they're mostly used for in the wild.
So, yeah, refer back to that maybe if that's of interest.
You mentioned intervals.
Should we cover those now?
Yeah, but one more comment about indexes.
There is a very popular type of index called GIN, right?
And there is a big problem that usually you have a full text search
and you want to order by some number or timestamp.
And to solve this, there is an index called RAM.
It's not present, I think, on RDS.
I remember I personally asked them to consider it.
But the problem with it, usually index size is
huge. I know SuperBase,
by the way, includes it, Ram.
I wonder what's
experienced there, because my experience was
I considered it several times
and every time I considered it,
Ram was
huge. So idea of Ram
is you have GIN plus you have
knowledge about some timestamp or Integer 8.
They originally developed only timestamp support, but I asked also to add Integer 8.
They did. It was many years ago.
So now it has both options.
So RAM is like GIN extended with some timestamp.
So now you can have in single index scan, you can have almost single index scan,
but you can have both operations covered full text search and order by some timestamp, which
is perfect for search in some comments. For example, you always usually want fresh comments
coming first, right? So it's a very great idea. I wish Postgres would be more included to core contribution models, maybe. But it's not. And it has issues with size and affecting performance and so on. But the idea is great. And the goal is great.
So also related to timestamps. So it's interesting that it injects timestamp to gin and it becomes
rum very strange right at the risk of going off topic i know gin is not named after the alcohol
and rum i think is a play on that that it is but there's also i heard i only read briefly about it
but an index type called vodka as well by by the same team. It was only in development
and never reached some production stage.
But RAM is quite finished, and you can consider it.
But I'm just saying that my personal experience was not perfect.
But it doesn't mean that there's no need.
Need is huge.
Usually, we end up having two types of plans.
Postgres either chooses to use full-text search and then order by memory, or it chooses to walk along primary key or index on timestamp at tz and then apply filtering in memory for
full-text search, which both are not perfect paths.
The problem is obvious and it's not
fully solved but with ram it's solved again like maybe you should try it so we should also mention
that there is a there is date and there is time and when some engineer creates something some
column call and calling this column time but it's's timestamp, it breaks my mind always.
Because, like, okay, this framework of naming in Postgres,
I, like, adapted it already, and I cannot see time when it's timestamp.
No, it's not time.
It also includes dates. At least let's say daytime.
By the way, maybe standard also has daytime.
I remember other database systems have daytime as data type. So there is date, there is time. By the way, maybe standard also has date time. I remember other database systems have date time as
data type. So there is date,
there is time. When we combine them,
it's timestamp, right?
Like it's full thing. But we
can have also a column of
type date, right?
Or of type time.
Why not?
Yeah, but I think date
gets... date's difficult
because of the
boundaries right
like what does
like midnight
and time
like I think
it's tricky
I think
personally I would
stick to timestamps
in general
but what
maybe I'm missing
it's
two
better in terms
two times better
in terms of
performance
in terms of
storage
because it's
four bytes
right so you can save some bytes if you want but you should remember of course Two times better in terms of storage because it's four bytes.
So you can save some bytes if you want.
But you should remember, of course, about alignment padding. If you place date and then integer 8, no benefits because four bytes will be filled by zeros.
So you should combine with our four byte column.
I don't use it often, honestly.
So I agree with you here here but also there is time and
sometimes we technically we can use timestamp as time saying like you can use timestamp instead of
date taking some time of day but also you will start thinking okay if it's time with time zone
it's another date can be different depending on time zone.
So it's becoming tricky.
So probably this is where you probably will try to choose timestamp
without time zone, just to be concrete.
Okay, it's UTC.
I can see that as UTC.
This is our date.
That's it.
Again, depends, right?
But if you think about time time technically you can use timestamp just
like choosing some date always like january 1st something right i don't know but it's not
it looks ugly right so time sometimes is needed yeah so there's there's time with time zone as
well and time and there's a really good note in the documentation and on that wiki page I mentioned earlier to say, basically, I think there's no exception.
They couldn't think of a single exception where you would want time with time zone.
They go into some details as to why that makes sense.
But time on its own makes sense in terms of like a photo of a clock or're if you're a calendar application or to-do list
application and you want the meat like let's say you want to go walk your dogs at 9 a.m every day
you want that to be 9 a.m in summer and maybe you want that to be 9 a.m in winter but it's like it's
it's the time of day not uh not a time in in with a date and so I think I can see more uses for that.
And I probably would store it as time
because it's not timestamp.
Yeah, maybe.
Also worth mentioning,
they're all both timestamp,
both of timestamps and both of times.
They have precision, how to say, modifier, right?
So you can, in parentheses,
you can say, I don't want microseconds.
Just put a zero there.
And it's also convenient.
I use it sometimes because it becomes easier to have predictable output without.
Just now, convert it to timestamp and zero in parenthesis gives you timestamp with seconds, but without milliseconds.
The wiki advises against it,
I think because it rounds rather than truncates.
So it's possible to round to a time in the future,
which might not be what you want.
But yeah, it's interesting.
I think they recommend truncate instead of date trunk.
Date trunk will give you, yeah, it will give you,
you can take seconds yeah right so yeah and everything else will be filled like with zeros right right or it just can be if you want
for example to take only date from like my small trick usually if i have timestamps i want only
date for example to then aggregate and have some report. I use left 10.
Left times 10 value, comma 10.
It gives you a date usually, but it's stable because we have four digits
for here and divider and so on and so on.
So left is much faster.
But of course, date rank day, for example, also works.
So it depends.
It depends.
But left function, I love it.
Of course, it works with text, but it's much shorter sometimes.
So, yeah.
But interesting that time is regular time.
Without time zone, it's also 8 bytes.
But with time zone, it's 12 bytes.
Yeah, I didn't understand that.
How could it be more?
Surprise.
I don't know why, but surprise.
So that's why I usually try to stick with timestamps as long as I can,
and only then I go to less, for me, to less popular data types.
But interval is even worse.
It's 16 bytes.
So why is that? Is it because it's
storing two typesets? Actually, I don't know.
Question
to our listeners. Leave a comment
if you're watching at YouTube.
Leave a comment. Please
explain why.
It's an interesting question. Well, interesting
but not like, again,
if you store it, you care. But you don't
have a lot of good options.
Interval is very good.
It's very powerful
and very flexible, very convenient.
I use it a lot.
Including for DBA practice.
For example, you want to
understand how long your
transaction or statement lasts,
or when last state change happened, and you want something like age,
you can use minus operator and you will have interval as output.
By the way, question to you.
What would you use if you select from PG set activity to see when transaction, like transaction age, for example?
Oh, I don't know.
How would you use it?
So there is an exact start, X, A, C, T, underscore start column.
And there is, of course, current timestamp.
So most people, not most people, many people use now function
minus exact start, but it's quite wrong.
Now function is very good in terms of performance because it gets timestamp only once in the beginning of your transaction.
So your select is also a transaction, right?
So now is calculated only in the beginning, but PSAT activity, it's not a normal table.
And values are constantly shifting.
And if it's quite big, while we read it, it's not stable.
We don't have snapshot there. So you will have shifted values.
You probably will end up with negative H, negative intervals. So what you should use is less performant function because
it will calculate current timestamp for each row. It's called clock timestamp. So clock
timestamp minus exact start and transaction start. And in this case, you won't have negative values,
which is good.
So yeah, small trick.
And sometimes it's also worth remembering
that now it's like boom and you have it for all.
But maybe it's not what you want.
Okay, good.
Anything else we wanted to cover?
I only had one more, which is only tangentially related,
but I heard some good advice a while back
that if you are adding a column that's Boolean,
that you're planning to use Boolean for,
consider using a timestamp.
So I think the good cases I've heard for this
are things like unsubscribed at,
or it's basically sometimes you don't necessarily need it
for the first set of features,
but it can be incredibly useful down the line.
Yeah, I also use it, this approach.
But you should remember storage overhead, of course,
because Boolean, of course, it's not one bit, it's one byte.
You already are eight times worse than you could be.
Most storage efficient approach would be like spend one byte for eight checkboxes, right?
And then work with bit operations.
But, okay, if you have Boolean column, it's one byte.
Of course, if you pack,
if you have eight of them,
they will occupy eight bytes.
It's good.
If you have Boolean and then timestamp or integer eight,
you will have seven bytes lost anyway.
But in this case, of course,
switching to timestamp won't make any difference in terms of storage overhead, right?
Yeah, but it was more like, yeah, obviously you're right in terms of storage.
But it's more of like a, I think it's quite a good thinking exercise when you're designing a new feature,
just to think, could there be benefits down the line of storing this in a different format?
When you are designing, think about if you need to remember when checkboxes are pressed, clicked, or checked.
So, yeah, it depends on the application, of course.
Oh, I wanted to mention these old balls.
You know old balls? What is old balls? Oh, I wanted to mention this all balls, you know, all balls. What is all balls?
Oh, yes. The time.
Zero, zero, zero.
Zero, zero, zero time. UTC. Right. So, yeah, it's kind of like interesting joke, maybe. Right.
But it's committed, right? It works, of course. You can say, oh, by the way, how to get current time minus one day?
Like now, I write one day in commas, not in commas, in quotes,
single quotes,
and then convert it to interval,
and that's it.
Or say minus interval,
and then string literal one day.
It works perfectly.
Or one hour,
like it works.
But also there is,
so wall balls,
it's like also in single quotes,
you convert it to time,
and have this time.
But like, it's kind of a joke but
what is more useful is infinity minus infinity and plus infinity right this is useful you can
have a timestamp infinity and i can imagine very good use cases when it's like very very helpful to develop good application behavior that's a really good
point so i um there's i feel like i've mentioned it too many times already but that wiki article i
think also includes not using or between probably not being what you want when it comes to
time stamps and you probably want less than one and greater than another and that's i think that's where i've seen infinity used most so you might want to be selecting all times from sometime in the future to positive
infinity or from sometime in the past uh but less than uh minus infinity right right agree good so
what else or that's it We wanted to keep it short.
That's all I had.
Yeah.
Then let's wrap up and call it a day, right?
Yeah.
Fabulous.
Thanks so much.
Thanks everyone for listening.
Keep the suggestions coming and see you next week.
Thank you.
Bye-bye.