Postgres FM - Timestamps

Episode Date: December 9, 2022

Here are links to a few things we mentioned: Date/Time Types (docs) Don’t use timestamp without time zone (wiki)  Date/Time functions and operators (docs)  Postgres AT TIME ZONE explain...ed (blog post by Bruce Momjian) Our episode on BRIN indexes  RUM indexes allballs (mailing list thread) ------------------------What did you like or not like? What should we discuss next time? Let us know by tweeting us on @samokhvalov / @michristofides / @PostgresFM, or by commenting on our Google doc.If you would like to share this episode, here's a good link (and thank you!)Postgres FM is brought to you by:Nikolay Samokhvalov, founder of Postgres.aiMichael Christofides, founder of pgMustardWith special thanks to:Jessie Draws for the amazing artwork 

Transcript
Discussion (0)
Starting point is 00:00:00 Hello and welcome to PostgresFM, a weekly show about all things PostgresQL. I'm Michael, founder of PGMustard. This is my co-host Nikolai, founder of PostgresAI. Hey Nikolai, what are we talking about today? Hi Michael, let's talk about something priceless, time, timestamps, time zones, and so on. I like the priceless reference, that's great. Yeah, so this is another listener request from quite a while ago. Let's keep this episode short. It's about time. Oh, nice. Yeah, it's a great suggestion. Great request. It's one of our earliest requests, actually. So it was specifically about timestamps in general and time math or arithmetic, maybe. So yeah, it's quite broad, but also I guess we can take it in a few different directions.
Starting point is 00:00:44 Where did you want to start? Let's start with the idea that timestamp with timezone should be always preferred over regular timestamp without timezone. What do you think about this idea? Yeah, so I guess we're jumping into the specific data types here on how we're storing these in the database. And there's a couple Postgres lets you use timestamp and timestamp tz so timestamp without time zone and timestamp with time zone I think it's quite a confusing name and the reference I normally point people to is the Postgres wiki they've got an excellent list that's quite aggressively titled called don't do this but one of the they it is
Starting point is 00:01:26 it's aggressively titled but it also includes information on when you when it might make sense you know what are the exceptions but one of the one of the ones that has very few exceptions is you probably always want timestamp tz when you're storing these right and uh since it's a couple of more letters it's easier to write just timestamp and then realize that maybe it was not the best choice. But I would like to mention that usually, it's my perception, usually you think about timestamps as something bigger than the regular numbers, integers, especially if it's with something else like timezone, right? If you think about storage, you maybe think, okay, I maybe need to use regular numbers. And of course, I don't want to bump into the limit of a four-byte integer, so I will choose eight-byte integer, right? So int8 or bigint. And then you realize that regular timestamp is also eight bytes.
Starting point is 00:02:26 And then you realize that timestamp with timezone is also 8 bytes. They are all 8 bytes. So the choice should not be based on storage here. BigInt, timestamp, or timestamp.tz. Timestamp with timezone. Timestamp.tz is
Starting point is 00:02:41 post-visism because it's standard way, SQL standard way, or data type is timestamp TZ is post-guisism because it's a standard way, SQL standard way, or data type is timestamp with time zone for words. So I personally try to encourage all engineers usually to choose, by default, choose timestamp TZ, explaining that there is no storage benefits of regular timestamp. Right? Yeah, exactly. And there's loads of benefits of regular timestamp, right? Yeah, exactly. And there's loads of benefits of the timestamp TZ.
Starting point is 00:03:08 So for example, if you're doing, I guess we'll go on to arithmetic later, but if we're calculating the difference between a timestamp, at least in the UK, I think in the US, a lot of other countries observe daylight savings. If we're doing arithmetic between two timestamps and we're crossing a boundary like a daylight savings if we're doing arithmetic between two time stamps and what and we're
Starting point is 00:03:25 crossing a boundary like a daylight savings boundary we won't get the right answer if we or we're more likely to have issues if we're using time stamp without time zone so postgres can handle all of that stuff that complicated stuff the reasons people talk about times being difficult that's that can all be handled by postgres if we're telling it that this is um and i guess should we should we mention the name it's not the with time zone i think is it puts people off in this confusing from the wiki i stole their description that timestamp is a bit like a photograph of a calendar and a clock but you have have no context. And the timestamp with time zone, it knows, firstly, it can convert that, it does convert that to UTC,
Starting point is 00:04:11 and it has the context of a point in time. So it knows when that is rather than what it is. Right, and the resulting value will depend on the context which you can change in any session. So one session sees one value, final value. Another session sees another value, just because they are sitting in different time zones. So that's why it's more flexible and convenient and so on. Yeah.
Starting point is 00:04:37 So you've already touched on the storage efficiency. Are there any other kind of performance related things that you want to talk about? Actually, good question. I don't know. I never thought about timestamps that are less performant. I never tried to measure it. It's a good question, but I think, I doubt there is a big difference here. Based on my understanding of how they're stored, I don't see how they could be. But yeah, let us know if you know differently. Of course, you need to do some arithmetic operation in runtime, if you have with timezone. But I don't know. I think it's very small.
Starting point is 00:05:15 Of course, it's an interesting exercise to compare and see some differences. Maybe there are already these exercises in some blog posts. I don't know but i wanted to mention a dangerous part of sql standard which is operator called at time zone it's very dangerous it's the ugliest parts the ugliest corners of this huge building called sql standard so it can be very tricky if you don't know that is dangerous it's like mouse It's like nulls, right? So nulls are dangerous and addTimeZone is dangerous. If you check how it works and what is resulting data type for each value,
Starting point is 00:05:57 for timestamp, it will produce timestamp at timezone. For timestamp at timezone, it will produce timestamp. If you take timestamp without timezone and say addTimeZone, you will have timestamp add some time zone. If you take a timestamp add some time zone and say add time zone, it will produce timestamp without time zone. And this is from standard, as I remember. So if you go there, just open the reference and do any moments with reference. No moments without it. Just a warning for you.
Starting point is 00:06:30 So performance, I didn't think we will discuss a lot of performance questions. Maybe just the idea that, of course, it's perfectly indexable, both timestamp and timestamp.tz. If you have such columns, you can index them. But I had several times I had a case in my engineering practice when I needed to have something like age. By the way, there is such function age, which you can give either two timestamps to it and it will just calculate interval. There is interval data type, it's a whole new world. Time ranges, intervals and so on. So you can give two points in time and it will calculate a difference. Or you can just give one timestamp and it will compare it with
Starting point is 00:07:19 current time. And it will be just a regular age. So you say like my birthday is this and it gives you age as interval. But I wanted to store it this interval and then I wanted to index it several times. And always it was a question how to do it because you cannot have an index on something which involves time zone because it's volatile right it depends on on the context on your session if you if you change time zones in session saying site time zone to i don't know like europe berlin or something like this in this case you will or new time zone by the way uh europe kiev there time zone. Yeah, but it's not a benefit from Postgres, coming from Postgres itself, it's lower level. It's, sorry, I don't know libraries, but it's coming from underlying software. So if you set it in session, the value will be different and probably interval will be different. So you cannot index it easily. If you, for example, you want to index expression, you can store time zone, but you want to index expression you store times but you want to index
Starting point is 00:08:26 to support fast lookups based on age and my workaround was i switched to different approach to store time i based on unix timestamps and so on i just said okay but unix timestamps are limited on one end, on the beginning. 1970, maybe January 1st, I don't remember, but some point in time in 1970. And then it's just a number of seconds past. Unix timestamp is a number of seconds past since then. By the way, I wish in Postgres it would be easier to convert from timestamp to Unix timestamp back and forth. It's quite an expression. You need to always to Google it, Stack Overflow or something will help you, right? It's memorize it, memorize it. It takes like five years of practicing constantly to write it blindly.
Starting point is 00:09:19 So Unix timestamps are limited on one end. They are technically limited because of capacity of integers on another end. But my idea was, okay, I say our software will exist only until 2050. So we have some line and we can see progress and we can have ranges. And this is very stable already. Of course, I lose time of benefits and flexibility, convenience, but I'm on perfect range of numbers. This is our universe, right? And we deal with only with points on this range and anything you do there, you can index
Starting point is 00:10:03 any expression, anything. so this is what my approach is working around maybe there are other approaches to to deal with age and and still build an index on some expression involving involving age but usually it's a trick and people bump into it and think oh what to do so uh what else what else do you have in mind related to timestamps? So, well, quickly, because I don't think we'll come back to performance, I guess this is a good time to mention that obviously B-tree indexes work nicely for things that are sorted like this, but we've also done an episode on brin indexes. So timestamps are probably the most common use of BRIN indexes that I've seen. I think that's what they're mostly used for in the wild.
Starting point is 00:10:50 So, yeah, refer back to that maybe if that's of interest. You mentioned intervals. Should we cover those now? Yeah, but one more comment about indexes. There is a very popular type of index called GIN, right? And there is a big problem that usually you have a full text search and you want to order by some number or timestamp. And to solve this, there is an index called RAM.
Starting point is 00:11:18 It's not present, I think, on RDS. I remember I personally asked them to consider it. But the problem with it, usually index size is huge. I know SuperBase, by the way, includes it, Ram. I wonder what's experienced there, because my experience was I considered it several times
Starting point is 00:11:35 and every time I considered it, Ram was huge. So idea of Ram is you have GIN plus you have knowledge about some timestamp or Integer 8. They originally developed only timestamp support, but I asked also to add Integer 8. They did. It was many years ago. So now it has both options.
Starting point is 00:11:57 So RAM is like GIN extended with some timestamp. So now you can have in single index scan, you can have almost single index scan, but you can have both operations covered full text search and order by some timestamp, which is perfect for search in some comments. For example, you always usually want fresh comments coming first, right? So it's a very great idea. I wish Postgres would be more included to core contribution models, maybe. But it's not. And it has issues with size and affecting performance and so on. But the idea is great. And the goal is great. So also related to timestamps. So it's interesting that it injects timestamp to gin and it becomes rum very strange right at the risk of going off topic i know gin is not named after the alcohol and rum i think is a play on that that it is but there's also i heard i only read briefly about it
Starting point is 00:13:00 but an index type called vodka as well by by the same team. It was only in development and never reached some production stage. But RAM is quite finished, and you can consider it. But I'm just saying that my personal experience was not perfect. But it doesn't mean that there's no need. Need is huge. Usually, we end up having two types of plans. Postgres either chooses to use full-text search and then order by memory, or it chooses to walk along primary key or index on timestamp at tz and then apply filtering in memory for
Starting point is 00:13:40 full-text search, which both are not perfect paths. The problem is obvious and it's not fully solved but with ram it's solved again like maybe you should try it so we should also mention that there is a there is date and there is time and when some engineer creates something some column call and calling this column time but it's's timestamp, it breaks my mind always. Because, like, okay, this framework of naming in Postgres, I, like, adapted it already, and I cannot see time when it's timestamp. No, it's not time.
Starting point is 00:14:18 It also includes dates. At least let's say daytime. By the way, maybe standard also has daytime. I remember other database systems have daytime as data type. So there is date, there is time. By the way, maybe standard also has date time. I remember other database systems have date time as data type. So there is date, there is time. When we combine them, it's timestamp, right? Like it's full thing. But we can have also a column of
Starting point is 00:14:36 type date, right? Or of type time. Why not? Yeah, but I think date gets... date's difficult because of the boundaries right like what does
Starting point is 00:14:48 like midnight and time like I think it's tricky I think personally I would stick to timestamps in general
Starting point is 00:14:55 but what maybe I'm missing it's two better in terms two times better in terms of performance
Starting point is 00:15:01 in terms of storage because it's four bytes right so you can save some bytes if you want but you should remember of course Two times better in terms of storage because it's four bytes. So you can save some bytes if you want. But you should remember, of course, about alignment padding. If you place date and then integer 8, no benefits because four bytes will be filled by zeros. So you should combine with our four byte column.
Starting point is 00:15:21 I don't use it often, honestly. So I agree with you here here but also there is time and sometimes we technically we can use timestamp as time saying like you can use timestamp instead of date taking some time of day but also you will start thinking okay if it's time with time zone it's another date can be different depending on time zone. So it's becoming tricky. So probably this is where you probably will try to choose timestamp without time zone, just to be concrete.
Starting point is 00:15:54 Okay, it's UTC. I can see that as UTC. This is our date. That's it. Again, depends, right? But if you think about time time technically you can use timestamp just like choosing some date always like january 1st something right i don't know but it's not it looks ugly right so time sometimes is needed yeah so there's there's time with time zone as
Starting point is 00:16:22 well and time and there's a really good note in the documentation and on that wiki page I mentioned earlier to say, basically, I think there's no exception. They couldn't think of a single exception where you would want time with time zone. They go into some details as to why that makes sense. But time on its own makes sense in terms of like a photo of a clock or're if you're a calendar application or to-do list application and you want the meat like let's say you want to go walk your dogs at 9 a.m every day you want that to be 9 a.m in summer and maybe you want that to be 9 a.m in winter but it's like it's it's the time of day not uh not a time in in with a date and so I think I can see more uses for that. And I probably would store it as time
Starting point is 00:17:07 because it's not timestamp. Yeah, maybe. Also worth mentioning, they're all both timestamp, both of timestamps and both of times. They have precision, how to say, modifier, right? So you can, in parentheses, you can say, I don't want microseconds.
Starting point is 00:17:24 Just put a zero there. And it's also convenient. I use it sometimes because it becomes easier to have predictable output without. Just now, convert it to timestamp and zero in parenthesis gives you timestamp with seconds, but without milliseconds. The wiki advises against it, I think because it rounds rather than truncates. So it's possible to round to a time in the future, which might not be what you want.
Starting point is 00:17:56 But yeah, it's interesting. I think they recommend truncate instead of date trunk. Date trunk will give you, yeah, it will give you, you can take seconds yeah right so yeah and everything else will be filled like with zeros right right or it just can be if you want for example to take only date from like my small trick usually if i have timestamps i want only date for example to then aggregate and have some report. I use left 10. Left times 10 value, comma 10. It gives you a date usually, but it's stable because we have four digits
Starting point is 00:18:36 for here and divider and so on and so on. So left is much faster. But of course, date rank day, for example, also works. So it depends. It depends. But left function, I love it. Of course, it works with text, but it's much shorter sometimes. So, yeah.
Starting point is 00:18:57 But interesting that time is regular time. Without time zone, it's also 8 bytes. But with time zone, it's 12 bytes. Yeah, I didn't understand that. How could it be more? Surprise. I don't know why, but surprise. So that's why I usually try to stick with timestamps as long as I can,
Starting point is 00:19:15 and only then I go to less, for me, to less popular data types. But interval is even worse. It's 16 bytes. So why is that? Is it because it's storing two typesets? Actually, I don't know. Question to our listeners. Leave a comment if you're watching at YouTube.
Starting point is 00:19:38 Leave a comment. Please explain why. It's an interesting question. Well, interesting but not like, again, if you store it, you care. But you don't have a lot of good options. Interval is very good. It's very powerful
Starting point is 00:19:53 and very flexible, very convenient. I use it a lot. Including for DBA practice. For example, you want to understand how long your transaction or statement lasts, or when last state change happened, and you want something like age, you can use minus operator and you will have interval as output.
Starting point is 00:20:18 By the way, question to you. What would you use if you select from PG set activity to see when transaction, like transaction age, for example? Oh, I don't know. How would you use it? So there is an exact start, X, A, C, T, underscore start column. And there is, of course, current timestamp. So most people, not most people, many people use now function minus exact start, but it's quite wrong.
Starting point is 00:20:49 Now function is very good in terms of performance because it gets timestamp only once in the beginning of your transaction. So your select is also a transaction, right? So now is calculated only in the beginning, but PSAT activity, it's not a normal table. And values are constantly shifting. And if it's quite big, while we read it, it's not stable. We don't have snapshot there. So you will have shifted values. You probably will end up with negative H, negative intervals. So what you should use is less performant function because it will calculate current timestamp for each row. It's called clock timestamp. So clock
Starting point is 00:21:37 timestamp minus exact start and transaction start. And in this case, you won't have negative values, which is good. So yeah, small trick. And sometimes it's also worth remembering that now it's like boom and you have it for all. But maybe it's not what you want. Okay, good. Anything else we wanted to cover?
Starting point is 00:22:02 I only had one more, which is only tangentially related, but I heard some good advice a while back that if you are adding a column that's Boolean, that you're planning to use Boolean for, consider using a timestamp. So I think the good cases I've heard for this are things like unsubscribed at, or it's basically sometimes you don't necessarily need it
Starting point is 00:22:31 for the first set of features, but it can be incredibly useful down the line. Yeah, I also use it, this approach. But you should remember storage overhead, of course, because Boolean, of course, it's not one bit, it's one byte. You already are eight times worse than you could be. Most storage efficient approach would be like spend one byte for eight checkboxes, right? And then work with bit operations.
Starting point is 00:23:02 But, okay, if you have Boolean column, it's one byte. Of course, if you pack, if you have eight of them, they will occupy eight bytes. It's good. If you have Boolean and then timestamp or integer eight, you will have seven bytes lost anyway. But in this case, of course,
Starting point is 00:23:22 switching to timestamp won't make any difference in terms of storage overhead, right? Yeah, but it was more like, yeah, obviously you're right in terms of storage. But it's more of like a, I think it's quite a good thinking exercise when you're designing a new feature, just to think, could there be benefits down the line of storing this in a different format? When you are designing, think about if you need to remember when checkboxes are pressed, clicked, or checked. So, yeah, it depends on the application, of course. Oh, I wanted to mention these old balls. You know old balls? What is old balls? Oh, I wanted to mention this all balls, you know, all balls. What is all balls?
Starting point is 00:24:07 Oh, yes. The time. Zero, zero, zero. Zero, zero, zero time. UTC. Right. So, yeah, it's kind of like interesting joke, maybe. Right. But it's committed, right? It works, of course. You can say, oh, by the way, how to get current time minus one day? Like now, I write one day in commas, not in commas, in quotes, single quotes, and then convert it to interval, and that's it.
Starting point is 00:24:48 Or say minus interval, and then string literal one day. It works perfectly. Or one hour, like it works. But also there is, so wall balls, it's like also in single quotes,
Starting point is 00:25:01 you convert it to time, and have this time. But like, it's kind of a joke but what is more useful is infinity minus infinity and plus infinity right this is useful you can have a timestamp infinity and i can imagine very good use cases when it's like very very helpful to develop good application behavior that's a really good point so i um there's i feel like i've mentioned it too many times already but that wiki article i think also includes not using or between probably not being what you want when it comes to time stamps and you probably want less than one and greater than another and that's i think that's where i've seen infinity used most so you might want to be selecting all times from sometime in the future to positive
Starting point is 00:25:52 infinity or from sometime in the past uh but less than uh minus infinity right right agree good so what else or that's it We wanted to keep it short. That's all I had. Yeah. Then let's wrap up and call it a day, right? Yeah. Fabulous. Thanks so much.
Starting point is 00:26:12 Thanks everyone for listening. Keep the suggestions coming and see you next week. Thank you. Bye-bye.

There aren't comments yet for this episode. Click on any sentence in the transcript to leave a comment.