Postgres FM - Data types

Episode Date: August 4, 2023

Nikolay and Michael discuss data types in PostgreSQL — including system types, choosing between types, types provided by extensions, and more. Here are some links to some things they mentioned:

- Data Types (docs) https://www.postgresql.org/docs/current/datatype.html
- 10 tips for beginners https://postgres.ai/blog/20230722-10-postgres-tips-for-beginners
- Tid Scan (explain glossary) https://www.pgmustard.com/docs/explain/tid-scan
- Don't do this (wiki) https://wiki.postgresql.org/wiki/Don't_Do_This
- Boundless `text` and back again https://brandur.org/text
- UUID episode https://postgres.fm/episodes/uuid
- I use ENUM (30min talk by Boriss Mejías) https://archive.fosdem.org/2021/schedule/event/postgresql_i_use_enum_vindicating_the_underdog_of_data_types/
- Peter Geoghegan tweet https://twitter.com/petervgeoghegan/status/1680275871905775616
- JSON episode https://postgres.fm/episodes/json
- pg_repack reorder columns discussion https://github.com/reorg/pg_repack/issues/101
- Use bigint https://blog.rustprooflabs.com/2021/06/postgres-bigint-by-default

What did you like or not like? What should we discuss next time? Let us know via a YouTube comment, on social media, or by commenting on our Google doc!

Postgres FM is brought to you by:

- Nikolay Samokhvalov, founder of Postgres.ai
- Michael Christofides, founder of pgMustard

With special thanks to:

- Jessie Draws for the amazing artwork

Transcript
Starting point is 00:00:00 Hello and welcome to Postgres FM, a weekly show about all things PostgreSQL. I am Michael, founder of pgMustard. This is my co-host Nikolay, founder of Postgres.ai. Hey Nikolay, how's it going? Hi Michael, all good. How are you? So what's the topic today? The topic is data types. What kind of data types? We're going to try and keep this to half an hour, so we're going to very much have to do a 101 intro to data types, but I think you've got
Starting point is 00:00:26 some interesting things you want to go into first, right? Yeah, well, data types is a huge topic, and we should probably just try to provide an overview and some entry points, as usual, because for a podcast, if you want to squeeze it into 30 minutes, it's impossible to cover all the topics. Postgres has a huge number of data types supported, and extensions bring even more data types, obviously. Yeah, I've heard it described as one of Postgres' main strengths, one of the biggest reasons to choose Postgres, which I completely agree with. Did you want to talk about system types to begin with? Yeah, yeah. So before we go really wide, we should discuss — this is my idea, it was my idea — let's discuss the data types each Postgres installation always has, because they are internal. And I encourage
Starting point is 00:01:20 you to touch some of these data types even if you're just starting to work with Postgres. A recent episode with tips for beginners, and a related article I posted, mentioned the ctid data type, which is quite useful to understand how MVCC, multi-version concurrency control, works in Postgres, and to start understanding versions of rows, and to feel how vacuum works, basically, and so on. This is just a hidden column in any table, and it provides you two numbers, so it's like a composite type. The two numbers tell you the location of a tuple: page number starting from, I guess, one. Or I always forget. One or zero.
Starting point is 00:02:14 So page number and offset inside the page. And the interesting trick, as I usually say, you can extract one of parts if you just convert it to point first and then address the first or second part of this value it's just a trick which you can use to do some i don't know like some statistics to learn how your data is currently organized and of course if you perform some updates insert deletes or auto vacuum or manual vacuum processes the table, of course, physical layout changes and some rows can migrate to different pages. This is one of the system types which is worth knowing and using from time to time. Yeah, I really liked your tips on using this
Starting point is 00:03:00 to get an understanding for how things work at the disk level. So understanding that tuples are row versions, and that they can move, and that they get a new ctid when they've moved — like if they've been updated, even if it's a no-op type update, they can move — that was really good. The first time I actually came across this was when I did a whole glossary for explain and all the different operations and fields you could see in explain. And I came across a TID scan. And I was like, what is this?
Starting point is 00:03:31 And it was so interesting to read up on it. It's generally only used — well, it's like an internal thing. You can use it, but it's generally a bad idea because tuples can move. So, yeah, that was my first experience with it. Right. Also, there are a few more data types you probably want to at least know exist, and I'm sure at some point they will be useful for any, like, backend engineer, not just for DBAs. Of course, all of these things are implementation details, and it's kind of internals, but it's very useful to know about them. For example, LSN — there is a data type called pg_lsn, and it's kind of our moment in time.
Starting point is 00:04:14 So it always grows, grows, grows, grows. And an interesting thing to remember: if you have two LSNs, you can subtract one from another and you will have the delta measured in bytes. So it's really easy, just use the minus operator and that's it. Of course, there is also a function, pg_wal_lsn_diff, but anyway, the delta of LSNs — LSN is log sequence number, it's a position in WAL, and it's constantly progressing. Of course, there is another value, xid, transaction ID, you can also play with it, it's like a regular integer, so
Starting point is 00:04:55 you can work with it as well. But LSN is very interesting, so you can understand how much data you produce in terms of WAL data. All writes go to WAL, obviously. Also, vacuum can write to WAL and shift the LSN as well. And maybe the final thing from me is a pair of useful data types, also internals: oid and regclass. So each database object has an OID — it's like an object ID inside the database.
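Going back to the LSN arithmetic for a second — it can be sketched like this (the literal LSN values below are made up for illustration):

```sql
-- subtracting two pg_lsn values gives a delta in bytes
SELECT '16/B374D848'::pg_lsn - '16/B0000000'::pg_lsn AS delta_bytes;

-- e.g. how much WAL the cluster has generated since LSN zero
SELECT pg_current_wal_lsn() - '0/0'::pg_lsn AS wal_bytes_since_zero;
```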
Starting point is 00:05:27 And there are some OIDs which are hard-coded in the Postgres code. For example, when a new system table, system view, or new data type in core is added, it has an assigned OID, which is just written in the source code. And there is even a helper script which helps Postgres developers, Postgres hackers, to find unused OIDs that can be used.
Starting point is 00:05:57 There are some gaps sometimes. So OID — for example, if you create a table, it immediately has an OID assigned, and you can select it from the system view called pg_class. So you select from pg_class where relname equals your table name, and you can find the OID. And also there is a type called regclass, which corresponds to records in pg_class — and all tables and indexes are there, also views and materialized views, they are all there. And the interesting trick is that you can take an OID and just convert it to a regular table name or index name, just by saying ::regclass. This is interesting — this is the shortest way to get a table name from an OID.
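The oid/regclass round trip, as a sketch (`my_table` and the literal OID are placeholders):

```sql
-- name → oid
SELECT 'my_table'::regclass::oid;

-- oid → name (the literal OID here is made up)
SELECT 16384::regclass;

-- the long way, via the catalog
SELECT oid FROM pg_class WHERE relname = 'my_table';
```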
Starting point is 00:06:45 And vice versa, of course, you can convert it back to an OID. So it's useful sometimes to deal with some catalogs and to find, for example — if you have very wide rows for some table, you know, there is TOAST, we had an episode about TOAST. When you have TOAST, you sometimes need to convert a table name to... the TOAST table has the OID of the main table inside its table name. So you need to play with it, understanding that it's so easy to convert to an OID and back to a relation name. So relation — it's kind of an overused term, but almost anything is a relation. I mean, a table is a relation, even an index is a relation, and they all go to pg_class. So you can convert
Starting point is 00:07:28 back and forth just by remembering the type names. And the most difficult thing is to remember regclass — I tend to forget it. But I use it quite a lot as well. I came across it recently when I was hosting the PGSQL Phriday event, and somebody, I think it was Dian Fay, used it in their query for pg_stat_statements as a really neat way of casting. So actually, casting
Starting point is 00:07:52 is the important word there, isn't it? The :: you mentioned is used to cast from one data type to another. So it's really cool, really powerful. Yeah, and it's a Postgres-ism, because the standard way is to say CAST — cast something as something. I never use it because it's too verbose. It's better to just write ::, because I
Starting point is 00:08:11 definitely know I'm not going to migrate from Postgres to anywhere. Nice. In the interest of time, should we move on? Yeah. Let's talk about basic atomic data types — like numbers and texts and dates and times and so on. Where to start, numbers or text? I think text, because I thought this was quite eye-opening when I first started learning about Postgres. The advice I read felt really sensible and I got on board with it, and then the more I learn, the more I realize it's a little bit more complex. So I feel like this might be one of the more interesting ones. So the reason that I think it's interesting is, unlike a lot of relational databases, Postgres has a text type, and a lot of the other databases rely on character.
Starting point is 00:09:03 Yeah, exactly. Exactly. databases rely on character yeah exactly exactly and character limits are really common in a lot of when people are using oracle sql server and others and i think there's a few guides out there and a few even the wiki in fact i was going to mention this later but the don't do this page on the wiki is brilliant for data types so i'll link that up but one of their advice is generally to use text and they go into good reasons as to when it may or may not make sense but yeah how do you feel about that is it the same wiki page which says don't use money yep and don't use timestamp use timestamp easy don't use money sounds funny actually i i yeah well inside Postgres I tend to follow this advice, but outside I prefer
Starting point is 00:09:47 not to. So it sounds like communism, you know. But as for text: first of all, Postgres supports varchar and char as well — standard type support. But with an addition: varchar can also be used without a limit. You can just say varchar, and it will behave similarly to text. And on Stack Overflow there is a question where, I think, Peter Eisentraut commented that there is a very, very small difference in terms of performance, because there is some additional code. Well, I have a mixed opinion here. Because, first of all, of course it's good to care about performance, but it's also good to care about the quality of data, and when we talk about quality of data, we need constraints,
Starting point is 00:10:32 and varchar offers such a constraint. And actually, if you define varchar(100) and then the next day you decide, oh, I need 200 characters — it's just an ALTER, and it will be quick. Oh, with no lock, even if you've got a lot of data in there. From 100 to 200, it's obvious we don't need to scan the table, right? Well, in theory — but not everything that's possible in theory is true in practice. I recently checked. I was surprised it's not scanning the table. Well, maybe I'm wrong, actually. But it's easy to test: you generate a huge table, you start observing logs, measure time, and so on, and you see if it scans the table
Starting point is 00:11:14 or not. I suspect in one direction it won't — if you increase it. We could spend the rest of this episode on this specific topic. I remember a really good blog post on this. Maybe it's outdated. Well, Brandur did a really good in-depth blog post on this. Pardon me? How many years ago? Only recently. Depends on the version, probably. I don't know.
Starting point is 00:11:37 But recently, I was explaining to someone that it's so, as you say. And I wanted to — as usual, when I say something, first of all I need to double-check myself, so I always do an experiment, just check it and see, and I need it also to demonstrate with real experiments. So I made an experiment, and I was surprised that it was very fast this way — like from 100 to 200, increasing this limit. So again, it's better not to memorize these things, but to know how to check them. It's always good to check, and it's also always good to have some testing system which will not let you deploy a bad change which would block the table for long.
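A minimal sketch of the experiment Nikolay describes — increasing a varchar limit is expected to be a metadata-only change on recent Postgres versions (the table and column names are placeholders):

```sql
CREATE TABLE t (name varchar(100));
-- imagine the table already holds millions of rows

-- increasing the limit: expected to be fast, no table rewrite
ALTER TABLE t ALTER COLUMN name TYPE varchar(200);

-- decreasing it (or changing the type) may behave differently —
-- test it first, e.g. with \timing in psql, before production
ALTER TABLE t ALTER COLUMN name TYPE varchar(150);
```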
Starting point is 00:12:27 True. Right, so we discussed that with database experiments, testing, and so on also. So it's that area. Well, okay, to use text, but my question is how are you going to control the length? Many people tend to control it on application side, and many years ago I was criticizing them a lot because I was saying, you know, look, only database can control constraints reliably. Because application, like, soon you can have several applications and you need to copy paste this logic in different languages, probably, right? It's not good.
Starting point is 00:13:02 The database can control this, and so on. But they had reasons — for example, painful schema changes. Because I get it. And for example, if you tell me you are going to use enum — even with recent support, I think in Postgres 13 or something it was added, you can alter it, saying I want one more value in the enum — I will say, okay, what if tomorrow you decide to remove one value, and you know your table doesn't have it anymore? It's terrible. So instead of enum, it's probably better to use just check constraints and know how to cook them: you do it in two phases, you define the check constraint with the NOT VALID flag.
Starting point is 00:13:47 Some people think it doesn't work if you say NOT VALID, but it works for new values. So all inserts and updates will already be checked against this check constraint, even if it's marked as NOT VALID. NOT VALID just means we skip checking existing data. And then, you know, in a different transaction, you say ALTER TABLE ... VALIDATE CONSTRAINT, and in this case you don't block anyone, and it's so good. And with this approach you can create another constraint and drop the old constraint — and this is how you migrate to a different set of values to be checked. So I prefer this recipe instead of using enum. And similar here — well, I honestly also use text in many cases,
Starting point is 00:14:37 but it's not good when you want to store an email and then, in your application, you forget to check it, and then someone inserts one gigabyte of data — because Postgres supports up to one gigabyte inside one value. It's huge, right? So my question is, can you use a check constraint for the length then? Why is it different? Exactly. Yeah, you can. And altering them is probably easier, because it can be fully automated. You just do it in two phases:
Starting point is 00:15:04 You define new constraint drop all along. Good about this approach is that you can work with any data type, even with JSON. You can take JSON and say, I have check constraint. This JSON always must have this key. Well, you can build even JSON schema around it. And I think SuperBase guys did something like they have extension probably for kind of JSON schema to put some constraints. You know, sometimes you want, again, for data quality, you want, if you work with JSON, you still want to have it like semi-structured.
Starting point is 00:15:40 You want to have some keys existing or some values inside keys or some arrays, objects, I mean, and so on. And in this case, you can define some check constraints that will prohibit inserting data you don't want to be inserted. And I mean, this recipe can work with any data type. This is the good side of this recipe. Yeah. And we know how to deploy without blocking.
Starting point is 00:16:04 I think I agree with you on the enums front. I did once see a really good, really entertaining talk by Boris Meihas that I liked, trying to defend the enums, trying to say all the positive things about them. But I think overall, I agree with you. Yeah, exactly. It was a valiant effort, I think. Do you want to move on to, like, numeric types? Numeric types. Okay, yeah, sure.
Starting point is 00:16:27 One of, again, one of recent tips I had, again, in beginners topic as well, we had it. Don't use integer for primary keys. Yeah. Just don't. And what else? Like, numeric, if you talk about integers, there are small int, normal int, 4 byte, and big int. I think float is an interesting discussion, and I think there's some subtleties here.
Starting point is 00:16:53 I wanted just to mention that some people think big int and small int, I just saw it recently, it's not standard. Well, it's standard. Small int, big int, all the relational databases which want to follow SQL standard implement them. So big int is the choice, but not always. I mean, if you know about storage,
Starting point is 00:17:15 we will probably touch the storage topic separately. But sometimes, of course, you want fewer bytes to be spent. So I think for keys, it makes a lot of sense. And a lot of the advice that's out there, a lot of the good talks that have been given are not necessarily quite old, but they're pre-version 10 of Postgres. So a lot of them mention serial and big serial,
Starting point is 00:17:36 but now we have identity, so that problem goes away. I still cannot develop a habit to use it. I still use big a habit to use it. I still use Big Serial primary key. The only downside of the identity type is it's so long. To type it out is such a verbose syntax. But yeah, I'm trying to get in the habit of using it because it becomes a few of them. Because it's standard, right?
Starting point is 00:18:00 It's standard, but it also has few... Serials have some weird complexities, especially if you end up having to migrate, as we discussed in the downtime. Column has owner, the table has an owner of, like, of sequence. Yeah. Ownership, this is kind of, yeah. So, about float, what do you want to hear? I recently learned from Peter Gagan on Twitter. Yeah. Who mentioned that AudioDB published a very good benchmark
Starting point is 00:18:33 showing how it's good to live without vacuum. And they chose a benchmark which relied on float eight columns, like I think four columns, and created B3. And Peter Gagan mentioned that, unfortunately, B3 optimizations he worked on in Postgres 13 and 14 did duplication, right? These optimizations don't work with float 8. Which is a big deal, right?
Starting point is 00:19:00 Because they were looking at the impact on float especially. Well, maybe some people well i mean so the general the general advice in the wiki is don't use it use numeric with precision it's like don't use money don't use flow date similar right because yes and they give the thing i really like about that wiki is they do go into when should you like what are the exceptions i think it's a very very smart way of doing it but in general if you've got any doubt just don't unless you know exactly what you're doing the the optimization by peter reminded me also that recently i think it was postgres 15 maybe 14 added a bunch of like optimizations
Starting point is 00:19:42 around sorting specific data types. And I think the more common or the more standard data type, the more popular data type you choose, the more likely it is to benefit from some of these optimizations, whether that's in indexing or sorting or something. Or vice versa, because, for example, if you store your JSON, or, okay, if you store some vectors inside text column and expect good performance for some algorithms for sure arbitrary neighbors nearest neighbors search a and n right in this case you probably will find out that there are extensions which provide better yeah better
Starting point is 00:20:23 functionality and performance. Yeah, I think we should talk about advanced types in a moment. Just before we do, do you have anything you wanted to add on? You did a whole episode on UUIDs, so I think we can probably skip those. Yeah, let's skip it. Anything on booleans or dates and times? Well, boolean is not one bit. It's one byte. Yeah.
Starting point is 00:20:47 Unexpectedly bigger than just one bit. Dates and times, the general advice is timestamp.tc. Timestamp.tc and pay attention to ranges. Learn ranges. Oh, yeah. Ranges are great. Good shout. Ranges are great.
Starting point is 00:21:00 They have indexes also. So if you have beginning and ending points in time for your data, consider ranges. They will also take care of data quality. You will never have beginning, which is in future compared to ending point. Yeah, really awesome feature. And of course, timestamp TZ,
Starting point is 00:21:27 timestamp with time zone is better and just prefer using it. That's it. So I also, I think I recently was productive in terms of tips. And one of the tips was to block authors and consultants who explain some Postgres stuff to other people and i i asked and it was uh it resonated i saw many good feedback and i mean people like this idea let's stop using integer four for primary keys let's always use integer eight and develop like help people develop a habit to use integer eight and same foramp. Let's stop using timestamp.
Starting point is 00:22:06 Let's use timestamp to that. Like this is like good things for those who write blog posts or develop some, I don't know, like educational courses and so on. I agree. And I would add using identity over serial and big serial. This is probably yes. Well, and in this case, you need to advise, use cast instead of colon colon and so on and so on.
Starting point is 00:22:29 It depends. I don't care. I prefer Postgres because they are usually more compact and just more convenient for me. So I'm going to continue using Sequences and Big Serial. But if you, for example, develop a course for students who you don't know, many of them will probably go and use Oracle, for example develop a course for students who you don't know like many of them will probably go to and use oracle for example but you use postgres when you educate in this case you need probably to stick with sql standard language syntax right in this case yes well i was just teasing because i know you said you still use serial.
Starting point is 00:23:07 Yeah, well, today I see your face. I knew you were teasing, but still worth explaining. Let's move on, probably. Yeah, all right. Advanced types, arrays, JSON. Yeah, first of all, we're going to break the first normal form. The first normal form. 1NF or NF1. I think canonical naming is is one enough, right? In
Starting point is 00:23:30 I actually don't know, is it like what probably one of them is French or something. That's how these acronyms normally get. So the first normal form says you usually you should not put something which can be split two things inside the value in your tables. But well, text can be split two things inside the value in your tables. But well, text can be split to things. And we know in many languages, text, for example, in C, it's considered as array as well. You can work with it as an array. So there is a first letter, second letter with like index zero, index one, and so on. So it's interesting thing. But anyway,
Starting point is 00:24:03 Postgres is very advanced. In queries, we can even collapse whole table, whole relation in one value and then unpack it, manipulating. But it's a different story. As for complex data types, first of all, arrays. They can be of many... Underlying the data types, for example,
Starting point is 00:24:26 integer 8 array or integer 4 array or text array and so on. So it's great, but of course you cannot mix. So you cannot have both text values inside the same array. And things to remember, the indexing starts with one, not zero. This is unexpected to many and leads to interesting mistakes sometimes. Well, arrays are awesome.
Starting point is 00:24:54 I just can say, I use them a lot. Array underscore agg function, which helps me when I group something, I can group many things inside one value and then I also use parentheses and I can
Starting point is 00:25:10 use like I say I need only first five elements to see for example if you select from pgset activity and you want some queries I use left function to cut them left query comma hundred for example.
Starting point is 00:25:26 And then I use array underscore agg to aggregate them and then additional parentheses and then access indexes from 1 to 5 and you see only first 5 elements. Quite convenient. In one single input for one row you see 5. You can
Starting point is 00:25:42 also order by inside array agg. Oh, wow. That's unexpected to many many inside the aggregation functions maybe count doesn't count here sorry i feel like you told me that before but i've still yeah like it's super interesting yeah yeah so so when you aggregate, for example, JSON, AGG, text, and array AGG, so you can aggregate. And in this case, you can say, I want specific order for when I aggregate. And this order can be different compared to order by clause in your statement. So it's kind of interesting trick. But what else?
Starting point is 00:26:21 Like we have arrays, it's only the beginning of the topic right it's a huge topic i mean complex data types yeah i feel like we should skip json because we did a whole episode on it and we can yeah at that and just don't follow advice uh who those who say prefer always json b over json they're probably not completely, it's not probably good, very good advice. In some cases, JSON is much better in terms of performance. So just check, use both, I would say. Yeah. Yeah. Or read about both and see which suits you. I think I've seen way more use cases. I've seen way more people using JSONB than I have JSON, I think for good reasons. Also, we've discussed it at length, haven't we? than I have JSON, I think for good reasons. We've discussed it at length, haven't we?
Starting point is 00:27:11 JSON is a way to escape from entity attribute value approach, which is terrible when you have a lot of data. These joins will be terrible. Actually, you can have kind of, it will not be foreign key, but you can put many IDs inside one row in this array value and then try to join using a nest function but it's it's kind of interesting and probably you will have better performance but again okay json and gsob let's consider h store and xml as kind of semi-duplicated types in our case i would try to avoid them. Despite the fact I participated in development of XML type and functions many years ago,
Starting point is 00:27:49 these days I would probably, unless needed, I would try to avoid them. And Edge Store is very old. It has good features, but most of the time you will probably be good with JSON. And it's what other parts of your architecture probably understand. And what else parts of your architecture probably understand. And what else?
Starting point is 00:28:07 We have also many others, for example, full text search, TS vector and TS query. They also can, like they consist of many things. TS vector based on the name, you understand like it's many words combined to one vector value and you have you can create a gene index and have good performance new thing pg vector for very high dimensional vectors like up to 2000 i think they say and for not strict neighbor search but arbitrary neighbor search for this LLM, charge-GPT area of engineering, which is currently very hot. In this case, PgVector. By the way, PgVector is not the best in terms of...
Starting point is 00:28:56 It implements not the best algorithms. So there is now competition, and I know Neon developed something, PgEmbedding, right? But PGA embedding is quite new and it doesn't, it's like it builds index in memory. I know they work on making it persistent, this index. So I mean, if you restart Postgres, you need to build an index again. What? So it's very new. I wouldn't say it's production ready yet, but it's very interesting because it implements a different algorithm, which probably is considered one of the best right now.
Starting point is 00:29:31 I think we're going to need to do a whole episode on PGP. Yes, let's do it. About vectors. Another one we haven't done an episode on that it feels rude not to mention in this space is PostGIS in terms of data types and functions. Yeah, exactly. And a lot of stuff there, a lot, of course.
Starting point is 00:29:47 So many data types, many functions, and it's a whole world. Yeah, incredibly impressive and hugely popular. We've mentioned performance a few times. Is it like, is there anything you wanted to make sure we talked about? I feel like that's where some of the choices between these gets interesting. Is there anything that people should be definitely be aware of well i
Starting point is 00:30:11 don't think we have a lot of time right now but let's just probably mention a few tips first of all documentation yeah wiki is also good and many articles and blog posts are good. Probably we will attach a few articles which are helpful in terms of learning about various data types. But also a data storage tip related to alignment padding. Due to alignment padding, Postgres always needs to fill gaps. So if you put one byte and then you put eight bytes, Postgres will need to add seven zeros between them. So there is an approach you basically need to either in the setting or descending order arrange. So starting from one, one, one, two, two, two, four, four, four, and so on bytes. So they pack or vice versa. So you have 16, 16, for example, timestamps, then eight, 1, 2, 2, 2, 4, 4, 4, and so on bytes. So they pack. Or vice versa. You have 16, 16, for example, timestamps, then 8, 8, 8, and there are articles about it.
Starting point is 00:31:11 But there are also Varlina data types like text, Varchar, TS vector, vector from PG vector. I'm sure it's Varlina because it allows very huge values, up to one gigabyte probably as well involving toast these values you should move to the end of the table in this case you will have as few zeros these gaps as possible and it's called column tetris basically someone on stackoverflow named it maybe earlier earlier, I don't know. There are a few articles about it. And sometimes you can save a lot of space, but don't overestimate it.
Starting point is 00:31:49 I saw people immediately love this topic and try to optimize. And in many cases, it looks like premature optimization because, okay, you save like 3% of storage, but you spent a lot of engineering time. I read a post by braintree that had huge tables that were very badly optimized i think they saved about 10 to 20
Starting point is 00:32:10 from memory that's great that's a great example but it's rare rare and they and a brain tree you know they're huge yeah my old ad hoc tool called postgres dba has a report, which is a very interesting recursive CTE I wrote one day. I'm still not 100% sure it will work in all cases, but so far it works quite good. And it can tell you about potential optimization in each table. So you can use it and try to understand if it's worth spending some time to rearrange columns. For some people, it's unexpected that Postgres doesn't do it itself. It's interesting, but it's so. Postgres will store columns as you say,
Starting point is 00:32:50 as you define in create table definition. And something I only learned today when I was looking it up, I wondered if pgRepack supported changing the order, but it doesn't based on implementation details. It can't really. So it's very interesting, but you're going to have to roll this yourself this yourself basically if you want to do this so there is potential to have it but it's not supported yet unfortunately yeah you need to rebuild the table and deal with it it's it's a huge task if you want to do it without downtime it's a huge task but as you remember
Starting point is 00:33:21 we mentioned don't use int4 primary key because I saw it very often. int4 primary key, which is doomed to have a wall being bumped into when you reach 2 billion rows, 2.1 billion rows, roughly, right? Half of integer 4 space. And then second column is timestamp.tz, which is 16 bytes. So we definitely know we lost four bytes because they're just alignment pending and we have zeros there if we used into eight from very beginning we wouldn't have this problem with capacity and we would use the same amount of
Starting point is 00:33:59 storage which is very unexpected but it's so i think the argument stands to use in a even if you were having to pay double the storage i think there's so i think the argument stands to use in a even if you were having to pay double the storage i think there's so much benefit because if you're not let's say you don't hit two billion rows you haven't got that much data to worry you haven't got that much footprint anyway but if you've been worried about the storage footprint you're probably thinking you're going to have more than two billion rows so that that would be my argument yeah okay maybe one more advice on your choice and that's it oh what do you think we should well i see people i see a common argument it's not just for data types but people that choose to do things in a standard way or a way that will let them
Starting point is 00:34:42 convert between databases over you choosing the power of the current database they're using thinking what if we need to migrate one day like and i think it's it's an easy trap to fall into it sounds smart but i've not like any of us have been around for a while you don't move databases that often like it really existing projects have a lot of momentum and it costs a lot to migrate there are arguments for doing it we see some from some very expensive vendors coming to postgres but it's rare and it's basically so much work that i think you're really hurting yourself by missing out on the power especially a postgres especially when we're talking about so many of
Starting point is 00:35:21 these things that are so have so many features many functions, so much power, that it feels like such a waste that you'd be missing out on that. It's like choosing C++ but staying with C only syntax, like the very basic types and so on, and not using power of C++ or something. Like you chose Java, but you think, oh, maybe one day we will migrate to, I don't know, like to something else. And you restrict yourself using only 5% of functionality. It's nonsense. Yeah, I think so. But I see people, I see smart people falling into that trap. So that's my advice would be try to avoid that or try to...
Starting point is 00:35:56 Don't consider database as something like as gloves. It's a heart, not gloves. It's the heart of your system. You are not going to replace heart easily. Gloves. It's a heart, not gloves. It's the heart of your system. Heart, oh yeah. Yeah, you're not going to replace heart easily. So database is in the center of everything. Yeah. Heart transplants are possible, but they're rare.
Starting point is 00:36:16 Well, yes. They can go badly wrong. It's not gloves. You can just change them every day. Yeah, yeah, yeah. Okay, good. Thank you so much. Oh, we should remind those who listened until this point, please help us grow
Starting point is 00:36:30 with likes, with comments, with feedback. It's both helpful for us to understand we are being heard and also it's helpful, for example, on YouTube,
Starting point is 00:36:42 if you like, of course, due to algorithms, it helps our channel grow and make audience bigger and make this project more interesting basically true and requests as well feel free to drop requests in the comments on you need more requests yeah more requests please thank you thanks everyone
