The Changelog: Software Development, Open Source - What's so exciting about Postgres? (Interview)

Episode Date: October 23, 2020

PostgreSQL aficionado Craig Kerstiens joins Jerod to talk about his (and our) favorite relational database. Craig details why Postgres is unique in the world of open source databases, which features are most exciting, the many things you can make Postgres do, and what the future might hold. Oh, and some awesome `psql` tips & tricks!

Transcript
I'm more of a Python guy than I am a Ruby guy. And I really loved the Django model for a long time, batteries included. It was kind of everything you needed to run a web app. Like, cool, here's authentication, here's caching, here's all these things that I normally have to go grab off a shelf when I'm building a web app. And Postgres is kind of that for databases. It's like batteries included.
Here's all the things. Here's your data types. Here's your extra indexes. Here's your PubSub. Here's your geospatial. Here's your queue functionality. Here is all that. And it's like, cool, we've checked all those boxes. Now let's just keep making it stable. A little new feature here and there, more polished, easier to
use. And a big one's always faster, right? Bandwidth for Changelog is provided by Fastly. Learn more at fastly.com. We move fast and fix things here at Changelog because of Rollbar. Check them out at Rollbar.com. And we're hosted on Linode cloud servers. Head to Linode.com slash changelog. What up, friends? You might not be aware, but we've been partnering with Linode since 2016.
That's a long time ago. Way back when we first launched our open source platform that you now see at changelog.com, Linode was there to help us, and we are so grateful. Fast forward several years now, and Linode is still in our corner behind the scenes helping us to ensure we're running on the very best cloud infrastructure out there. We trust Linode. They keep it fast, and they keep it simple. Get $100 in free credit at linode.com slash changelog. Again, $100 in free credit at linode.com slash changelog. Welcome back, everyone. This is the Changelog, a podcast featuring the hackers, the leaders, and the innovators in the world of software. I'm Adam Stachowiak, Editor-in-Chief here at Changelog. Today, Jerod talks to PostgreSQL aficionado Craig Kerstiens about his and our favorite relational database, yes, Postgres.
Starting point is 00:02:06 Craig details why Postgres is unique in the world of open source databases, which features are most exciting, the many things you can make Postgres do, and what the future might hold. Oh, and also some awesome PSQL tips and tricks. Here we go. So Craig, you're in an interesting position because you've done so much for Postgres, the project, and you've taught and you've spoken and you've blogged and you've provided endless resources. You've curated the PG Weekly newsletter. I've been following a lot of your stuff for a long time, just learning as you kind of put stuff out there as I'm a casual Postgres user, long time user, but just casual. I don't really dive into the nitty gritty very often because I just don't have to. But you've done all of this and yet Postgres, the project, like it's not your baby.
Starting point is 00:02:57 Maybe it feels like it now, but like it's a long standing open source thing with dozens of people. Tell us how you came involved with the project and maybe how you fell in love with the database. Yeah. So, I mean, I think, you know, Postgres itself is interesting. Like I've actually never contributed a single line of code myself to it. That's, that's so many other people that know the ins and outs. I've read it, I've looked at it, but maybe one day I actually will. But yeah, my contributions have mostly been on the community side, right? Speaking, talking with others, working with people. I actually came to it, I guess the first place was at a startup many years ago that I joined
Starting point is 00:03:35 TrueViso. I joined it about 10 people. They were taking Postgres and transforming it into a streaming database that basically did map reduce on data as it came in. It was back in about 12 years ago during one of the last crashes. And basically that complex events processing was this new kind of cool, sexy thing. I had familiarity with it from college and university, but hadn't really used it there. But we were basically taking and transforming it into something, you know, really advanced, handling a whole lot of data. We had customers like I think MySpace was one of our customers back in the day when they were still the hot thing.
Starting point is 00:04:12 Right. Which which states it pretty well. Yeah. And from there, we we built a product and the company grew and dwindled back down to about 10 people. And I moved on and I found myself a few years later at Heroku. And it was really interesting because it was kind of like you're yourself, right? You're an app developer. It's there. You're using it. And even internally at Heroku, we had, you know, 50 employees and they're like, yeah, Postgres is fine. Whatever. I'll use it. Right. I'm like, guys, you've got a great database here. And I found myself starting to blog more and more about like, I'm a lazy blogger. I think I have a blog post talking about like why I blog and it's because I'm lazy.
Starting point is 00:04:52 Like once I explain something twice, I don't want to explain it again. So I'll just write a blog post on it. Like once I found myself teaching something, like let's just go and it doesn't have to be polished or useful. I tell a lot of people that about blogging. Like, don't worry about getting it perfect. Just get it out there. It probably has value for a lot of people. And you just send people the link when they ask you questions instead of having to rewrite the answers, right?
Starting point is 00:05:14 I absolutely do. Sometimes I feel like an absolute jerk when I'm like, hang on, I'm not going to actually answer. I'm just going to send you a link to a blog post. Right. I definitely feel a little bit like a jerk, but it's why it exists, right? Right. It's kind of like, go read my book you know oh that's worse because you have to buy the book with the blog post at least it's free but i know people like just go read my book it tells you the answer it's like well i asked you the question right here but it's online and you're asking
Starting point is 00:05:37 like sending a link is useful i understand it yeah yeah so um i found myself really kind of speaking to like app developers, right? Like I'm not a DBA, but I helped out developers all the time with, you know, how do I troubles be? Back at that time, Amazon RDS didn't even exist. Right. And we looked and said, Postgres is a really solid database. Like it has good fundamentals, a good core. I mean, if you go back to the history of it, like Postgres has the same roots as so many other databases. Like it's in the name. Ingress was one of the early origins, like grandfathers of databases that came out of UC Berkeley. Post-Ingress is postgres there it is and so many other databases have that same root of ingress like Sybase SQL server but it's
Starting point is 00:06:33 over 20 years old now and it's just solid and sturdy so I got started with it way back really got into it at Heroku running and helping build Heroku Postgres for a number of years and have found myself doing it, you know, at Citus and now at Crunchy Data, building and running their cloud service, basically just running Postgres for people for a while. So you may remember this history better than I do, but I was an early Rails developer. And I know that Rails, the original stack was MySQL,l was the production database now it started with sqlite so you could just run your tests but like people even base camp ran mysql and at a certain point it was like mysql fell out of favor and postgres came into favor just like amongst the zeitgeist of rails developers and i can't remember why, like what triggered that? Was it because of Heroku's
Starting point is 00:07:25 influence and Heroku Postgres? I'm pretty sure it was Heroku's influence. Like at Heroku, you've got a Postgres database. Every Rails app got a Postgres database. So I'm pretty sure we can thank Heroku for a lot of it. I wish I could say it was wonderfully thought out and strategic. We thought Postgres was a good database. We decided to run Postgres. Like, I think it was the right choice. And we invested a lot in it to make it better. Postgres for the longest time wasn't the most user friendly. It wasn't the most sexy database. Can databases be sexy? I don't know. I hear people say that all the time. I'm like, really? But I know what you're saying. Maybe exciting is a better word. Yeah, right. It's, it's, it's, you know, it's not my idea of like a Friday night is to go hack on Postgres.
Starting point is 00:08:07 But it's gotten the cooler features, right? Like JSON was a big one. Yeah. So I think like, I think that really shifted with Heroku. I actually remember Heroku supported it probably five years before Amazon RDS did. I was at, I go to a lot of Amazon conferences, re-invent some big one. It's massive, massive conference. And they roll out product announcement after product announcement after product
Starting point is 00:08:30 announcement. And I remember being there, like I had a heads up it was coming. So I made sure to actually go to that keynote to kind of see the announcement, that sort of thing. And there was literally a standing ovation. Like I've never seen this before at a tech conference for a product announcement. Like, there was this big pause, standing ovation, like the speaker started talking again, and like people still standing on their feet clapping for this. I've never seen this before, never seen this after at a tech conference when you announce a new product. So why do you think Postgres is so beloved?
Starting point is 00:09:02 I know that I switched myself from MySQL to Postgres. Mostly when I switched, maybe it was because of just the flow of the tutorials and this general lemming status of developers, and I was just in there amongst them. But I remember I had some data, this is way back in 05, 06, I had data consistency problems with my MySQL stuff,
Starting point is 00:09:20 and I read some blog posts about how MySQL can just be recording nothing and it's not going to report it. And there are these edge cases that are in there where it's kind of less safe. And Postgres was there and it had this reputation of being just rock solid, consistent, and more strict with your data. And because of the ease of switching in Rails, I could switch very quickly. And so I just started switching all my projects over and I found it easy to use, to administer. And so that was kind of my switching point.
Starting point is 00:09:50 I didn't really have much more thought than that. I think if you look at it, there's, you know, Postgres was that. It was like, I'm going to be safe and reliable. I'm not going to lose your data. That's a funny thing to think about when it comes to a database, right? uh keeping your data not losing it but that was like postgres is kind of cooler value like we're not gonna add shiny new things the the mysql defaults
Starting point is 00:10:15 got got some flack for a little while because hey certain modes it would run it had a different type of storage engine certain ones weren't as safe mysql it's it's funny when you do like search in mysql by default it's uh case insensitive uber it was really funny uber switched from mysql to postgres and then postgres to mysql so they've gone you know back and forth a couple times but when they swapped to postgres they had to go and figure out they they had all this app logic that relied on things being case insensitive because the database just doesn't respect case search. To me, that's a little bit of an anti-feature. I can go on about MySQL versus Postgres.
Starting point is 00:10:53 I think MySQL is a good database and does good things. I think Postgres really started it being safe and reliable. And then we said, now there's these rich user features. There's things like JSON. There's things like commonson there's things like common table expressions window functions there's a really rich set of features that i think we're going to hit on a bunch of them scattered throughout like i could give you the laundry list of them right now but it's it's like each one is unique in its own right but it started
Starting point is 00:11:20 with that being really safe and reliable i I look at a lot of other databases. You know, Mongo didn't start as safe and reliable and work to catch up to that. It started as easy to use. And I think that developers have come around to respecting and appreciating that, right? Like you have corruption. You spend two weeks. Hey, and data is gone. How do you answer that to a customer, right? Right.
Starting point is 00:11:42 For all the shiny things we like, you know, new frameworks that come out every single day, old and reliable just works. And I've really started to appreciate that, like, hey, I've got my Braille stack or my Python and Django stack and my Postgres database, and this app can go really far. I don't need shiny new things
Starting point is 00:11:57 to build an interesting business. Yeah, I like that as well, especially for a database. It's like there's areas in my application where I'm willing to experiment and I'm willing to go with the bleeding edge and bleed a little bit, but get those bleeding edge features. And there's something just makes you sleep well at night,
Starting point is 00:12:14 just like this database stores my data and I can just trust it to do that. And I think that was one of the reasons why I just stuck with it. It never really stabbed me in the back. I never had that moment where I was like, oh, Postgres, you screwed me over. Right? Right. It's the case with so many others too. You have that bad experience and yeah, it's shiny and new, but man, you lose data. I think there's kind of nothing worse you can do, right? Like, obviously I'm a data guy. I like databases, but that's more valuable than downtime, right? Fine. Go be down for an hour, a few minutes
Starting point is 00:12:46 or a day, but don't lose the data that you can't get back. If you're banking or something like that, like, and you've lost my deposit, I couldn't be more unhappy with you. Yeah, exactly. So help us out with a little bit of the history, because one of the things that happened was like, by the time all this developer movement over to it, I'm sure it was popular, you know, amongst DBAs and amongst, you know, different people around the world, but like really there's a groundswell and maybe it was the rails and the Heroku's and whatever this trend was, people started picking up Postgres as like their default starter database. But by then it had been around for a very long time. Like maybe that's why it was so rock solid. It had the ingress roots and then it had, I mean, by the time 2006, 2007, when I started using it,
Starting point is 00:13:28 it had been actively developing as Postgres for like a decade at that point. It didn't have like the JSON features and all the stuff that we're going to get into at that point, but help us unpack like where it came from, who, who works on this thing and like the history and kind of the community around Postgres. Yeah. So a lot of it, we can go way, way back to UC Berkeley, right? Like Stonebreaker. I think I don't know that he won a touring award just for Postgres, but Postgres is a huge piece of that. Right. And he wanted to build this kind of extensible database.
Starting point is 00:13:59 So he looked at Ingress and said, you know, how can we evolve it to the next thing? I think that's a lot of the way back history. There's a great article on the, from an ACM looking back at Postgres. I think it's a, goes through a lot of that history. I think there's a lot to of, of those years kind of right after it came out from UC Berkeley of a good stewardship of community that really gets all the credit, right? Like I go out here and talk about it, but people just logging away and working on it, making sure it was rock solid. I got in a little bit of trouble a little bit ago when I listed off a few names and I started listing off three or four or five, right?
Starting point is 00:14:35 And I think one you've got to point to is Tom Lane, who Tom Lane has contributed to so much open source. He helped create JPEG and TIFF and PNG. Like he helped authored some of those specs. He helped write lib JPEG, lib PNG. And this is 20, 25 years ago. And then said, yeah, I'm kind of tired of image formats. What's this database thing? Let me spend some time on that. And it's contributed to a massive amount of Postgres code.
Starting point is 00:15:06 If there's bugs digging in, making sure they're fixed. But you've got him, you've got Bruce Momgin, you've got Robert Haas, you've got Andres Freund, you've got Stephen Frost, you've got Joe Conway, you've got Simon Riggs. You've got a whole bunch of people that just sit here and work on Postgres. And you can go see all of the activity in the open. It's all on the mailing list. Development still happens that way that it did 15 years ago on the mailing list, showing up with patches. It's really interesting that they've got a set of solid committers, major contributors, minor contributors. There's this whole kind of hierarchy there. It's not too formal. There's a lot of debate within the community of like,
Starting point is 00:15:45 okay, do we have a major contributor for docs or for community contributions or other things? But the code just moves forward. And I think it's fascinating to me in that it does work that way. Often we talk about open source, like open source is not a new topic anymore. We talk about it plenty.
Starting point is 00:16:09 And databases say they're open source, but Postgres is really unique. Yeah, it's open source and other databases are open source, but no one owns Postgres. No one can own Postgres. You can't go and buy up a company and say, my SQL is now owned by Oracle. Like they own the copyright. They own the code. Postgres, I guess, in theory, if you went and, you know, had a few billion dollars that you wanted to try to hire all the people that commit to it, maybe, but I just don't see it happening. And the core structure and the way it's delivered and developed, it's kind of in its own category of what is open source. It's community led, community run, community managed. It's kind of a purebred that way. It's not even like Linux because Linux, you had a benevolent dictator. Right.
Starting point is 00:16:49 Postgres doesn't have that. It's got people on equal footing. So is there like a governance structure set up or is it just like people debate on mailing lists? How does it actually run that? Yeah. So there's definitely debates on mailing lists, right? It is open source in that sense. You're not getting away from that at all. So there's a core committee. That's five people. They kind of oversee and there's an entity set up to maintain copyrights and that sort of thing. And the Postgres, you know, there is a license that is owned by, you know, a Postgres entity there.
Starting point is 00:17:19 And they're set up in multiple companies. The core team isn't who says what goes in. Like core is kind of a steering body, right? So there's five people. There is no more than two from any company at a time, right? I think it's like 40%. So if core were to grow to seven, it might change in that sense. But basically, you don't have a ruling set from any one company. So it's distributed across companies. But in the actual development, you've got basically, you earn a commitment, you show up, you contribute, you review patches. Postgres is released,
Starting point is 00:17:52 a major release once a year. Things show up in, you know, ideas on mailing lists, patches, there's debate and discussion. There's a couple of kind of sprints, commit fests where patches are submitted. There's an app and the biggest way commit fests, where patches are submitted. There's an app. The biggest way to build your credibility is come in and review a patch for people. Reviews are always welcome, always helpful. If you want to build credibility, you don't show up with a, hey, I want to fix this thing.
Starting point is 00:18:21 It's show up on the mailing list and engage. There's two mailing lists that are the big ones. There's a PG SQL hackers list and a PG SQL users. The hackers list is where all the interesting stuff happens. It's what I read to fall asleep at night. But if you want to learn the internals of Postgres, it's a fascinating area to go and read. You said earlier, you haven't committed a single line of code. Have you dove into the code as a readability standpoint? I mean, it's a C project, probably millions of lines of code at this point. Have you dove into the code as a readability standpoint? I mean, it's a C project, probably millions of lines of code at this point. Have you tried to actually tease it apart and look underneath the hood at all? At times, not in any in recent years, but at times, absolutely in my career.
Starting point is 00:18:54 It's funny, it's C, but I've heard people describe that they don't like writing C, but they're okay with writing Postgres C. It's really well structured, well defined. If you want to see a good example of a large C project, it's a pretty good one. It has to be to keep going this long and to stay actively
Starting point is 00:19:13 worked on. I mean, over the course of all of these years, if you had that much technical debt, and you're just slaying in C code without good architecture, good refactoring, over time, you would just slow to a crawl. Crawl to a slow. I can refactoring over time you would just slow to a crawl crawl to a slow i can't say it right you would just stop moving fast and we see postgres just keeps on that yearly release cadence like new features just keep rolling out and they're
Starting point is 00:19:34 just built on top of that foundation uh over and over again yeah and i like it's a testament to to so many people long before me that that do keep keep plugging away. And, um, there's been an, you know, an active focus in the community to, to, I think, to grow that number of committers. I think, I think it's at about 40 now that have a commit bit. Um, but you earn that over several years and you've got to stay active. Um, if you don't commit something, uh, every year or two, you lose that commit bit. Um, so I think I think, you know, it takes work to earn it and it takes work to maintain it. And then they're watching each other, right? Like you can't go and commit something.
Starting point is 00:20:12 There have been releases where features got in and some of the code quality wasn't there. And that engineer spent, you know, the rest of that next release, that next major release, not writing new features, cleaning up, right? Like they hold themselves accountable in a good way. So it's definitely an interesting kind of testament of a project that I don't know many other examples like it. This episode of The Change Log is brought to you by Teamistry. Teamistry is a podcast that tells the stories of teams who work together in new and unexpected ways to achieve remarkable things. Each episode of Teamistry tells a story, and in each story, you'll find practical
Starting point is 00:21:05 lessons for your team and your business. I got a sneak preview of season two, and I couldn't stop listening. I was once in the U.S. Army, and nothing gets me more excited than seeing teams achieve great things when they learn to work together. And that's exactly what this show delivers. This season, the show travels deep into the underwater caves of northern Thailand to discover how divers, medics, soldiers, and volunteers saved a group of trapped teenagers, explains how a world-renowned watch company pitted their two factories against each other in an attempt to become the best watchmaker in the world, and finds out how Iceland went from having one of the highest COVID-19 death rates to a model example of how to deal with the virus. These are stories that entertain, and they're packed with business cases you can actually use.
Starting point is 00:21:48 Season two of Teamistry is out right now. Search for Teamistry anywhere you listen to podcasts. Check the show notes for a link to subscribe. And many thanks to our friends at Teamistry for their support. So we talked about its stability and how it doesn't let you down and the data consistency. There was also this NoSQL trend, or NoSQL if you prefer, trend, where a lot of shiny new databases
Starting point is 00:22:25 were coming out. And I'm not sure what year this was, maybe 2010, maybe 2012. Time kind of molds into a continuum. But in that range where it was like, look what this thing can do, and traditional relational database management systems, RMSs like Postgres and MySQL,
Starting point is 00:22:44 were really kind of thrown under the bus. They were stodgy. They didn't have the flexibility that you need. A lot of people left Postgres at that time. They left SQL altogether to move to NoSQL. Those of us that were a little more patient
Starting point is 00:23:00 just kind of like, Postgres just kept adding things that was like, hold on a sec. There's that nice feature over there in NoSQL land and you can actually get pretty close or good enough and not have to lose all of the, you know, the acid guarantees and all the things that you have over here with Postgres. One of those was JSON. Talk about some of the exciting features that have come into the project over the last decade, which have really made it not just be the stable rock that it is, but also it's kind of exciting. It's got a lot of cool stuff that they've been adding over the years. Tell us some things that excite you about it. Yeah.
Starting point is 00:23:36 So, I mean, JSON's the big one, right? Like, I can't talk about Postgres and not talk about JSON. And for context, that was eight years ago. That's pretty old in terms of like technology, right? Like what was our front end JavaScript stack like at that point? Like were we still using jQuery maybe? Like, yeah. For context, a lot's changed in that landscape, right?
Starting point is 00:23:59 And JSON was a really interesting one. I remember having, you know, conversations with the Postgres core community and some of them were really dismissive at first. They're like, I've seen this before. Like 10 years ago, document databases, XML databases were going to replace relational databases. And they came and went. They just like went by the wayside. Postgres added an XML data type. So technically, Postgres was a document database like 18 years ago when it added the XML data type.
Starting point is 00:24:28 So I, you know, defend that Postgres has been a document database for nearly 20 years or so now. But, you know, some of them don't write web apps. Some of the core developers, they're like, yeah, I built a website once upon a time. So like JSON is foreign to them. What is this JSON thing? Now now that's not all of them uh so i think there were a lot of interesting conversations around that time about no sql and json and there's really no reason json didn't fit into to postgres just fine right like you can leverage json and i think the the broader debate on a you know sql no sql or schemerless schemerless is a really good one and we should get into that a JSON and I think the broader debate on a, you know, SQL, NoSQL or schema list, schema
Starting point is 00:25:06 list is a really good one. And we should get into that a little bit because I think there's a lot there. But JSON first came in eight years ago and really we cheated. Like we totally cheated. We did validation on JSON on as JSON as it came in and stored it in a text field. There's nothing special about it. Yeah, it wasn't even really like a data type. It was just like the text field with like some stuff on top of it, right?
Starting point is 00:25:30 Pretty much. It just went into a text field under the covers. Postgres did the validation that it was standard JSON. That's about it. So we totally cheated. Everyone still loved it and said, okay, cool, I'm going to use this. Just proves you don't have to always be technically correct to kind of advance your case. And then JSONB came a couple years later. And so JSONB
Starting point is 00:25:50 is a binary representation on disk of compressed JSON. I use it in almost every project. Like, I usually have a column called like extras. And it's really easy. I didn't know I needed this, I can throw it in in or I don't know if it's going to be there long term. Feature flags are a great example for it. And so now you've got two data types in Postgres. You've got JSON and you've got JSONB. So the B stands for binary. One of the, an engineer I've worked with for a long time says it stands for better. I prefer that version a little bit more. Is there any reason to just use regular JSON as text now? Is there still reasons to do that?
Starting point is 00:26:28 So there's a few, and it's a pretty narrow set of use cases. But if you're not indexing and querying into that, if you want to just save logs, right? If you've got logs coming in, or if you want to say like you've got an API that's accepting and ingesting stuff, and you want to save the exact format, you're not querying that all the time. You just want to really quick. It's going to be a little
Starting point is 00:26:48 bit faster because it doesn't have to go and compress it to a binary format. Right. If you care about preserving the white space, like logging and API logs are huge for this. I'm like, I just want to save them and I want to replay them exactly as they were. If you want to go and replay your API logs, just throw them in JSON. So it's really fast, efficient. There's no extra processing. If you're not querying at a time, JSON is useful. But most of the time, you've got a, you know, key you want to index and query into, you do want JSONB. JSONB comes with all the extra perks. So with JSON, you can index like a very specific key. With JSONB, you can put a gen index on and index all the keys and columns kind of within that JSON document. So
Starting point is 00:27:31 then when you go and query it, it's going to be really, really fast, like your other index tables. So JSONB is usually what you want when you're developing in most cases. Maybe real quick, do a quick, when should I use JSON in general? You mentioned extras. It makes total sense if you're splatting logs into a thing or API responses, like just take the API response and store it as JSON. So you're going to store the type as JSON. There's a balance of when it's smart to actually use this and when, nah, you actually needed a separate table there,
Starting point is 00:28:00 but you denormalized when you shouldn't have and now you're going to have problems, usually data consistency problems. What's your heuristics? Like, when does JSON be a good idea, and when is it not a good idea? Yeah, so, I mean, there's, it dives a little bit into, like, relational versus NoSQL, right? Like, I think it's worth a quick detour there. It's like, you always have a schema no matter what you're just like maintaining that in the database or you're maintaining that in code like right your application expects something to be there now you're building an if statement to say if this is there otherwise do
Starting point is 00:28:33 this right or write out a default value yeah whereas when you add add a column in postgres you're adding a default value there's a number of things that hey if you don't rely on that being there great in a schema there's there's really usually some things that, hey, if you don't rely on that being there, great. In a schema, there's really usually some things that always exist. Like, hey, you've got a user's table and there's like a username and a password. That's always going to exist. You've got some really basic fields. And what I tend to do is like, oh, if it's a temporary use thing, or if it's like a tags thing, or this is kind of extra, so it doesn't exist on every model. Instead of creating a whole other table, for us, we run a database as a service
Starting point is 00:29:09 that exists on top of AWS and Azure. And some things we need for AWS, some things we need for Azure. We could create completely separate tables for those servers, or I could just say, oh, this extra field I need only for Azure is over here. So it's kind of when the optional, you know, optional extra fields. Right. It's a really, really common case that I see quite a bit. That's kind of that extras category.
Starting point is 00:29:31 Right. I use it for settings a lot. So for instance, notification settings on a user, like here are a bunch of emails we may or may not send. And these are like triggers for them to say whether or not they want those emails. And I don't really want to have my own table for that. Some users, they've never filled it out. It's just the default. And then every once in a while, we're going to add a new email that we want to send. And so it's easy to just add that in your code
Starting point is 00:29:54 and not have to go through a migration process to add a new column to another table. So I'll just be like, well, here's a new email. Just throw it in code. And it's just flexible that way. Yeah, I think that's a really kind of good way to think about it. It's basically a lightweight table right there on that, on that object, right? You've got, Hey, maybe some nested data, but I kind of don't want to go two or three layers deep because now I'm basically, you know, now I've got to go and figure out and recompute all those dependencies
Starting point is 00:30:21 and constraints and all that stuff. So a layer too deep as a table kind of works really well as a heuristic of if it's light enough weight. I think the other piece is, you know, how are you going to do analytics on it? Like this is a big one for NoSQL databases where, hey, even if you get to manage your schema, you suddenly want to ask a question and say, hey, how many users signed up in the last week? And now when you're like traversing down a document three layers deep, that's some, you can write that SQL, but it's pretty gnarly. The joints aren't going to be as efficient,
Starting point is 00:30:52 that sort of thing. So if I'm doing lightweight filtering, like give me a user that has some Boolean of true in my JSON, that's really easy. But if I'm doing aggregation on some like, give me all the users that signed up based on some data and I'm parsing that out from JSON, that's really easy. But if I'm doing aggregation on some, like give me, you know, all the users that signed up based on some data and I'm parsing that out from JSON,
Starting point is 00:31:10 like now I'm doing a lot more overall kind of on the SQL side, which is not where I want to be. A lot of applications start without analytics and add that on later. And this comes back to where to me, part of the reason that Postgres 1 is people started building these applications and said, oh, now I need to do analytics.
Starting point is 00:31:27 Oh, wait, that's hard on a NoSQL database. Like, how do I group by and aggregate and filter and all that stuff inside documents? Not so trivial. And some of those you can actually only query on indexes, right? Like there's nothing you can't even write an ad hoc query in certain cases. Like you have to have a predefined index or something on some NoSQL databases. Yeah. I think they're, you know, they often start there and they get more advanced. So over time, there's not a foundational limit that they couldn't, but yeah, some definitely start there and try to get more
Starting point is 00:31:57 and more, but it's not going to be as efficient. I look at the roots of SQL and it was really well designed. It's relational algebra and relational calculus. And if you want to geek out on it, that's where I say, like, learn those things. And then you understand all of the power of SQL and it blows your mind. Most people never need to go that route. But to throw away all that as a foundation, you lose a lot, you know, going the other side. So JSON over eight years ago, JSON be a couple years later, but this is like a longstanding, this is a thing that I now take for granted. So it's not like a shiny new feature. It's still
Starting point is 00:32:30 shiny. It just isn't new. What else about Postgres that's exciting in terms of features for relational databases? So I think I look in, you know, Postgres just moves forward each year with something new. JSON, JSONB are the shiny ones. I can go back and probably each release and there's some interesting new thing. One big area for me is indexing. Most databases have an index type. When you do create index, it's creating a B-tree index. If you have a CS degree, this is what you learned is the basics of a B-tree.
Starting point is 00:33:06 Postgres has had that for a while, and it's also getting more exotic with its indexes. So, you know, with JSONB, when you index it, you use a gen index, a generalized inverted index, which flips it on its head and basically indexes every key and value inside that JSON document. So really interesting that I don't have to say index this column or this little piece. It indexes everything key and value inside that that json document so really interesting that i don't have to say index this column or this little piece it indexes everything for me so when i query it it's fast um postgres has five different index types for a long time you know i would read the docs and look and i finally got you know um i just kind of like got over my humility and asked like the core engineers like can you actually
Starting point is 00:33:45 explain to this to me in plain English like English I read the docs and I'm like I don't I don't get it right yeah and so you've got gen which is really useful when you've got multiple values inside the same column so if you think about that right like JSON you've got multiple keys and values, you've got arrays, that sort of thing, right? Postgres has an array David type, that's super handy. If you're doing something with like tagging your categories. It's wonderful, like, please don't go build a whole categories table and join against it. You don't need it, just go ahead and use the array type. So gin really useful on arrays json those sort of things you've got a just index a generalized search tree this is useful though the way i best describe it is when you think about
Starting point is 00:34:32 you've got records that overlap values between rows so if you think about like full text search so you've got like a sentence and like maybe you want to index on like the dog but not the right so like you need both parts and hey v may appear in a bunch of other places but it's not going to be in the index so you've got parts of the values that that span across rows um geospatial is another one where just is really useful so if you've got like polygons like how do you find the dot within the polygon, right? So it's useful for things that can kind of overlap in that sense. You've got SPGIS, space partition GIS. I only know it's useful for phone numbers. I keep asking the question, give me other examples.
Starting point is 00:35:17 And they're like, phone numbers, that's when. Why phone numbers? What's unique about phone numbers? It's something about how like things naturally cluster together. So if you think about like area, then some three-number prefix, like, hey, this clusters together, this clusters together, then this is kind of the unique part. So there's kind of distinct blocked groupings of values there.
Starting point is 00:35:37 So maybe zip codes might be another one, because those kind of have, you have a certain area, they'll start with the first two letters or numbers in a zip code, and then the last three will be different or something like that. Yeah, that's a, I'm going to ask one of the core developers next time I see them if that's an equal use case. Yeah. But yeah, typically really, really large data sets. You also got brain indexes, block range, which is really similar where you've got, you know, billions and billions and billions of records, some that naturally cluster together. Those are definitely a little more specialized.
Starting point is 00:36:10 So like, if you think about it, you're using JSON B and arrays, you want to use a gen index, if you're using geospatial stuff, or full text search, you want to use a gist. If you have no clue, you know, a B tree just create index is kind of for a single column. But it's interesting that each one of these kind of comes in, you know, a new year. Like for a little while there, I think three years in a row, we had a new index type every year in Postgres. And it just marches forward. A lot of this has come out of a group that we, you know, within the community called the russians there's a professor uh from university of moscow that i'm not sure if he still teaches or used to teach on like astrophysics and then like hacks on postgres for fun like we have different definitions of fun but he'll show up with some some like hey i read this
Starting point is 00:36:55 research paper i wrote this research paper what do you think about this and for a little while there it was kind of like well this is an absolutely crazy idea like postgres is stable and solid and we don't do these crazy things and things and show up with a patch on the mailing list and do the back and forth and debate it's like here's the performance here's the characteristics of it it's you know okay great you'll maintain it you'll support it here it is you know there's uh i think right now um he is russian he says we have. He's working on a type called vodka right now. He says, you know, we need that. I hope that's just the working name.
Starting point is 00:37:29 I'm not sure if that's true or not. But yeah, it just keeps moving forward, right? Indexing is one of those things that as a developer, I don't go in like my checklist of features when I'm evaluating a database. Like how many indexes do you have? And yet when I need it, it's there. Really, there's just so much in the box for Postgres. Some of it I have absolutely no experience with.
Starting point is 00:37:54 PostGIS is one huge area. Is that a third party thing though? Or it's like it's maintained as its own separate? Because I have used it one time, it's been years now, and I remember I had to install it as an extension and maintain its upgrade cycle was different than postgres's and there was some pain there were like upgrading one not upgrading the other i remember being like oh this is a little bit so it's like it's like a an extension of postgres or is it yeah so it's an extension
Starting point is 00:38:20 and i think um i think we'll probably get to this in a little bit. Like extensions are a whole fascinating area. Yeah. Post GIS being one of the largest, biggest ones. Like I'm not a geospatial developer at all. In some sense, there's a completely parallel community. Like there's a post just set of committers and a post GIS core team.
Starting point is 00:38:40 And they, they collaborate. They're at the same conferences, but some of the things they do are completely separate from core, right? And so I think there was a couple of rough years there. I think it was upgrading to PostGIS 2.5 or it was 2.5 to 3 that it was a really, it was the dark times. I think you drilled it. I think those are actually the version numbers I was on. Yeah, I think I had words with some of the PostGIS committers then and they
Starting point is 00:39:07 understand, I think, the world of how many developers are using it now and what the upgrade process is and I don't think we're ever going to see that again, hopefully, knock on wood. It was one of those situations where I had enough competence and confidence in Postgres maintenance and
Starting point is 00:39:23 administration that it didn't worry me. But when I pulled the Postgres stuff in, it worked great and I was using it, but maintaining it over time because I wasn't actively working and doing that. It was just this thing that I also had to do. When I upgraded Postgres, I was running into issues and I wouldn't know how to solve them because it was like this third leg kind of a thing. It was definitely a rough one. And I think you've got a range of indexes.
Starting point is 00:39:47 And some things, you know, are core in Postgres. Some things are extensions. You've got really lightweight extensions that are a little safer. Post.js is a huge one. It's a whole geospatial database. Like it passes up functionality that like Oracle geospatial doesn't have. Like an open source thing is better than Oracle at something. And that's geospatial doesn't have like an open source thing is better than oracle at something and that's geospatial for sure like that's not disputed at all it has
Starting point is 00:40:10 new data types new operators and functions and all sorts of things in there so it's it's definitely a massive one that kind of has its own path but um i think if you look there's other things that are are more in core you've've got like full text search. I think it was a couple of years ago I saw someone write a blog post that was like a deep dive on full text search. And the title was something like Postgres full text search is good enough. Like it works. Like it's fine. I installed it.
Starting point is 00:40:39 It works. And it was a wonderful post. And people were like, yeah, why do i need elastic or or whatever else when i can just try this and see if it works and you can kind of replace full-text search with almost anything in postgres like postgres geospatial is good enough uh one of my favorite small ones that you shouldn't use all the time but but listen notify is PubSub directly inside Postgres. If you want to use Postgres for a queue, you can do that. Yeah, I've seen that done, and it's really cool because you don't expand your maintenance surface area at all.
Starting point is 00:41:15 I know we're in the world of serverless and all these things, so nobody has to maintain servers and stuff. But if you have to have a Redis instance somewhere in your stack, you're either paying somebody to maintain it or you've built that into your infrastructure. And a lot of times when people go to queue background jobs, they have to pull in some other thing, whether it's Redis or people use Memcache for that, probably not.
Starting point is 00:41:33 But there are queue things, Celery, I don't know. Beanstalk, words are coming to my mind. But anyways, you can do it right in Postgres. You're already backing that up. It's there with everything else. There's not another cog in your wheel. That's pretty cool. And I've seen people use that to do background jobs. Yeah. And like you say, it's one more thing. I don't have to deploy and manage. Like that's as strong as anything. Like even if this thing is 10%, 20%,
Starting point is 00:42:00 even if it's 2x better, Postgres is just this kind of stable workhorse that gets a new feature every so often that oh i can do that and i can do that and i can do that postgres 13 just came out you know a few weeks back and it kind of epitomizes postgres to me like there's not of all releases i think there's not a shiny new feature it's kind of like upgrade and it's just better like it's like you you get space savings from b3 indexes if you were using you know partitioning before it's just better now now you can have like better constraints between your partition tables and you know joins are more efficient it's it's like more and more i think likegres itself, like we've kind of reached the point where it's got, I'm more of a Python guy than I am a Ruby guy. And I really loved the Django model for a long
Starting point is 00:42:52 time, batteries included. It was kind of everything you needed to run a web app. Like, cool, here's authentication, here's caching, here's all these things that I normally have to go grab off a shelf when I'm building a web app. And Postgres is kind of that for databases. It's like batteries included. Here's all the things. Here's your data types. Here's your extra indexes. Here's your PubSub. Here's your geospatial. Here's your queue functionality. Here is all that. And it's like, cool, we've checked all those boxes. Now let's just keep making it stable, a little new feature here and there, more polished, easier to use. And a big one's always faster, right?
Starting point is 00:43:41 What's up, friends? Have you ever seen a problem and thought to yourself, I bet I could do that better? Our friends at Equinix agree. Equinix is the world's digital infrastructure company, and they've been connecting and powering the digital world for over 20 years now. They just launched a new product called Equinix Metal. It's built from the ground up to empower developers with low latency, high performance infrastructure anywhere. We'd love for you to try it out and give them your feedback.
Starting point is 00:44:01 Visit info.equinixmetal.com slash changelog to get $500 in free credit to play with, plus a rad t-shirt. Again, info.equinixmetal.com slash changelog. Get $500 in free credit. Equinix Metal. Build freely. As I kind of take a step back, I think more and more is going to happen in extensions. Like PostGIS is a big one. There's roughly, I think, 250 extensions that exist on one of the kind of extension networks.
Starting point is 00:44:44 Extensions are unique. They're low level hooks deep into Postgres. I kind of hate the term extensions because you think like every database and tool and library has like extensions, right? Like they're like, like extensions and air quotes. It's like a plugin layer that you can just throw something on top. But this is like deep, low level C hooks. Like you can write an extension in SQL. You can write an extension in C. you can write an extension in SQL. You can write an extension in C. You can write an extension in other languages.
Starting point is 00:45:13 And you can completely change the underlying behavior of what Postgres can do. You can have new data types. You can have new functions and access methods. And it can move at a separate pace from core. Core can still have that same mantra of like, I'm not going to lose data. I'm going to be safe. I'm going to be safe. I'm going to be reliable. They can maintain that C code base at a really high quality.
Starting point is 00:45:35 And now we've got this world where something can happen in an extension, improve over time. And the core community can sit there and say like, well, this is really solid. Everyone needs this. Let's put it in core. Pull it in. Yeah. So to me, I think as we, you know, i start to look at the landscape of postgres and what's the future extensions are are absolutely huge and what you can do with them is kind of unbelievable can you give some more examples of extensions and in addition to post
Starting point is 00:45:59 just i know there was one called h store which i remember using but what are some other things that people have built in the community that you can pull in and use that extend for lack of better term Postgres? Yeah. Yeah. Yeah. Right. It's, uh, it's, it makes sense. And I think it's, you know, the term, it kills me because it's like, Hey, what extends it? But it does it in a unique way that, uh, that's the part that hangs me up. Right. There's a bunch. One of my previous employers cited this data, turned Postgres into a sharded, distributed, horizontally scalable database. So when you were at 100 terabytes of data, hey, that doesn't fit easily on a single node. How do you get performance?
Starting point is 00:46:37 Under the cover, everything was sharded. It still looked like a single node database. But, you know, to your application, you didn't have to think about sharding, right? You don't have to go hire the experts like Instagram did. You can just work with it in your, you know, your Rails, your node app, and just pretend it's a single node database. PostGIS is obviously a big one. There's really simple ones. You know, one of my colleagues at Crunchy Data, he's on the like the core for PostGIS, but he wrote one that's like just HTTP basically, like I want to go and curl this website.
Starting point is 00:47:12 There's pgcron. So if you think about this, right, you've got pgcron, which is crawling your database, you could go in and curl something. And then you can go and like, parse that website, you could do screen scraping automatically inside your database, without ever having to run like a separate scheduler web process. You've got different procedural languages, you've got like PLV eight, which is V eight directly inside Postgres. So that's an extension that you can run JavaScript inside your database. Okay, it's getting crazy. So how would you trigger that then? Well, let's go back to the cron one, like, like use a select or something like how do you actually interact with these things is it using the the query the query language standard SQL query language or
Starting point is 00:47:53 extensions is that yeah so the the basics is you just run create extension right you've got to have it built and they're available on your system then you run create extension and it's available depending on what it does it's going to enable something new so hstore is a great one hstore is a data type so you run create extension hstore and now you have this new data type so now when you're creating tables you use the hstore data type right and hstore is a key value store directly in postgres so it was kind of the precursor of json right i think that's when i used it was back before i either json didn't exist or i didn't know it was in there. So I used HStore. Similar fashions.
Starting point is 00:48:28 It wasn't as nice because there was all these little edge cases with it. Yeah. So it's, you know, and I think it proved the point of like, hey, a bunch of people use HStore. Yeah. Maybe we need something more official in JSON, right? JSON went directly into core because of how the community saw that being used. Madlib is one that's out of UC Berkeley. It's a whole analytics package. Like when people
Starting point is 00:48:47 talk about data science, like, cool, I'm going to go and do something in Spark. Like I'm kind of going to pick up Madlib because it has like supervised learning, unsupervised learning. Like you want to like look at K-median, you know, run a regression. It's right there. And it's been maintained for north of 10 years now it's really old and we just had the ability to do all this data science directly in postgres for a long time now now when you enable it you get whole new functions right and you basically execute these functions and pass in the right things and you get something back um something like a plv8 which i'm becoming a bigger and bigger fan of pl python to me this, this, like you mentioned, like, hey,
Starting point is 00:49:27 things are getting crazy when we've got JavaScript in our database. Yeah, right. For a long time, we had this idea of like, no, no, never put application logic in your database, right? Like the database is this dumb store. Think, you know, a big hash in the sky. And I'm like, no, it's useful. It has all the data and you can do interesting things.
Starting point is 00:49:48 So like PL Python, a couple of weeks ago before we launched our product, I was like, what can I do with this? And I just started poking and I installed SciPy, NumPy and Pandas. And I started live tweeting. I probably shouldn't have because it could have gone horribly wrong, right?
Starting point is 00:50:01 Like I'm going to see what I can do. And then, you know, two hours later, there's like crickets. But no, I actually was able to in about 20 lines of python which i basically wrote a function that executes python inside my postgres database okay imports pandas i pass in uh some records to it and i pass in like a history of orders and what's in a shopping cart and it's basically a recommendation engine of you should recommend these products to this person. 20 lines of Python directly in my database. I didn't have to go and spin up a Kafka queue to get the data out of Postgres into Spark to run some model to feedback in to Redis to then, you know, show this to the user. Literally 20 lines of Python directly in my database to have a recommendation engine for products. And it breaks the mold, I think, of what we've been thinking for
Starting point is 00:50:50 the past five or 10 years. But if I look at it from a practical standpoint, like, this gospel of never putting logic into your database: the large enterprises have been doing this for years and years and years. If you look at things like Oracle and SAP applications, they're almost nothing but huge procedural code. And with Postgres, now that we can do it in things like JavaScript with PLV8, and Python, it can be more native to an app developer. So I'm kind of weird, I kind of enjoy writing SQL. I ask a lot of people who enjoys writing SQL, and, you know, if there's a hundred people, there's like three hands. But I have asked the question to people, like, well, do you enjoy reading other people's SQL? And there's never a hand that goes up. Yeah, no. It's not a pretty language, and when someone else writes it, it's
Starting point is 00:51:38 probably not well formatted. Like, it gets the job done. But if I can write Python to do the same thing, or JavaScript, how do you feel about that? Yeah, I think that's more legible to more people. How do you maintain that? And like, where does that live in a software system? Basically, it's like a create function call. So you run a create function, you define your function directly, then you can execute that within the database, right? So I've defined my function, and then I can just do a "select get recommendations" for my recommendation engine. So a lot of extensions come in and basically they've packaged up
Starting point is 00:52:11 all of these functions for you in that C format. Like MADlib does all this. So now all you've got to do is go through and say, what's the function? What are my inputs and what do I get out? But if you want to write your own, you absolutely can. You can go down that deep path
Starting point is 00:52:23 and create an extension, or write your own functions and deploy them, you know, just like you would other schema stuff. Yeah. Let's just get real practical with this Python example. Maybe if you have the code or you have the tweets or something, you can provide links to somewhere where people can go and look at what exactly you did, because it's fascinating.
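For readers who want to see the shape of this, here's a minimal sketch in the same spirit. It is not Craig's actual code; the orders table, the column names, and the toy "model" are all invented for illustration:

```sql
CREATE EXTENSION IF NOT EXISTS plpython3u;

CREATE OR REPLACE FUNCTION get_recommendations(customer int, max_items int DEFAULT 5)
RETURNS TABLE(product_id int, score float8) AS $$
    import pandas as pd

    # plpy is PL/Python's built-in database access module
    rows = plpy.execute("SELECT customer_id, product_id FROM orders")
    df = pd.DataFrame([dict(r) for r in rows])

    # Toy "model": suggest products this customer hasn't bought yet,
    # ranked by how often everyone else has bought them.
    seen = set(df[df.customer_id == customer].product_id)
    counts = df[~df.product_id.isin(seen)].product_id.value_counts()
    return [(int(p), float(c)) for p, c in counts.head(max_items).items()]
$$ LANGUAGE plpython3u;

-- Iterate exactly as described below: redefine the function, then test it with a select.
SELECT * FROM get_recommendations(42);
```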
Starting point is 00:52:41 Are you just storing that in its own Python file? Actually, you're writing the Python inside of a function. So this is like an SQL file that you're piping into the system via PSQL. Is that how you get it in there? Yeah, I'm just connecting via PSQL. And I'm just like, so a few PSQL tips, right? So I'm a CLI guy. I think a lot of developers are.
Starting point is 00:53:01 psql is really great and powerful. I don't know, like, you're in there for the first time and you're just kind of typing. A big tip is, if you set your EDITOR environment variable, just, you know, $EDITOR, and you do backslash e (\e), that'll open up your default editor.
Starting point is 00:53:19 So, like, if you want to edit your queries in Vim or Tmux or Emacs or whatever, just set that and then do \e. Now you can kind of work as native as possible. It could be Sublime Text, even. TIL. I've been using psql for years. I did not know that until just now. You just taught me something.
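As a concrete sketch of that workflow (the editor is whatever you like; 'subl -w' is one common way to make Sublime Text wait until the file is closed):

```
$ export EDITOR=vim              # or: export EDITOR='subl -w'
$ psql mydb
mydb=# \e
mydb=# \ef get_recommendations
```

Saving and quitting the editor drops you back into psql with the buffer ready to run; \ef does the same trick for editing an existing function's definition in place.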
Starting point is 00:53:36 There's a whole bunch of things. Backslash timing (\timing) will automatically show how long it took to run a query. I'll give a link as well to how to customize your psql editor. You probably have, like, a bash profile set up, or a .bashrc; you can set up a .psqlrc that'll customize all this for you. A friend just set his null character value so that when you have nulls in your database, he set it to the poop emoji, so that you know it's a null. It's not an empty string; it's actually a null in your database.
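For anyone who wants to try it, a small ~/.psqlrc along those lines; the exact settings and the prompt string are personal taste, not anyone's actual file:

```
\set QUIET 1
\timing on
\pset null '💩'
\set PROMPT1 '%n@%/%R%# '
\set QUIET 0
```

The QUIET toggle at the top and bottom just suppresses the chatter these commands would otherwise print on every startup.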
Starting point is 00:54:11 Right. I have picked up a few things over the years. That backslash e, I've never picked up. I'm reading my .psqlrc right now, and I have the \timing on. I have the \pset null, but I just have, like, the word.
Starting point is 00:54:25 No, I need to replace that with the poop emoji for sure. A couple other things: then it unsets quiet. I don't know what it's doing... oh, it starts with quiet, does some stuff, unsets quiet. I think that's to set up the prompt. So yeah, definitely link that up, because we love to trick out our environments, and if you can be more productive inside your psql, then why not, right? Yeah, I think mine's back to Vim right now, but for the longest time it was actually Sublime Text, which people are like, wait, how are you doing that? And it's executing it inside psql. So it pops open your Sublime Text in a new window, and then when you save that, somehow it pipes it back into your command line and it's executing
Starting point is 00:55:01 whatever you save. Like, you can basically, you know, in Vim I can quit without saving, or I can write and quit, and if I write and quit, it's going to execute. It executes. That's cool. And so that's what I was doing with the Python. I was basically, you know, building it up a few lines at a time, and it was create or replace function: I've got my inputs, I started with just a couple of inputs and saying, okay, now I'm going to import pandas. Like, did that error for me or did that work? Now I'm going to see, can I, you know, parse this into a data frame?
Starting point is 00:55:32 Nope, I didn't parse it right. Okay, how do I get from a set of arrays in Postgres and transcribe that into what Python wants for a data frame? So it was definitely, you know, a couple hours of debugging, but probably not much longer than... I haven't written anything in pandas in probably a year, so probably no longer than it would take me directly in pandas. And I'm just kind of creating that function, and then I'm calling that select as soon as I create it, to kind of iterate and test. I'm saying, select, get my recommendations...
Starting point is 00:56:01 Oh, error, or did it execute and do something? Pretty cool stuff, man. Pretty cool stuff. So that's in present day. That's presently available in Postgres. What's the future look like then? Is there more like this coming, or what do you think is going to happen next? Yeah, I fully expect a lot of this, right?
Starting point is 00:56:19 Like you're going to see new extensions doing all sorts of things. You're going to see, you know, you've got time series ones like pg_partman. You've got things like Citus for sharding. You've got PostGIS. There's ones that show up that I'm like, I had no idea you could, or would even want to, do that.
Starting point is 00:56:36 You know, your reaction of like, this is crazy when we've got JavaScript in our databases is spot on. Thank you. I thought so. As you pause and think about it, why not? Like kind of the art of the crazy is really fun as developers, right? Like we're going to see a lot more of this. And I see them, ZomboDB is a fascinating one.
Starting point is 00:56:56 So it'll keep your Postgres data in sync in Elasticsearch. And you can maintain Elasticsearch indexes and query them directly from within Postgres. Say that again for us slower folks like myself. So if you want to use Elasticsearch for your full-text search, right? Normally you've got to pipe all that data over somehow. So what this is going to do is, as you write a record
Starting point is 00:57:18 and it's a transactionally consistent record, it's going to sync it over to Elasticsearch and maintain that index. And now when you query this, you could go to Elastic and query it, but you've already got, you know, your application connected to Postgres, so why not just use that index? So you can basically have a little, kind of, reference to say, use this Elasticsearch index. It's automatically going to call out, use Elastic for full-text search, and get that back in your standard SQL Postgres query.
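A rough sketch of the pattern being described, assuming the ZomboDB extension is installed and an Elasticsearch cluster is reachable; the table, index, and URL here are invented:

```sql
CREATE EXTENSION zombodb;

CREATE TABLE products (
    id          serial PRIMARY KEY,
    name        text,
    description text
);

-- The index is what keeps Elasticsearch in sync with the table.
CREATE INDEX idx_products_es ON products
    USING zombodb ((products.*))
    WITH (url = 'http://localhost:9200/');

-- Full-text search executes in Elasticsearch, but you write plain SQL.
SELECT id, name
FROM products
WHERE products ==> 'description:"wireless keyboard"';
```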
Starting point is 00:57:49 I see. So it's like a proxy for Elastic without having to worry about it. Like you feel like you're just using Postgres, but it's actually proxying to Elasticsearch back there. Exactly. That's cool. And it's, I'm like, who thought you would like need that? Like, it's like, you're still running Elastic, but like now I don't have to worry about transactional consistency and keeping things in sync. And how do I query this?
Starting point is 00:58:12 There's nothing new to learn in that way. Yeah. So I think we're going to see a lot of that. I think we're going to see a lot of that on extensions, just continuing to advance. Within Postgres itself, one huge area is going to be pluggable storage. So a couple of years ago, Postgres got
Starting point is 00:58:27 this committed into core. It's still early on, but basically you can have a different storage engine. This is going to unlock a lot in coming years. I think... will Postgres core ship with multiple storage engines? I don't know. That's a good question. I think probably at some point you'll have a choice. There's a few in development. I don't think we'll see hundreds and hundreds like we do extensions, because it's a higher bar to write. It's deep C code, and understanding how the Postgres storage engine works and how you change it and optimize it.
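The hook for this already exists: since PostgreSQL 12, CREATE TABLE accepts a USING clause that picks the table access method. A small illustration (zheap is named only as an example of an alternative engine; it does not ship in core):

```sql
-- The default access method today is 'heap'.
CREATE TABLE events (id bigint, payload text) USING heap;

-- An alternative engine, if installed, would be chosen the same way, e.g.:
--   CREATE TABLE events (id bigint, payload text) USING zheap;

-- See which access methods your server knows about ('t' = table, 'i' = index).
SELECT amname, amtype FROM pg_am;
```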
Starting point is 00:59:02 One of the biggest pains with Postgres that people complain about is vacuum. Postgres under the covers, what it is: it's a giant append-only log. Like, you do an update, and what happens is it doesn't go and update those bytes on disk; it basically flags that record as dirty, then writes out a whole new record. So basically it's like a logical delete, right? Where if you have, like, a deleted_at column that's hiding everything... it's kind of like that. Now what happens is, when the system's at low load, vacuum comes in and cleans up all that. It says, okay, let's free up some space. So people have this love-hate relationship with vacuum, because they're like, oh man, vacuum's running, my system's slower. Well, it's actually going and deleting things that you didn't want to be there, so it's a
Starting point is 00:59:40 good thing, but there is this love-hate relationship with it. But there's a new backend type that aims to completely change that, and changes how the heap works, all sorts of things under the covers. zheap is the backend that's under active development; it shows really promising improvements around how Postgres handles vacuum, and you don't have to deal with it quite as much. Basically, it's better space savings and performance on all that front, which is really, really interesting, right?
Starting point is 01:00:11 So the idea is that we didn't have to go and do a complete rewrite of Postgres and make this change for everyone. For some people, this is really beneficial; other people may not need it, right? Zedstore is another one, which is columnar. So that's in active development right now. Columnar data stores flip things around on their head. Instead of storing rows,
Starting point is 01:00:30 they store things kind of by columns. What that means is things compress down really well. So if you've got things like time series data, you could imagine, like, oh, I store something for an hour. I don't have to store a record; I just say, at record 101 I've got 12 o'clock, and 12 o'clock goes all the way to record 2000, so I don't have to write 1,900 records. I just say, right here it starts, right here it stops. And so columnar is really useful in time series. It compresses data down really, really tightly, like 3x to 10x, so you're storing less on disk and you have to scan less. Now, columnar, like all these things, is a trade-off; columnar isn't perfect for every application. Yeah, that sounds like a much different way of going about storage. It seems like it would be backwards for a lot of what Postgres normally is used to do, right? Like, you wouldn't be, hey, I got my Postgres database and I'm just going to
Starting point is 01:01:26 like, swap out some backends and see which one I like. I don't think the columnar one, for your normal use case, is going to be advantageous, right? Right. Well, I think normal is a tough question, right? Because it's like, that's the world you come from, but Postgres is at the core of a lot of data warehousing tools. Postgres,
Starting point is 01:01:42 because of its license, people take Postgres and kind of modify the code and change it and make it columnar. If you look at some things like Greenplum or... That's why I call it abnormal, because they're modifying Postgres. They're changing the way it works, right? That's abnormal. Yeah. I think from a traditional Rails or Node or web application, the transactional workload, yes, right? Now we're completely changing the bread and butter of Postgres, but there's no reason it can't do this. Right. And it can work this other way, but it's a really different set of trade-offs, and this isn't for everyone. I think it's going to be interesting how it evolves. I don't know if the core community will maintain a bunch of these,
Starting point is 01:02:21 or if it's, you know, some side companies or, you know, whole new companies that evolve out of this. But it really just expands what Postgres can do. Pluggable storage to me is one of those next big frontiers. It's going to be an exciting area for five or 10 years. You can see remnants of where people have taken Postgres. Amazon Redshift's a great example. That was Postgres like 10 or 15 years ago. It got modified and modified and modified. It was a company called ParAccel that got kind of sort of bought by Amazon. And you see hints of it, but I think it was like Postgres 8.1, which is north of 10 years old, so it doesn't have things like JSON. So I think that extensions world is really interesting, and pluggable storage over the next
Starting point is 01:03:03 five, 10 years. We're going to see a lot there, because Postgres can keep moving, be safe and stable, reliable, not lose my data. As a database, that's the most important thing, but it's worth stating. And then we get all these, like you say, these kinds of crazy things that, well, yeah, I actually do want this. Like, as my stack evolves and I want to do more, I do want to write a recommendation engine inside my Postgres database and not have to have this ETL job that feeds into a data lake,
Starting point is 01:03:35 that feeds into Redis, that I'm maintaining five things. I just wanted to do this one thing. So I think extensions and pluggable storage, to me, are a lot of the excitement. Though, a ton of credit to just core Postgres, right? Postgres is just going to keep adding new indexes, new polish. There's maybe some really new, awesome, kind of sexy feature like JSONB that I'm forgetting, but I think we've got a pretty good base. Now it's keep working on performance, ease of use,
Starting point is 01:04:07 those sorts of things in the core. And a ton of credit to the people, the 40 or so committers and all of them that just keep plugging away. Not a lot of them are in the limelight, just making it available for millions and millions and millions of developers out there. Well, Craig, I can tell you're very excited about this.
Starting point is 01:04:23 I want to ask you one last question, which is you've been doing all of this work on education and evangelism, for lack of a better word, getting the word out there, about all the cool stuff Postgres is able to do and will be able to do in the future. What's the best place to get started for people who are brand new
Starting point is 01:04:39 to the world of interacting with the database? Surely you have resources, maybe your website. Where do people go to learn Postgres and keep up with the new features coming out? Yeah, so I mean, there's a few places, you know, one shameless plug, I help curate Postgres Weekly. It's not a DBA newsletter.
Starting point is 01:04:57 It's really targeted to app developers. Like it's like, here's the how-tos, here's this tutorial, here's this shiny new feature, right? We'll talk about extensions and pluggable storage, but it's really targeted at kind of app developers that want to learn more. It's, you know, once a week, you've got kind of five to 15 articles in there. It's not a, here's a hundred things to read, so pretty easy to parse. I blog about things a good bit.
Starting point is 01:05:22 There's a number of companies out there that blog a good bit as well. There's Planet Postgres, kind of a syndication of a bunch of people that blog. It's a great resource if you want to follow every article that comes out. Those are my two top recommendations. If you really want to learn about the internals, go subscribe to the pgsql-hackers mailing list. Read that. If you want to get some more of the basics, the pgsql users mailing list, like if you want to learn, hey, how do I debug a slow query? There's a bunch of great resources there. There's a Postgres team community Slack with thousands of people in it just hanging out. There's still an active PostgreSQL IRC. There's some really loyal people in there that, hey, if you've got trouble with a query,
Starting point is 01:06:07 they'll come in and help there. Those are kind of my offhand lists. There's a lot of blogs, but a lot of those we try to hit in Postgres Weekly and that sort of thing. Yeah. So those are my defaults. And there's probably a few other links that I can come up with after, that I can make sure to kind of send over, and we can get those added as well.
Starting point is 01:06:27 Absolutely. Listeners, all of the links to all the things, including things maybe that he didn't even mention right now, but he thought of later, will be in your show notes. You know how to access those. Craig, this has been a lot of fun. You got me excited once again about Postgres. Not even just the future, but the present of Postgres. I want to get out there and play with some of these extensions. Thanks so much for coming on the ChangeLog. And
Starting point is 01:06:48 thanks so much for keeping us all abreast of what's going on with Postgres over the years. We really appreciate it. Yeah, thanks so much for having me. That's it for this episode of the ChangeLog. Thank you for tuning in. If you haven't heard yet, we have launched ChangeLog++. It is our membership program that lets you get closer to the metal, remove the ads, make them disappear, as we say, and enjoy supporting us. It's the best way to directly support this show and our other podcasts here on ChangeLog.com. And if you've never been to ChangeLog.com, you should go there now. Again, join ChangeLog++ to directly support our work and make the ads disappear.
Starting point is 01:07:25 Check it out at ChangeLog.com slash plus plus. Of course, huge thanks to our partners who get it, Fastly, Linode, and Rollbar. Also, thanks to Breakmaster Cylinder for making all of our beats. And thank you to you for listening. We appreciate you. That's it for this week. We'll see you next week. We'll see you next week. Bye.
