The Changelog: Software Development, Open Source - What's so exciting about Postgres? (Interview)
Episode Date: October 23, 2020
PostgreSQL aficionado Craig Kerstiens joins Jerod to talk about his (and our) favorite relational database. Craig details why Postgres is unique in the world of open source databases, which features are most exciting, the many things you can make Postgres do, and what the future might hold. Oh, and some awesome `psql` tips & tricks!
Transcript
I'm more of a Python guy than I am a Ruby guy.
And I really loved the Django model for a long time, batteries included.
It was kind of everything you needed to run a web app.
Like, cool, here's authentication, here's caching,
here's all these things that I normally have to go grab off a shelf
when I'm building a web app.
And Postgres is kind of that for databases.
It's like batteries included.
Here's all the things.
Here's your data types.
Here's your extra indexes.
Here's your PubSub.
Here's your geospatial.
Here's your queue functionality. Here is all that. And it's like, cool, we've checked all those boxes. Now
let's just keep making it stable. A little new feature here and there, more polished, easier to
use. And a big one's always faster, right?
Bandwidth for Changelog is provided by Fastly.
Learn more at fastly.com.
We move fast and fix things here at ChangeLog because of Rollbar.
Check them out at Rollbar.com.
And we're hosted on Linode cloud servers.
Head to Linode.com slash ChangeLog.
What up, friends?
You might not be aware, but we've been partnering with Linode since 2016.
That's a long time ago. Way back when we first launched our open source platform that you now see at changelog.com, Linode was there to help us, and we are so grateful.
Fast forward several years now, and Linode is still in our corner behind the scenes helping us to ensure we're running on the very best cloud infrastructure out there.
We trust Linode.
They keep it fast, and they keep it simple.
Get $100 in free credit
at linode.com slash changelog. Again, $100 in free credit at linode.com slash changelog.
Welcome back everyone, this is the Changelog podcast featuring the hackers, the leaders, and the innovators in the world of software.
I'm Adam Stachowiak, Editor-in-Chief here at ChangeLog.
Today, Jerod talks to PostgreSQL aficionado Craig Kerstiens about his and our favorite relational database, yes, Postgres.
Craig details why Postgres is unique in the world of open source databases,
which features are most exciting, the many things you can make Postgres do,
and what the future might hold. Oh, and also some awesome psql tips and tricks. Here we go.
So Craig, you're in an interesting position because you've done so
much for Postgres, the project, and you've taught and you've spoken and you've blogged and you've
provided endless resources. You've curated the Postgres Weekly newsletter. I've been following a lot
of your stuff for a long time, just learning as you kind of put stuff out there. I'm a casual Postgres user — well, a long-time user, but just casual.
I don't really dive into the nitty gritty very often because I just don't have to.
But you've done all of this and yet Postgres, the project, like it's not your baby.
Maybe it feels like it now, but like it's a long standing open source thing with dozens of people.
Tell us how you became involved with the project and
maybe how you fell in love with the database. Yeah. So, I mean, I think, you know, Postgres
itself is interesting. Like I've actually never contributed a single line of code myself to it.
That's so many other people that know the ins and outs. I've read it, I've looked at it,
but maybe one day I actually will. But yeah, my contributions have mostly been on the community side, right?
Speaking, talking with others, working with people.
I actually came to it, I guess the first place was at a startup many years ago that I joined, Truviso.
I joined it at about 10 people.
They were taking Postgres and transforming it into a streaming database that basically did MapReduce on data as it came in. It was about 12 years ago, during one of the last crashes.
And basically that complex event processing was this new kind of cool, sexy thing.
I had familiarity with it from college and university, but hadn't really used it there.
But we were basically taking and transforming it into something, you know, really advanced, handling a whole lot of data.
We had customers like I think MySpace was one of our customers back in the day when they were still the hot thing.
Right. Which dates it pretty well. Yeah.
And from there, we built a product and the company grew and dwindled back down to about 10 people.
And I moved on and I found myself a few years later at Heroku. And
it was really interesting because it was kind of like you're yourself, right? You're an app
developer. It's there. You're using it. And even internally at Heroku, we had, you know,
50 employees and they're like, yeah, Postgres is fine. Whatever. I'll use it. Right. I'm like,
guys, you've got a great database here. And I found myself starting to blog more and more about like, I'm a lazy blogger.
I think I have a blog post talking about like why I blog and it's because I'm lazy.
Like once I explain something twice, I don't want to explain it again.
So I'll just write a blog post on it.
Like once I found myself teaching something, like let's just go and it doesn't have to be polished or useful.
I tell a lot of people that about blogging.
Like, don't worry about getting it perfect.
Just get it out there.
It probably has value for a lot of people.
And you just send people the link when they ask you questions instead of having to rewrite the answers, right?
I absolutely do.
Sometimes I feel like an absolute jerk when I'm like, hang on, I'm not going to actually answer.
I'm just going to send you a link to a blog post.
Right.
I definitely feel a little bit like a jerk, but it's why it exists, right?
Right.
It's kind of like, go read my book. Oh, that's worse, because you have to buy the book. With the blog post, at least it's free. But I know, people are like, just go read my book, it tells you the answer. And it's like, well, I asked you the question right here. But it's online, and you're asking — like, sending a link is useful. I understand it. Yeah.
Yeah. So I found myself really kind of speaking to app developers, right? Like, I'm not a DBA, but I helped out developers all the time with, you know, how do I troubleshoot this? Back at that time, Amazon RDS didn't even exist.
Right.
And we looked and said, Postgres is a really solid database.
Like it has good fundamentals, a good core.
I mean, if you go back to the history of it, like Postgres has the same roots as so many other databases. Like, it's in the name.
Ingres was one of the early origins, like grandfathers of databases, that came out of UC Berkeley.
Post-Ingres is Postgres.
There it is. And so many other databases have that same root of Ingres, like Sybase, SQL Server. But it's
over 20 years old now and it's just solid and sturdy so I got started with it way back really
got into it at Heroku running and helping build Heroku Postgres for a number of years and have found myself doing it, you know, at Citus and now at Crunchy Data, building and running
their cloud service, basically just running Postgres for people for a while.
So you may remember this history better than I do, but I was an early Rails developer. And I
know that Rails, the original stack was MySQL. MySQL was the production database. Now, it started with SQLite so you could just run your tests, but like, people, even Basecamp, ran MySQL. And at a certain point it was like MySQL fell out of favor and Postgres came into favor, just amongst the zeitgeist of Rails developers. And I can't remember why, like, what triggered that? Was it because of Heroku's influence and Heroku Postgres?
I'm pretty sure it was Heroku's influence. Like, at Heroku, you've got a Postgres database. Every Rails app got a Postgres database. So I'm pretty sure we can thank Heroku for a lot of it. I wish I could say it was wonderfully thought out and strategic.
We thought Postgres was a good database. We decided to run Postgres. Like, I think it was the right choice. And we invested a lot in it to make it better.
Postgres for the longest time wasn't the most user friendly. It wasn't the most
sexy database. Can databases be sexy? I don't know. I hear people say that all the time. I'm
like, really? But I know what you're saying. Maybe exciting is a better word. Yeah, right.
It's, you know, it's not my idea of a Friday night to go hack on Postgres.
But it's gotten the cooler features, right?
Like JSON was a big one.
Yeah.
So I think like, I think that really shifted with Heroku.
I actually remember Heroku supported it probably five years before Amazon RDS did.
I was at — I go to a lot of Amazon conferences; re:Invent is the big one.
It's a massive, massive
conference. And they roll out product announcement after product announcement after product
announcement. And I remember being there, like I had a heads up it was coming. So I made sure to
actually go to that keynote to kind of see the announcement, that sort of thing. And there was
literally a standing ovation. Like I've never seen this before at a tech conference for a product announcement.
Like, there was this big pause, standing ovation, like the speaker started talking again, and
like people still standing on their feet clapping for this.
I've never seen this before, never seen this after at a tech conference when you announce
a new product.
So why do you think Postgres is so beloved?
I know that I switched myself from MySQL to Postgres.
Mostly when I switched,
maybe it was because of just the flow of the tutorials
and this general lemming status of developers,
and I was just in there amongst them.
But I remember I had some data,
this is way back in 05, 06,
I had data consistency problems with my MySQL stuff,
and I read some blog posts about how MySQL
can just be recording nothing and it's not
going to report it. And there are these edge cases that are in there where it's kind of less safe.
And Postgres was there and it had this reputation of being just rock solid, consistent, and more
strict with your data. And because of the ease of switching in Rails, I could switch very quickly.
And so I just started switching all my projects over
and I found it easy to use, to administer.
And so that was kind of my switching point.
I didn't really have much more thought than that.
I think if you look at it,
there's, you know, Postgres was that.
It was like, I'm going to be safe and reliable.
I'm not going to lose your data.
That's a funny thing to think about
when it comes to a database, right? Keeping your data, not losing it. But that was kind of Postgres' core value: like, we're not going to add shiny new things. The MySQL defaults got some flack for a little while, because, hey, in certain modes it would run, it had a different type of storage engine, and certain ones weren't as safe. MySQL, it's funny — when you do a search in MySQL, by default it's case-insensitive. Uber, it was really funny, Uber switched from MySQL to Postgres and then Postgres to MySQL, so they've gone, you know, back and forth a couple times. But when they swapped to Postgres, they had to go and figure out — they had all this app logic that relied on things being case-insensitive, because the database just doesn't respect case in search.
To me, that's a little bit of an anti-feature.
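As a rough illustration of that difference, a psql sketch (the `accounts` table and its data are made up for the example): Postgres comparisons respect case unless you explicitly opt out, either per-query with `ILIKE` or per-column with the `citext` extension.

```sql
-- Postgres string comparison respects case by default:
SELECT 'Alice' = 'alice';    -- f
SELECT 'Alice' LIKE 'ali%';  -- f
SELECT 'Alice' ILIKE 'ali%'; -- t  (case-insensitive matching, opted into explicitly)

-- Or make a whole column case-insensitive with the citext extension:
CREATE EXTENSION IF NOT EXISTS citext;
CREATE TABLE accounts (email citext UNIQUE);
INSERT INTO accounts VALUES ('Alice@Example.com');
SELECT * FROM accounts WHERE email = 'alice@example.com';  -- matches the row above
```

The point is that case-insensitivity is something you choose per column or per query, rather than a silent database-wide default.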
I can go on about MySQL versus Postgres.
I think MySQL is a good database and does good things.
I think Postgres really started it being safe and reliable.
And then we said, now there's these rich user features.
There's things like JSON.
There's things like common table expressions, window functions. There's a really rich set of features, and I think we're going to hit on a bunch of them scattered throughout. Like, I could give you the laundry list of them right now, but it's like each one is unique in its own right. But it started with that being really safe and reliable. I look at a lot of other databases.
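The common table expressions and window functions mentioned here can be sketched together in one query. This is purely illustrative — the `orders` table and its columns are hypothetical:

```sql
-- A CTE (WITH ...) names an intermediate result; a window function
-- (lag() OVER ...) then compares each row to its neighbor.
WITH monthly AS (
  SELECT date_trunc('month', created_at) AS month,
         sum(total) AS revenue
  FROM orders
  GROUP BY 1
)
SELECT month,
       revenue,
       revenue - lag(revenue) OVER (ORDER BY month) AS change_from_prior_month
FROM monthly
ORDER BY month;
```

Month-over-month comparisons like this are painful to express in plain GROUP BY SQL, which is why these two features come up so often together.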
You know, Mongo didn't start as safe and reliable; it worked to catch up to that. It started as easy to use. And I think that developers have come around to respecting and appreciating that, right?
Like, you have corruption, you spend two weeks on it, and, hey, data is gone.
How do you answer that to a customer, right?
Right.
For all the shiny things we like, you know, new frameworks that come out every single day,
old and reliable just works.
And I've really started to appreciate that,
like, hey, I've got my Rails stack
or my Python and Django stack
and my Postgres database,
and this app can go really far.
I don't need shiny new things
to build an interesting business.
Yeah, I like that as well,
especially for a database.
It's like there's areas in my application
where I'm willing to experiment
and I'm willing to go with the bleeding edge and bleed a little bit,
but get those bleeding edge features.
And there's something just makes you sleep well at night,
just like this database stores my data and I can just trust it to do that.
And I think that was one of the reasons why I just stuck with it.
It never really stabbed me in the back.
I never had that moment where I was like, oh, Postgres, you screwed me over. Right?
Right. It's the case with so many others too. You have that bad experience and yeah, it's shiny and
new, but man, you lose data. I think there's kind of nothing worse you can do, right? Like,
obviously I'm a data guy. I like databases, but that's more valuable than downtime, right? Fine.
Go be down for an hour, a few minutes
or a day, but don't lose the data that you can't get back. If you're banking or something like that,
like, and you've lost my deposit, I couldn't be more unhappy with you.
Yeah, exactly. So help us out with a little bit of the history, because one of the things that
happened was like, by the time all this developer movement over to it, I'm sure it was popular, you know, amongst DBAs and amongst, you know, different people around
the world, but like really there's a groundswell and maybe it was the rails and the Heroku's and
whatever this trend was, people started picking up Postgres as like their default starter database.
But by then it had been around for a very long time. Like maybe that's why it was so rock solid.
It had the ingress roots and then it had, I mean, by the time 2006, 2007, when I started using it,
it had been actively developed as Postgres for like a decade at that point. It didn't have like
the JSON features and all the stuff that we're going to get into at that point, but help us
unpack like where it came from, who, who works on this thing and like the history and kind of
the community around Postgres. Yeah. So a lot of it, we can go way, way back to UC Berkeley, right?
Like Stonebraker.
I think — I don't know that he won a Turing Award just for Postgres, but Postgres is a huge piece of that, right?
And he wanted to build this kind of extensible database.
So he looked at Ingress and said, you know,
how can we evolve it to the next thing?
I think that's a lot of the way-back history. There's a great article from the ACM looking back at Postgres that goes through a lot of that history. I think there's a lot, too, of those years kind of right after it came out from UC Berkeley, of good stewardship of community, that really gets all the credit, right? Like, I go out here and talk about it, but people just plugging away and working on it, making sure it was rock solid. I got in a little bit of trouble a little
bit ago when I listed off a few names and I started listing off three or four or five, right?
And I think one you've got to point to is Tom Lane, who has contributed to so much open source. He helped create JPEG and TIFF and PNG.
Like, he helped author some of those specs.
He helped write libjpeg, libpng.
And this is 20, 25 years ago.
And then said, yeah, I'm kind of tired of image formats.
What's this database thing?
Let me spend some time on that.
And he's contributed a massive amount of Postgres code.
If there's bugs, digging in, making sure they're fixed.
But you've got him, you've got Bruce Momjian, you've got Robert Haas, you've got Andres Freund, you've got Stephen Frost, you've got Joe Conway, you've got Simon Riggs.
You've got a whole bunch of people that just sit here and work on Postgres.
And you can go see all of the activity
in the open. It's all on the mailing list. Development still happens that way that it did
15 years ago on the mailing list, showing up with patches. It's really interesting that they've got
a set of solid committers, major contributors, minor contributors. There's this whole kind of
hierarchy there. It's not too formal. There's a lot of debate within the community of like,
okay, do we have a major contributor for docs
or for community contributions or other things?
But the code just moves forward.
And I think it's fascinating to me
in that it does work that way.
Often we talk about open source,
like open source is not a new topic anymore.
We talk about it plenty.
And databases say they're open source, but Postgres is really unique. Yeah, it's open source and other databases are open source, but no one owns Postgres. No one can own Postgres.
You can't go and buy up a company and say, MySQL is now owned by Oracle. Like they own
the copyright. They own the code. Postgres, I guess, in theory, if you went and, you know,
had a few billion dollars that you wanted to try to hire all the people that commit to it,
maybe, but I just don't see it happening. And the core structure and the way it's
delivered and developed, it's kind of in its own category of what is open source. It's
community led, community run, community managed. It's kind of a purebred that way.
It's not even like Linux because Linux, you had a benevolent dictator. Right.
Postgres doesn't have that. It's got people on equal footing.
So is there like a governance structure set up or is it just like people debate on
mailing lists? How does it actually run that? Yeah. So there's definitely debates on mailing
lists, right? It is open source in that sense. You're not getting away from that at all.
So there's a core committee.
That's five people.
They kind of oversee and there's an entity set up to maintain copyrights and that sort of thing.
And the Postgres, you know, there is a license that is owned by, you know, a Postgres entity there.
And they're set up in multiple companies.
The core team isn't who says what goes in.
Like core is kind of a steering body,
right? So there's five people. There is no more than two from any company at a time, right? I
think it's like 40%. So if core were to grow to seven, it might change in that sense. But basically,
you don't have a ruling set from any one company. So it's distributed across companies.
But in the actual development, you've got basically,
you earn a commit bit: you show up, you contribute, you review patches. Postgres is released,
a major release once a year. Things show up in, you know, ideas on mailing lists, patches,
there's debate and discussion. There's a couple of kind of sprints, commit fests, where patches are submitted.
There's an app for it.
The biggest way to build your credibility is come in and review a patch for people.
Reviews are always welcome, always helpful.
If you want to build credibility, you don't show up with a,
hey, I want to fix this thing.
It's show up on the mailing list and engage.
There's two mailing lists that are the big ones.
There's two mailing lists that are the big ones.
There's a pgsql-hackers list and a pgsql-general list for users. The hackers list is where all the interesting stuff happens.
It's what I read to fall asleep at night. But if you want to learn the internals of Postgres,
it's a fascinating area to go and read.
You said earlier you haven't committed a single line of code. Have you dove into the code from a readability standpoint? I mean, it's a C project, probably millions of lines of code at this point.
Have you tried to actually tease it apart and look underneath the hood at all?
At times. Not in recent years, but at times, absolutely, in my career.
It's funny, it's C, but I've heard people describe that they don't like writing C, but
they're okay with writing Postgres C.
It's really well structured, well defined.
If you want to see a good
example of a large C project,
it's a pretty good one.
It has to be to keep going this
long and to stay actively
worked on. I mean, over the
course of all of these years,
if you had that much technical debt, and you're just
slinging C code without
good architecture, good refactoring,
over time, you would just slow to a crawl. Crawl to a slow. I can't say it right. You would just stop moving fast. And we see Postgres
just keeps on that yearly release cadence. Like, new features just keep rolling out, and they're just built on top of that foundation over and over again.
Yeah. And I like — it's a testament to so many people long before me that do keep plugging away. And there's been, you know, an active focus in the community to, I think, grow that number of committers. I think it's at about 40 now that have a commit bit. But you earn that over several years, and you've got to stay active. If you don't commit something every year or two, you lose that commit bit. So I think, you know, it takes work to earn it
and it takes work to maintain it.
And then they're watching each other, right?
Like you can't go and commit something.
There have been releases where features got in
and some of the code quality wasn't there.
And that engineer spent, you know,
the rest of that next release, that next major release,
not writing new features,
cleaning up, right? Like they hold themselves accountable in a good way. So it's definitely an interesting kind of testament of a project; I don't know many other examples like it.
This episode of The Changelog is brought to you by Teamistry.
Teamistry is a podcast that tells the stories of teams who work together in new and unexpected ways to achieve remarkable things.
Each episode of Teamistry tells a story, and in each story, you'll find practical
lessons for your team and your business. I got a sneak preview of season two, and I couldn't stop
listening. I was once in the U.S. Army, and nothing gets me more excited than seeing teams achieve
great things when they learn to work together. And that's exactly what this show delivers.
This season, the show travels deep into the underwater caves of northern Thailand to discover how divers, medics, soldiers, and volunteers saved a group of trapped teenagers,
explains how a world-renowned watch company pitted their two factories against each other in an attempt to become the best watchmaker in the world,
and finds out how Iceland went from having one of the highest COVID-19 death rates to a model example of how to deal with the virus.
These are stories that entertain, and they're packed with business cases you can actually
use.
Season two of Teamistry is out right now.
Search for Teamistry anywhere you listen to podcasts.
Check the show notes for a link to subscribe.
And many thanks to our friends at Teamistry for their support. So we talked about its stability
and how it doesn't let you down and the data consistency.
There was also this NoSQL trend,
or no-SQL if you prefer,
where a lot of shiny new databases
were coming out.
And I'm not sure what year this was,
maybe 2010, maybe 2012.
Time kind of molds into a continuum.
But in that range where it was like,
look what this thing can do,
and traditional relational database management systems,
RDBMSs like Postgres and MySQL,
were really kind of thrown under the bus.
They were stodgy.
They didn't have the flexibility
that you need.
A lot of people left Postgres at that
time. They left
SQL altogether to move to NoSQL.
Those of us that were a little more patient just kind of waited, and Postgres just kept adding things. It was like, hold on a sec.
There's that nice feature over there in NoSQL land, and you can actually get pretty close or good enough, and not have to lose all of the, you know, the ACID guarantees and all
the things that you have over here with Postgres. One of those was JSON. Talk about some of the
exciting features that have come into the project over the last decade, which have really made it not just be the stable rock that it is, but also it's kind of exciting.
It's got a lot of cool stuff that they've been adding over the years.
Tell us some things that excite you about it.
Yeah.
So, I mean, JSON's the big one, right?
Like, I can't talk about Postgres and not talk about JSON.
And for context, that was eight years ago.
That's pretty old in terms of like technology, right?
Like what was our front end JavaScript stack like at that point?
Like were we still using jQuery maybe?
Like, yeah.
For context, a lot's changed in that landscape, right?
And JSON was a really interesting one.
I remember having, you know, conversations with the Postgres core community and some of them were really dismissive at first.
They're like, I've seen this before.
Like 10 years ago, document databases, XML databases were going to replace relational databases.
And they came and went.
They just like went by the wayside.
Postgres added an XML data type.
So technically, Postgres was a document database like 18 years ago when it added the XML data type.
So I, you know, defend that Postgres has been a document database for nearly 20 years or so now.
But, you know, some of them don't write web apps.
Some of the core developers, they're like, yeah, I built a website once upon a time.
So like JSON is foreign to them.
What is this JSON thing? Now, that's not all of them. So I think there were a lot of interesting conversations around that time about NoSQL and JSON, and there's really no reason JSON didn't fit into Postgres just fine, right? Like, you can leverage JSON. And I think the broader debate on, you know, SQL versus NoSQL, or schemaless, is a really good one.
And we should get into that a little bit, because I think there's a lot there.
But JSON first came in eight years ago and really we cheated.
Like we totally cheated.
We did validation on JSON as it came in and stored it in a text field.
There's nothing special about it.
Yeah, it wasn't even really like a data type.
It was just like the text field with like some stuff on top of it, right?
Pretty much.
It just went into a text field under the covers.
Postgres did the validation that it was standard JSON.
That's about it.
So we totally cheated.
Everyone still loved it and said, okay, cool, I'm going to use this.
Just proves you don't have to always be technically correct
to kind of advance your case. And then JSONB came a couple years later. And so JSONB
is a binary representation on disk of compressed JSON. I use it in almost every project. Like,
I usually have a column called like extras. And it's really easy. I didn't know I needed this,
I can throw it in in or I don't know
if it's going to be there long term. Feature flags are a great example for it. And so now you've got
two data types in Postgres. You've got JSON and you've got JSONB. So the B stands for binary.
An engineer I've worked with for a long time says it stands for better. I prefer that
version a little bit more. Is there any reason to just use regular JSON as text now?
Are there still reasons to do that?
So there's a few, and it's a pretty narrow set of use cases.
But if you're not indexing and querying into that,
if you want to just save logs, right?
If you've got logs coming in,
or if you want to say like you've got an API
that's accepting and ingesting stuff,
and you want to save the exact format,
you're not querying that all the time. You just want to write it really quick. It's going to be a little
bit faster because it doesn't have to go and compress it to a binary format. Right. If you
care about preserving the white space, like logging and API logs are huge for this. I'm like,
I just want to save them and I want to replay them exactly as they were. If you want to go
and replay your API logs, just throw them in JSON. So it's really fast,
efficient. There's no extra processing. If you're not querying it all the time, JSON is useful. But most
of the time, you've got a, you know, key you want to index and query into, you do want JSONB. JSONB
comes with all the extra perks. So with JSON, you can index like a very specific key. With JSONB,
you can put a GIN index on and index all the keys and values kind of within that JSON document. So
then when you go and query it, it's going to be really, really fast, like your other index tables.
So JSONB is usually what you want when you're developing in most cases.
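A psql sketch of the distinction discussed above — `json` preserves the input verbatim (the "cheat" of validated text), while `jsonb` normalizes it and supports GIN indexing. The `events` table here is hypothetical:

```sql
-- json keeps the input byte-for-byte; jsonb normalizes it
-- (whitespace collapsed, duplicate keys resolved to the last one):
SELECT '{"a": 1,  "a": 2}'::json;   -- {"a": 1,  "a": 2}
SELECT '{"a": 1,  "a": 2}'::jsonb;  -- {"a": 2}

-- A jsonb column can be indexed across all its keys at once:
CREATE TABLE events (id bigserial PRIMARY KEY, payload jsonb);
CREATE INDEX events_payload_idx ON events USING gin (payload);

-- Containment queries like this one can then use the GIN index:
SELECT * FROM events WHERE payload @> '{"type": "signup"}';
```

So `json` fits the log-replay use case (exact bytes back out), and `jsonb` fits anything you index and query into.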
Maybe real quick, do a quick, when should I use JSON in general? You mentioned extras.
It makes total sense if you're splatting logs into a thing or API responses, like just take the API response
and store it as JSON.
So you're going to store the type as JSON.
There's a balance of when it's smart to actually use this
and when, nah, you actually needed a separate table there,
but you denormalized when you shouldn't have
and now you're going to have problems,
usually data consistency problems. What are your heuristics? Like, when is JSONB a good idea,
and when is it not a good idea? Yeah, so, I mean, there's, it dives a little bit into, like,
relational versus NoSQL, right? Like, I think it's worth a quick detour there. It's like,
you always have a schema no matter what. You're just like maintaining that in the database, or you're maintaining that in code. Like, right, your application expects something to be there. Now you're building an if statement to say, if this is there, otherwise do this, right? Or write out a default value. Yeah. Whereas when you add a column in Postgres, you're adding a default value. There's a number of things that, hey, if you don't rely on that being there, great. In a schema, there's really usually some things that always exist. Like, hey, you've got a user's table and there's like a
username and a password. That's always going to exist. You've got some really basic fields.
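The "add a column with a default" side of that trade-off looks like this in Postgres (table and column names are made up for the sketch):

```sql
-- Adding a column with a default: the schema, not the app,
-- guarantees the value exists on every row.
ALTER TABLE users
  ADD COLUMN locale text NOT NULL DEFAULT 'en';

-- Every existing and future row now has a locale, so no
-- "if this key is missing, fall back" logic is needed in code.
SELECT username, locale FROM users;
```

That is the alternative to the schemaless approach, where each reader of the document has to handle the key being absent.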
And what I tend to do is like, oh, if it's a temporary use thing, or if it's like a tags
thing, or this is kind of extra, so it doesn't exist on every model. Instead of creating a whole other table,
for us, we run a database as a service
that exists on top of AWS and Azure.
And some things we need for AWS,
some things we need for Azure.
We could create completely separate tables
for those servers, or I could just say,
oh, this extra field I need only for Azure is over here.
So it's kind of for the optional, you know, optional extra fields.
Right. It's a really, really common case that I see quite a bit. That's kind of that extras category.
Right. I use it for settings a lot. So for instance, notification settings on a user,
like here are a bunch of emails we may or may not send. And these are like triggers for them to say
whether or not they want those emails. And I don't really want to have my own table for that.
Some users, they've never filled it out.
It's just the default.
And then every once in a while,
we're going to add a new email that we want to send.
And so it's easy to just add that in your code
and not have to go through a migration process
to add a new column to another table.
So I'll just be like, well, here's a new email.
Just throw it in code.
And it's just flexible that way.
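As a sketch of that settings-on-a-user pattern, with table, column, and key names invented for illustration, not taken from the episode:

```sql
-- Optional per-user notification settings live in one JSONB column;
-- absent keys fall back to application defaults, so adding a new email
-- type needs no migration
CREATE TABLE users (
    id       bigserial PRIMARY KEY,
    email    text NOT NULL,
    settings jsonb NOT NULL DEFAULT '{}'
);

-- Opt one user out of a (hypothetical) weekly digest email
UPDATE users
SET settings = settings || '{"weekly_digest": false}'
WHERE id = 42;

-- Read a setting, treating a missing key as the default (true)
SELECT coalesce((settings ->> 'weekly_digest')::boolean, true)
FROM users
WHERE id = 42;
```

The nice part is exactly what's described above: a new email type is just a new key the application writes, with no ALTER TABLE.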
Yeah, I think that's a really kind of good way to think about it. It's basically a lightweight table right there on that object, right? You've got, hey, maybe some nested data, but I kind of don't want to go two or three layers deep, because now I'm basically, you know, now I've got to go and figure out and recompute all those dependencies and constraints and all that stuff. So if it's light enough weight, a layer or two deep kind of works really well as a heuristic.
I think the other piece is, you know, how are you going to do analytics on it?
Like this is a big one for NoSQL databases where, hey, even if you get to manage your schema,
you suddenly want to ask a question and say, hey, how many users signed up in the last week?
And now when you're, like, traversing down a document three layers deep... you can write that SQL, but it's pretty gnarly. The joins aren't going to be as efficient, that sort of thing.
So if I'm doing lightweight filtering,
like give me a user that has some Boolean of true
in my JSON, that's really easy.
But if I'm doing aggregation on some, like, give me, you know, all the users that signed up based on some date, and I'm parsing that out from JSON, like, now I'm doing a lot more overall kind of on the SQL side, which is not where I want to be.
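Roughly what that difference looks like in SQL, assuming hypothetical `settings` and `profile` jsonb columns like the ones above:

```sql
-- Lightweight filtering on a JSONB boolean: easy to write and index
SELECT id FROM users
WHERE (settings ->> 'beta_opt_in')::boolean;

-- Aggregating on a value buried in the document: doable, but every row's
-- JSON has to be parsed and cast before you can group on it
SELECT date_trunc('week', (profile ->> 'signed_up_at')::timestamptz) AS week,
       count(*)
FROM users
GROUP BY 1
ORDER BY 1;
```

The second query works, but a plain timestamp column would be both cleaner and cheaper, which is the point being made.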
A lot of applications start without analytics
and add that on later.
And this comes back to where to me,
part of the reason that Postgres 1
is people started building these applications and said, oh, now I need to do analytics.
Oh, wait, that's hard on a NoSQL database.
Like, how do I group by and aggregate and filter and all that stuff inside documents?
Not so trivial.
And some of those you can actually only query on indexes, right? Like, you can't even write an ad hoc query in certain cases. Like, you have to have a predefined index or something on some NoSQL databases. Yeah, I think,
you know, they often start there and they get more advanced. So over time, there's not a
foundational limit that they couldn't, but yeah, some definitely start there and try to get more
and more, but it's not going to be as efficient. I look at the roots of SQL and it was really well
designed. It's relational algebra and relational calculus.
And if you want to geek out on it, that's where I say, like, learn those things.
And then you understand all of the power of SQL and it blows your mind.
Most people never need to go that route.
But to throw away all that as a foundation, you lose a lot, you know, going the other side.
So JSON came over eight years ago, JSONB a couple years later. But this is, like, a longstanding... this is a thing that I now take for granted. So it's not like a shiny new feature. It's still shiny. It just isn't new. What else about Postgres is exciting in terms of features
for relational databases? So I think I look in, you know, Postgres just moves forward each year
with something new. JSON, JSONB are the shiny ones.
I can go back and probably each release and there's some interesting new thing.
One big area for me is indexing.
Most databases have an index type.
When you do create index, it's creating a B-tree index.
If you have a CS degree, this is what you learned is the basics of a B-tree.
Postgres has had that for a while, and it's also getting more exotic with its indexes.
So, you know, with JSONB, when you index it, you use a GIN index, a generalized inverted index, which flips it on its head and basically indexes every key and value inside that JSON document. So really interesting that I don't have to say index this column or this little piece. It indexes everything for me, so when I query it, it's fast. Postgres has five different index types. For a long time, you know, I would read the docs and look, and I finally, you know, just kind of got over my humility and asked the core engineers, like, can you actually explain this to me in plain English? Like, I read the docs and I'm like, I don't get it, right? Yeah. And so you've got GIN, which is really useful when you've got
multiple values inside the same column. So if you think about that, right, like JSON, you've got multiple keys and values, you've got arrays, that sort of thing, right? Postgres has an array data type that's super handy if you're doing something like tagging or categories. It's wonderful. Like, please don't go build a whole categories table and join against it. You don't need it. Just go ahead and use the array type. So GIN, really useful on arrays, JSON, those sorts of things. You've got a GiST index,
a generalized search tree. This is useful... the way I best describe it is when you've got records that overlap values between rows. So if you think about, like, full-text search, you've got, like, a sentence, and maybe you want to index on, like, "the dog", but not "the", right? So, like, you need both parts, and hey, "the" may appear in a bunch of other places, but it's not going to be in the index. So you've got parts of the values that span across rows. Geospatial is another one where GiST is really useful. So if you've got, like, polygons, how do you find the point within the polygon, right? So it's useful for things that can kind of overlap in that sense.
You've got SP-GiST, space-partitioned GiST.
I only know it's useful for phone numbers.
I keep asking the question, give me other examples.
And they're like, phone numbers, that's the one.
Why phone numbers?
What's unique about phone numbers?
It's something about how, like, things naturally cluster together. So if you think about, like, the area code, then some three-number prefix, like, hey, this clusters together, this clusters together, then this is kind of the unique part.
So there's kind of distinct blocked groupings of values there.
So maybe zip codes might be another one,
because those kind of have, you have a certain area,
they'll start with the first two letters or numbers in a zip code, and then the last three will be different or something like that.
Yeah, that's... I'm going to ask one of the core developers next time I see them if that's an actual use case.
Yeah.
But yeah, typically really, really large data sets.
You've also got BRIN indexes, block range, which is really similar, where you've got, you know, billions and billions and billions of records, some that naturally cluster together. Those are definitely a little more specialized.
So, like, if you think about it: if you're using JSONB and arrays, you want to use a GIN index; if you're using geospatial stuff or full-text search, you want to use a GiST. If you have no clue, you know, a B-tree, just CREATE INDEX, is kind of for a single column. But it's interesting that each one of these kind of comes in, you know, a new year.
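The rundown of index types here translates to DDL along these lines; the tables are hypothetical, with `settings` as a jsonb column, `tags` as a `text[]` array, and `body` as text:

```sql
-- B-tree: the default you get from plain CREATE INDEX, for a scalar column
CREATE INDEX ON users (email);

-- GIN: many values inside one column (JSONB documents, arrays)
CREATE INDEX ON users USING gin (settings);   -- indexes every key and value
CREATE INDEX ON posts USING gin (tags);       -- tag/category arrays

-- GiST: values that overlap across rows (full-text search, geometries)
CREATE INDEX ON posts USING gist (to_tsvector('english', body));
```

The `USING` clause is the whole switch; everything else about querying stays the same, and the planner picks the index up automatically.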
Like for a little while there, I think three years in a row, we had a new index type every year in Postgres.
And it just marches forward.
A lot of this has come out of a group within the community that we, you know, call the Russians. There's a professor from the University of Moscow, and I'm not sure if he still teaches or used to teach, like, astrophysics, and then, like, hacks on Postgres for fun. Like, we have different definitions of fun. But he'll show up with some... like, hey, I read this research paper, I wrote this research paper, what do you think about this? And for a little while there, it was kind of like, well, this is an absolutely crazy idea. Like, Postgres is stable and solid, and we don't do these crazy things. And he'll show up with a patch on the mailing list and do the back and forth and debate. It's like, here's the performance, here's the characteristics of it. And it's, you know, okay, great, you'll maintain it, you'll support it, here it is. You know, I think right now... he is Russian, and he's working on an index type called VODKA right now.
He says, you know, we need that.
I hope that's just the working name.
I'm not sure if that's true or not.
But yeah, it just keeps moving forward, right?
Indexing is one of those things that as a developer,
I don't go in like my checklist of features when I'm evaluating a database.
Like how many indexes do you have?
And yet when I need it, it's there.
Really, there's just so much in the box for Postgres.
Some of it I have absolutely no experience with.
PostGIS is one huge area.
Is that a third party thing though?
Or it's like it's maintained as its own separate?
Because I have used it one time, it's been years now,
and I remember I had to install it as an extension
and maintain it, and its upgrade cycle was different than Postgres's, and there was some pain there, like upgrading one, not upgrading the other. I remember being like, oh, this is a little bit... So it's like an extension of Postgres, or is it? Yeah, so it's an extension, and I think we'll probably get to this in a little bit. Like, extensions are a whole fascinating area.
Yeah.
PostGIS being one of the largest,
biggest ones.
Like I'm not a geospatial developer at all.
In some sense,
there's a completely parallel community.
Like, there's a Postgres set of committers and a PostGIS core team. And they collaborate. They're at the same conferences, but some of the things they do are completely separate from core, right?
And so I think there was a couple of rough years there.
I think it was upgrading to PostGIS 2.5, or it was 2.5 to 3, that it was... it was the dark times. I think you nailed it. I think those are actually the version numbers I was on. Yeah, I think I had words with some of the PostGIS committers then
and they
understand, I think,
the world of how many developers
are using it now and what the upgrade process is
and I don't think we're ever going to see
that again, hopefully, knock on wood.
It was one of those situations where I had
enough competence and confidence
in Postgres maintenance and
administration that it didn't worry me.
But when I pulled the PostGIS stuff in, it worked great and I was using it,
but maintaining it over time because I wasn't actively working and doing that.
It was just this thing that I also had to do.
When I upgraded Postgres, I was running into issues and I wouldn't know how to solve them
because it was like this third leg kind of a thing.
It was definitely a rough one.
And I think you've got a range of extensions.
And some things, you know, are core in Postgres.
Some things are extensions.
You've got really lightweight extensions that are a little safer.
PostGIS is a huge one. It's a whole geospatial database. Like, it has functionality that Oracle geospatial doesn't have. Like, an open source thing is better than Oracle at something, and that's geospatial, for sure. Like, that's not disputed at all. It has new data types, new operators and functions, and all sorts of things in there. So it's definitely
a massive one that kind of has its own path. But I think if you look, there's other things that are more in core. You've got, like, full-text search.
I think it was a couple of years ago I saw someone write a blog post that was like a deep dive on full text search.
And the title was something like Postgres full text search is good enough.
Like it works.
Like it's fine.
I installed it.
It works.
And it was a wonderful post.
And people were like, yeah, why do I need Elastic or whatever else when I can just try this and see if it works? And you can kind of replace full-text search with almost anything in Postgres. Like, Postgres geospatial is good enough. One of my favorite small ones, that you shouldn't use all the time, but LISTEN/NOTIFY is PubSub directly inside Postgres.
If you want to use Postgres for a queue, you can do that. Yeah, I've seen that done, and it's really cool
because you don't expand your maintenance surface area at all.
I know we're in the world of serverless and all these things,
so nobody has to maintain servers and stuff.
But if you have to have a Redis instance somewhere in your stack,
you're either paying somebody to maintain it
or you've built that into your infrastructure.
And a lot of times when people go to queue background jobs,
they have to pull in some other thing, whether it's Redis
or people use Memcache for that, probably not.
But there are queue things, Celery, I don't know.
Beanstalk, words are coming to my mind.
But anyways, you can do it right in Postgres.
You're already backing that up.
It's there with everything else.
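For reference, the LISTEN/NOTIFY PubSub mentioned here boils down to a couple of statements; the channel name and payload are made up for illustration:

```sql
-- Session A (a worker process) subscribes to a channel:
LISTEN new_jobs;

-- Session B (the app) publishes; the payload is an arbitrary short string:
NOTIFY new_jobs, 'job 123 queued';

-- Equivalently, callable from application code or triggers:
SELECT pg_notify('new_jobs', 'job 123 queued');
```

Every connection currently listening on `new_jobs` receives the payload asynchronously, which is the whole mechanism behind the Postgres-as-queue pattern discussed above.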
There's not another cog in your wheel. That's pretty cool. And I've seen people
use that to do background jobs. Yeah. And like you say, it's one more thing. I don't have to
deploy and manage. Like that's as strong as anything. Like even if this thing is 10%, 20%,
even if it's 2x better, Postgres is just this kind of stable workhorse that gets a new feature every
so often that, oh, I can do that, and I can do that, and I can do that. Postgres 13 just came out, you know, a few weeks back, and it kind of epitomizes Postgres to me. Like, of all releases, I think there's not a shiny new feature. It's kind of like, upgrade and it's just better. Like, you get space savings from B-tree indexes. If you were using, you know, partitioning before, it's just better now. Now you can have, like, better constraints between your partitioned tables, and, you know, joins are more efficient. It's like, more and more, I think, with Postgres itself, like, we've kind of reached the point where it's got...
I'm more of a Python guy than I am a Ruby guy. And I really loved the Django model for a long
time, batteries included. It was kind of everything you needed to run a web app. Like,
cool, here's authentication, here's caching, here's all these things that I normally have
to go grab off a shelf when I'm building a web app. And Postgres is kind of that for databases.
It's like batteries included. Here's all the things. Here's your data types. Here's
your extra indexes. Here's your PubSub. Here's your geospatial. Here's your queue functionality.
Here is all that. And it's like, cool, we've checked all those boxes. Now let's just keep
making it stable, a little new feature here and there, more polished, easier to use.
And a big one's always faster, right?
What's up, friends? Have you ever seen a problem and thought to yourself,
I bet I could do that better? Our friends at Equinix agree. Equinix is the world's digital infrastructure company,
and they've been connecting and powering the digital world
for over 20 years now.
They just launched a new product called Equinix Metal.
It's built from the ground up to empower developers
with low latency, high performance infrastructure anywhere.
We'd love for you to try it out and give them your feedback.
Visit info.equinixmetal.com slash changelog
to get $500 in free credit to play with, plus a rad t-shirt.
Again, info.equinixmetal.com slash changelog.
Get $500 in free credit.
Equinix Metal.
Build freely. As I kind of take a step back, I think more and more is going to happen in extensions.
Like PostGIS is a big one.
There's roughly, I think, 250 extensions that exist on one of the kind of extension networks.
Extensions are unique.
They're low level hooks deep into Postgres. I kind of hate the term extensions because you think like
every database and tool and library has, like, extensions, right? Like, extensions in air quotes. It's like a plugin layer that you can just throw something on top of. But this is like deep, low-level C hooks. You can write an extension in SQL. You can write an extension in C.
You can write an extension in other languages.
And you can completely change the underlying behavior of what Postgres can do.
You can have new data types.
You can have new functions and access methods.
And it can move at a separate pace from core.
Core can still have that same mantra of like, I'm not going to lose data.
I'm going to be safe. I'm going to be safe.
I'm going to be reliable.
They can maintain that C code base at a really high quality.
And now we've got this world where something can happen in an extension, improve over time.
And the core community can sit there and say like, well, this is really solid.
Everyone needs this.
Let's put it in core.
Pull it in.
Yeah.
So to me, I think, as I start to look at the landscape of Postgres and what's the future, extensions are absolutely huge, and what you can do with them is kind of unbelievable. Can you give some more examples of extensions? In addition to PostGIS... I know there was one called hstore, which I remember using. But what are some other things
that people have built in
the community that you can pull in and use that extend for lack of better term Postgres? Yeah.
Yeah, yeah. Right. It makes sense. And I think, you know, the term, it kills me, because it's like, hey, what extends it? But it does it in a unique way. That's the part that hangs me up. Right. There's a bunch. One of my previous employers, Citus Data, turned Postgres into a sharded, distributed, horizontally scalable database.
So when you were at 100 terabytes of data, hey, that doesn't fit easily on a single node.
How do you get performance?
Under the cover, everything was sharded.
It still looked like a single node database.
But, you know, to your application, you didn't have to
think about sharding, right? You don't have to go hire the experts like Instagram did.
You can just work with it in your, you know, your Rails, your node app, and just pretend it's a
single node database. PostGIS is obviously a big one. There's really simple ones. You know,
one of my colleagues at Crunchy Data, he's on, like, the core team for PostGIS, but he wrote one that's, like, just HTTP, basically. Like, I want to go and curl this website. There's pg_cron. So if you think about this, right, you've got pg_cron, which is cron in your database. You could go in and curl something. And then you can go and, like, parse that website,
you could do screen scraping automatically inside your database, without ever having to run like a separate scheduler
web process. You've got different procedural languages. You've got, like, PL/V8, which is V8 directly inside Postgres. So that's an extension so that you can run JavaScript inside your
database. Okay, it's getting crazy. So how would you trigger that then? Well, let's go back to the
cron one. Like, do you use a SELECT or something? Like, how do you actually interact with these things? Is it using the query language, standard SQL, or extensions to it? Yeah, so the basics is you just run CREATE EXTENSION, right? You've got to have it built and available on your system, then you run CREATE EXTENSION and it's available. Depending on what it does, it's going to enable something new. So hstore is a great one. hstore is a data type. So you run CREATE EXTENSION hstore, and now you have this new data type. So now when you're creating tables, you use the hstore data type, right? And hstore is a key-value store directly in Postgres. So it was kind of the precursor of JSON, right? I think that's when I used it. It was back before... either JSON didn't exist, or I didn't know it was in there. So I used hstore.
Similar fashions.
It wasn't as nice, because there were all these little edge cases with it.
Yeah.
So it's, you know, and I think it proved the point of like, hey, a bunch of people use HStore.
Yeah.
Maybe we need something more official in JSON, right?
JSON went directly into core because of how the community saw that being used.
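A minimal sketch of the hstore flow just described; the `products` table and its keys are invented for illustration:

```sql
-- Enable the extension (it must be installed on the system)
CREATE EXTENSION IF NOT EXISTS hstore;

-- hstore is a new data type: a flat map of string keys to string values
CREATE TABLE products (
    id    bigserial PRIMARY KEY,
    attrs hstore
);

INSERT INTO products (attrs)
VALUES ('color => red, size => M');

-- -> pulls a single value out by key
SELECT attrs -> 'color' AS color FROM products;
```

Compared to JSONB, it's strings-only and unnested, which is where those edge cases came from.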
MADlib is one that's out of UC Berkeley. It's a whole analytics package. Like, when people talk about data science, like, cool, I'm going to go and do something in Spark. Like, I'm kind of going to pick up MADlib, because it has, like, supervised learning, unsupervised learning. Like, you want to look at k-medians, you know, run a regression. It's right there. And it's been maintained for north of 10 years now. It's really old, and we've just had the ability to do all this data science directly in Postgres for a long time now. Now, when you enable it, you get whole new functions, right? And you basically execute these functions, pass in the right things, and you get something back. Something like PL/V8... I'm becoming a bigger and bigger fan of PL/Python. To me, this, like you mentioned, like, hey,
things are getting crazy when we've got JavaScript in our database.
Yeah, right.
For a long time, we had this idea of like, no, no,
never put application logic in your database, right?
Like the database is this dumb store.
Think, you know, a big hash in the sky.
And I'm like, no, it's useful.
It has all the data and you can do interesting things.
So like PL Python, a couple of weeks ago
before we launched our product,
I was like, what can I do with this?
And I just started poking
and I installed SciPy, NumPy and Pandas.
And I started live tweeting.
I probably shouldn't have
because it could have gone horribly wrong, right?
Like I'm going to see what I can do.
And then, you know, two hours later, there's like crickets.
But no, I actually was able to, in about 20 lines of Python... I basically wrote a function that executes Python inside my Postgres database. Okay. It imports pandas, I pass in some records to it, like a history of orders and what's in a shopping cart, and it's basically a recommendation engine: you should recommend these products to this person. 20 lines of Python directly in my database. I didn't have
to go and spin up a Kafka queue to get the data out of Postgres into Spark to run some model to
feedback in to Redis to then, you know, show this to the user. Literally 20 lines of Python directly
in my database to have a recommendation engine for products. And it breaks the mold, I think, of what we've been thinking for
the past five or 10 years. But if I look at it from a practical standpoint, like this gospel of
never put logic into your database, like the large enterprises have been doing this for
years and years and years. If you look at things like Oracle and SAP applications, they're almost nothing but huge procedural, you know, code. And with Postgres, now that we can do it in things like JavaScript with PL/V8 and Python, it can be more native to an app developer. So I'm kind of weird. I kind of enjoy writing SQL. I ask a lot of people, who enjoys writing SQL? And you know, if there's a hundred people, there's like three hands. But I have asked the question to people
like, well, do you enjoy reading other people's SQL? And there's never like a hand that goes up. Yeah, no. It's not a pretty language, and when someone else writes it, it's probably not well formatted. Like, it gets the job done. But if I can write Python to do the same thing, or JavaScript, how do you feel
about that? Yeah, I think that's more legible to more people. How do you maintain that? And like,
where does that live in a software system? Basically, it's like a create function call.
So you run a create function here, you define your function directly, then you can execute that
within the database, right? So I've defined my function and then I can just do select,
get recommendations for my recommendation engine.
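A rough sketch of that shape, not Craig's actual code: the function name, signature, and scoring are invented, and `plpython3u` is an untrusted language, so a superuser has to create it:

```sql
CREATE EXTENSION IF NOT EXISTS plpython3u;

-- Hypothetical function, loosely modeled on the description above
CREATE OR REPLACE FUNCTION get_recommendations(uid bigint)
RETURNS TABLE (product_id bigint, score float8)
LANGUAGE plpython3u
AS $$
    # Ordinary Python runs here; you could import pandas just as well
    plan = plpy.prepare(
        "SELECT product_id FROM orders WHERE user_id = $1", ["bigint"])
    rows = plpy.execute(plan, [uid])
    # Stand-in scoring: rank every previously ordered product equally
    return [(r["product_id"], 1.0) for r in rows]
$$;

-- Then calling it is just a SELECT, as described
SELECT * FROM get_recommendations(42);
```

The function body lives in the database like any other schema object, which is what the "create function call" answer above is getting at.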
So a lot of extensions come
and basically they packaged up
all of these functions for you in that C format.
Like Madlib does all this.
So now all you've got to do is go through
and say, what's the function?
What's my inputs and what do I get out?
But if you want to write your own,
you absolutely can.
You can go down that deep path
and create an extension
or writing your own functions and deploying, you know,
just like you would other schema stuff.
Yeah.
Let's say, let's just get real practical with this Python example.
Maybe if you have the code or you have the tweets or something,
you can provide links to somewhere where people can go and look at
what exactly you did because it's fascinating.
Are you just storing that in its own Python file?
Actually, you're writing the Python inside of a function.
So this is like an SQL file that you're piping into the system via PSQL.
Is that how you get it in there?
Yeah, I'm just connecting via PSQL.
And I'm just like, so a few PSQL tips, right?
So I'm a CLI guy.
I think a lot of developers are.
PSQL is really great and powerful.
I don't like, you know,
like you're in there for the first time
and you're just kind of typing.
Like, a big tip is if you set your EDITOR environment variable, just like the, you know, standard EDITOR, and you do backslash E, that'll open up your default editor. So, like, if you want to edit your queries in Vim or Tmux or Emacs or whatever, just set that and then do backslash E. Now you can kind of work as native as possible. It can be Sublime Text even.
TIL, I've been using PSQL for years.
I did not know that until just now.
You just taught me something.
There's a whole bunch of things.
Backslash timing will automatically show how long it took to run a query.
I'll give a link as well to, like, here's how to customize your psql editor. You probably have, like, a bash profile set up, or a bashrc. You can set up a psqlrc that'll customize all this for you. A friend just set his null character value, so that when you have nulls in your database,
he set it to the poop emoji
so that you know it's a null.
It's not an empty string.
It's actually a null in your database.
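Putting those tips together, a minimal `~/.psqlrc` might look like this; the null marker is whatever character you like, emoji included:

```
\set QUIET 1
-- show how long each query takes
\timing on
-- make NULLs visually distinct from empty strings
\pset null '[null]'
\unset QUIET
```

The QUIET lines just suppress the "Timing is on" style chatter while the file loads; `\e` itself needs no configuration beyond the EDITOR environment variable mentioned above.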
Right.
I have picked up a few things over the years.
That backslash E, I've never picked up.
I'm reading my psqlrc right now, and I have the \timing on. I have the \pset null, but I just have, like, the word null. I need to replace that with the poop emoji, for sure. A couple other things. Then it unsets QUIET. I don't know what it's doing. Oh, it starts with QUIET, does some stuff, unsets QUIET. I think that's to set up the prompt. So yeah, definitely link that up, because we love to trick out our environments, and if you can be more productive inside your psql, then why not, right?
Yeah, I think mine's back to Vim right now, but, like, for the longest time it was actually, like, Sublime Text, which people are like, wait, how are you doing that? And it's executing it inside psql. So it pops open your Sublime Text in a new window, and then when you save that, somehow it pipes it back into your command line, and it's executing whatever you save. Like, you can basically, you know... in Vim, I can quit and not save, or I can, you know, write and quit. And if I write and quit, it's going to execute. It executes. That's cool. And so that's what I was doing with the Python. I was basically, you know,
building it up a few lines at a time. It was CREATE OR REPLACE FUNCTION, I've got my inputs... I started with just a couple of inputs, and saying, okay, now I'm going to import pandas.
Like, did that error for me or did that work?
Now I'm going to see, can I, you know,
parse this into a data frame?
Nope, I didn't parse it right.
Okay, how do I get from a set of arrays in Postgres
and transcribe that into what Python wants for a data frame?
So it was definitely, you know, a couple hours of debugging,
but probably not much longer than, like... I haven't written anything in pandas in probably a year, so probably no longer than it would take me directly in pandas. And I'm just kind of creating that function, and then I'm calling that SELECT as soon as I create it, to kind of iterate and test. I'm saying, SELECT, get my recommendations. Oh, error? Or did it execute and do something? Pretty cool stuff, man.
Pretty cool stuff.
So that's in present day.
That's presently available in Postgres.
What's the future look like then?
Is there more like this coming,
or what do you think is going to happen next?
Yeah, I fully expect a lot of this, right?
Like you're going to see new extensions
doing all sorts of things.
You're going to see, you know,
you've got time series ones like pg_partman. You've got things like Citus for sharding. You've got PostGIS.
There's, there's ones that show up that I'm like, I had no idea you could or wouldn't
want to do that.
You know, your reaction of like, this is crazy when we've got JavaScript in our databases
is spot on.
Thank you.
I thought so.
As you pause and think about it, why not?
Like kind of the art of the crazy is really fun as developers, right?
Like we're going to see a lot more of this.
And I see them, ZomboDB is a fascinating one.
So it'll keep your Postgres data in sync in Elasticsearch.
And you can maintain Elasticsearch indexes and query them from directly within Postgres.
Say that again for us slower folks like myself.
So if you want to use Elasticsearch
for your full-text search, right?
Normally you've got to pipe all that data over somehow.
So what this is going to do is going to,
as you write a record
and it's a transactionally consistent record,
it's going to sync it over to Elasticsearch
and maintain that index.
And now when you query this, you could go to Elastic and query it, but you've already got, you know, your application connected to Postgres. Like, why not just use that index? So you can basically have a little, like, kind of reference to say, use this Elasticsearch index. It's automatically going to call out, use Elastic for full-text search, and get that back in your standard SQL Postgres query.
I see. So it's like a proxy for Elastic without having to worry about it. Like you feel like
you're just using Postgres, but it's actually proxying to Elasticsearch back there.
Exactly.
That's cool.
And it's, I'm like, who thought you would like need that? Like, it's like, you're still running Elastic,
but like now I don't have to worry about transactional consistency
and keeping things in sync.
And how do I query this?
There's nothing new to learn in that way.
Yeah.
So I think we're going to see a lot of that.
I think we're going to see a lot of that on extensions,
just continuing to advance.
Within Postgres itself,
one huge area is going to be pluggable storage.
So a couple of years ago, Postgres got
this committed into core. It's still early on, but basically you can have a different storage engine.
This is going to unlock a lot in coming years. I think it's, will Postgres core ship with multiple
storage engines? I don't know. That's a good question. I think probably at some point you'll
have a choice. There's a few in development.
I don't think we'll see hundreds and hundreds like we do extensions because it's a higher bar to write.
It's deep C code and understanding how the Postgres storage engine works
and how you change it and optimize it.
One of the biggest pains with Postgres that people complain about is vacuum.
Postgres under the covers, what it is... it's a giant append-only log. Like, you do an update, and what happens is it doesn't go and update those bytes on disk. It basically flags that record as dirty, then writes out a whole new record. So it's basically like a logical delete, right? Where if you have, like, a deleted_at column that's hiding everything, it's kind of like that. Now what happens is, when the system's at low load, vacuum comes in and cleans up all that. It says, okay, let's free up some space. So people have this love-hate relationship with vacuum, because they're like, oh man, vacuum's running, my system's slower. Well, it's actually going and deleting things that you didn't want to be there. So it's a good thing, but there is this love-hate relationship with it. But there's a new backend type that aims to completely change that
and changes how the heap works, all sorts of things under the cover.
ZHeap is the backend that's under active development,
shows really promising kind of improvements around how Postgres handles vacuum
and you don't have to deal with it quite as much.
Basically, it's some better space savings
and performance on all that front,
which is really, really interesting, right?
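The append-only update behavior Craig just described can be modeled in a few lines. This is a toy, not how Postgres actually stores tuples (real tuples carry xmin/xmax transaction IDs for visibility, not a boolean flag), but it shows why updates leave dead versions behind and what vacuum reclaims:

```python
class MVCCTable:
    """Toy model of Postgres-style MVCC: an UPDATE never overwrites bytes in
    place; it marks the old tuple dead and appends a new version. VACUUM
    later reclaims the dead versions."""

    def __init__(self):
        self.tuples = []  # each entry: {"key", "value", "dead"}

    def insert(self, key, value):
        self.tuples.append({"key": key, "value": value, "dead": False})

    def update(self, key, value):
        for t in self.tuples:
            if t["key"] == key and not t["dead"]:
                t["dead"] = True          # logical delete of the old version
        self.insert(key, value)           # append the new version

    def get(self, key):
        for t in self.tuples:
            if t["key"] == key and not t["dead"]:
                return t["value"]

    def vacuum(self):
        """Reclaim space by dropping dead tuples; returns how many were freed."""
        before = len(self.tuples)
        self.tuples = [t for t in self.tuples if not t["dead"]]
        return before - len(self.tuples)

t = MVCCTable()
t.insert("a", 1)
t.update("a", 2)
t.update("a", 3)
print(len(t.tuples), t.get("a"))  # 3 tuple versions on "disk", only one live
print(t.vacuum())                 # → 2 (dead versions reclaimed)
```

An update-heavy workload in this model piles up dead versions between vacuum runs, which is exactly the bloat-versus-background-work trade-off people have the love-hate relationship with.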
So the idea that we didn't have to go
and do a complete rewrite of Postgres
and make this change for everyone,
for some people, this is really beneficial.
Other people, they may not need it, right?
Zedstore is another one, which is columnar.
So that's in active development right now.
Columnar data stores flip things around on their head. Instead of storing rows,
they store things by columns. What that means is things compress down really well. So if
you've got things like time series data, you could imagine, oh, I store something for an hour.
I don't have to store a record per row. I just say, at record 101 I've got 12 o'clock, and 12 o'clock goes all the way to record 2000, so I don't have to
write 1,900 records. I just say, right here start, right here stop. And so columnar is really
useful in time series. It compresses data down really, really tightly, from like 3x to 10x, so you're storing less on disk and you have to scan less. Now, columnar, like all these things, is a trade-off.
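The compression win being sketched there is essentially run-length encoding: a column full of repeated timestamps collapses to (value, run length) pairs. A minimal illustration (real columnar engines layer several encodings, this is just the core idea):

```python
from itertools import groupby

def run_length_encode(column):
    """Collapse runs of identical column values into (value, run_length) pairs,
    the kind of encoding that makes columnar stores compact for time series."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(column)]

# 1,900 rows stamped "12:00" followed by 100 stamped "13:00":
timestamps = ["12:00"] * 1900 + ["13:00"] * 100
encoded = run_length_encode(timestamps)
print(encoded)  # → [('12:00', 1900), ('13:00', 100)]
print(len(timestamps), "values stored as", len(encoded), "runs")
```

This only pays off when values actually repeat in runs, which is why the trade-off cuts the other way for a typical transactional table with high-cardinality columns.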
Columnar isn't perfect for every application.
Yeah, that sounds like a much different way of going about storage. It seems like it would be backwards for a lot of what Postgres is normally
used to do, right? Like, you wouldn't want to be, hey, I've got my Postgres database and I'm just going to
swap out some backends and see which one I like.
I don't think the columnar one, for your
normal use case, is going to be advantageous, right?
Right. Well, I think normal
is a tough question, right? Because it's like
that's the world you come from, but like
Postgres is at the core of a lot of
data warehousing tools. Postgres,
because of its license, people take Postgres and kind
of modify the code and change it and make it columnar. If you look at some things like
Greenplum or- That's why I call it abnormal,
because they're modifying Postgres. They're changing the way it works, right? That's abnormal.
Yeah. I think from a traditional Rails or Node or web application, the transactional workload,
yes, right? Now we're completely changing what the bread and butter of Postgres is, but there's no reason it can't do this, right? It can work this other way, but it's a really different set of trade-offs, and this isn't for everyone. I think it's going
to be interesting how it evolves. I don't know if the core community will maintain a bunch of these,
or if it's, you know, some side companies or, you know, whole new companies that evolved out of this. But it really just expands what Postgres can do.
Pluggable storage to me is one of those next big frontiers. It's going to be an exciting area for
five or 10 years. You can see remnants of where people have taken Postgres. Amazon Redshift's a
great example. That was Postgres like 10 or 15 years ago. They got modified and modified and
modified. It was a company called
ParAccel that got kind of sort of bought by Amazon. And you see hints of it, but it's like, I think it
was like Postgres 8.1, which is north of 10 years old. So it doesn't have things like JSON. So I
think that that extensions world is really interesting and pluggable storage of the next
five, 10 years. We're going to see a lot there, because Postgres can keep moving, be safe and stable and reliable, not lose my data, which as a database is
the most important thing, obvious but worth stating. And then we get all these,
like you say, like these kinds of crazy things that, well, yeah, I actually do want this. Like
as my stack evolves and I want to do more,
I do want to write a recommendation engine
inside my Postgres database
and not have to have this ETL job
that feeds into a data lake,
that feeds into Redis,
that I'm maintaining five things.
I just wanted to do this one thing.
So I think extensions and pluggable storage, to me,
are a lot of the excitement, though
a ton of credit to just core Postgres, right? Postgres is just going to keep adding new indexes,
new polish. There's maybe some really new, awesome kind of sexy feature like JSONB that I'm
forgetting, but I think we've got a pretty good base. Now it's keep working on performance, ease of use,
those sort of things in the core.
And a ton of credit to the people,
the 40 or so committers and all them
that just keep plugging away.
Not a lot of them are in the limelight,
just making it available for millions and millions
and millions of developers out there.
Well, Craig, I can tell you're very excited about this.
I want to ask you one last question,
which is you've been doing all of this work on education
and evangelism, for lack of a better word,
getting the word out there,
about all the cool stuff Postgres is able to do
and will be able to do in the future.
What's the best place to get started
for people who are brand new
to the world of interacting with the database?
Surely you have resources, maybe your website.
Where do people go to learn Postgres
and keep up with the new features coming out?
Yeah, so I mean, there's a few places,
you know, one shameless plug,
I help curate Postgres Weekly.
It's not a DBA newsletter.
It's really targeted to app developers.
Like it's like, here's the how-tos,
here's this tutorial,
here's this shiny new feature, right?
We'll talk about extensions and pluggable storage, but it's really targeted at kind of app developers that want to learn more.
It's, you know, once a week, you've got kind of five to 15 articles in there.
It's not a, here's a hundred things to read, so pretty easy to parse.
I blog about things a good bit.
There's a number of companies out there that blog a good bit as well.
There's a Planet Postgres, kind of a syndication of a bunch of people that blog. It's a great
resource if you want to follow every article that comes out. Those are my two top recommendations.
If you really want to learn about the internals, go subscribe to the PG SQL hackers mailing list.
Read that. If you want to get some more of the basics, there's the PG SQL users mailing list, like if you want to learn, hey, how do I debug a slow query?
There's a bunch of great resources there. There's a Postgres community Slack with thousands of people just hanging out. There's still an active PostgreSQL IRC.
There's some really loyal people in there that, hey, if you've got trouble with a query,
they'll come in and help there.
Those are kind of my offhand lists.
There's a lot of blogs, but a lot of those we try to hit in Postgres Weekly and that sort
of thing.
Yeah.
So those are my default.
And there's probably a few other links that I can come up with after that I can make sure
to kind of send over and we can get added as well.
Absolutely.
Listeners, all of the links to all the things, including things maybe that he didn't even mention right now, but he thought of later, will be in your show notes.
You know how to access those.
Craig, this has been a lot of fun.
You got me excited once again about Postgres.
Not even just the future, but the present of Postgres.
I want to get out
there and play with some of these extensions. Thanks so much for coming on the ChangeLog. And
thanks so much for keeping us all abreast of what's going on with Postgres over the years.
We really appreciate it. Yeah, thanks so much for having me.
That's it for this episode of the ChangeLog. Thank you for tuning in. If you haven't heard yet,
we have launched ChangeLog++. It is our membership program that lets you get
closer to the metal, remove the ads, make them disappear, as we say, and enjoy supporting us.
It's the best way to directly support this show and our other podcasts here on ChangeLog.com.
And if you've never been to ChangeLog.com, you should go there now. Again, join ChangeLog++
to directly support our work and make the ads disappear.
Check it out at ChangeLog.com slash plus plus. Of course, huge thanks to our partners who get it,
Fastly, Linode, and Rollbar. Also, thanks to Breakmaster Cylinder for making all of our beats.
And thank you to you for listening. We appreciate you. That's it for this week. We'll see you next
week. Bye.