Postgres FM - Why is Postgres popular?
Episode Date: September 30, 2022
This episode was badly affected by internet issues. Hopefully the edit came out ok, but the quality should be back to a better level from next week.
Here are links to a few things we mentioned:
Monthly blog event, PGSQL Phriday (blog post from Ryan Booz)
Who or what made Postgres cool? (tweet from Kenneth Cassel)
PostGIS
Acquisition of Sun by Oracle
DB-Engines trend
Hacker News hiring trends
Supabase on GitHub (nearly 40k stars)
How I Built This (podcast)
------------------------
What did you like or not like? What should we discuss next time? Let us know by tweeting us on @PostgresFM or by commenting on our topic ideas Google doc.
If you would like to share this episode, here's a good link (and thank you!)
Postgres FM is brought to you by:
Nikolay Samokhvalov, founder of Postgres.ai
Michael Christofides, founder of pgMustard
With special thanks to:
Jessie Draws for the amazing artwork
Transcript
Hello and welcome to Postgres FM, a weekly show about all things PostgreSQL.
I am Michael, founder of pgMustard, and this is my co-host Nikolay, founder of Postgres.ai.
Hey Nikolay, what are we talking about today?
Hi Michael, why Postgres? And why Postgres in the past?
Why Postgres had success, still is having success, obviously.
Let's also talk a little bit about what we should expect in the future in terms of Postgres popularity, usage, and so on.
[...] is going to be, I think, the first topic there. So maybe we can consider this our entry, our audio
entry to that event. And also I saw a tweet not long ago, somebody asking who or what made Postgres
cool, which had a lot of interesting answers. And I think I'm looking forward to hearing your takes
on this as well. But first, actually, quickly, I did want to apologize because last week I said
that amcheck doesn't guarantee no false negatives, and I had it completely the wrong way around: it's no false positives.
Mistake, very huge.
Yeah.
Well, don't trust me, I'm joking. I also need to confess: I also made a mistake, and I actually made it in reality. I only recently understood one case: when we check a lot of indexes with amcheck, we usually do it in one single thread. And if it's some temporary server, not production, of course, it does make sense to parallelize it. And only in Postgres 14 was it actually implemented in the CLI tool, pg_amcheck, so -j appeared. So it's worth checking and trying to use it even for older Postgres versions. I personally never tried, but I suspect it should work as well for old versions of amcheck and Postgres; you can use a newer client, right?
Okay, enough confession.
I also want to thank everyone again for the feedback. This is super important. And I also want to send everyone some words of support in hard times. I hope work distracts from bad news and so on. And I hope Postgres is in good condition everywhere, well maintained and so on.
Absolutely. Well, so why Postgres? Should we start with a little bit of history? Maybe
we could do a quick summary.
Or do you want to start with where things started to change?
Or where would you like to start with this?
Well, being a human, I always have some history, right?
When I started, I needed to choose some free database system.
I moved away from Oracle and SQL Server.
MySQL was considered bad very quickly in my mind,
and I chose Postgres because it behaved much better and closer to what I learned at university.
So, unlike MySQL. This was my reason for choosing Postgres, but it was very long ago, almost 20 years, actually.
So are we talking SQL standard compliance or are we talking around ACID compliance?
Or how would you summarize that?
SQL standard compliance is one thing, and indeed a very good reason.
And it was so, but also logical, predictable behavior.
You expect something, but with MySQL, for example, there was this terrible thing at that time: if you have February 28th and add one day, you end up with March 3 or something like this.
Or the 0000-00-00 date, or the behavior of nulls, although Postgres's behavior of nulls is also interesting.
So in MySQL, I started to encounter too many such things.
And it's not only the SQL standard, but also, for example, ACID principles.
In MySQL, if you needed full-text search, you needed to use MyISAM.
And MyISAM required REPAIR TABLE all the time because it didn't follow ACID principles in terms of WAL records and so on.
So Postgres was much better, but it had limitations and lacked some functionality, of course, like replication at that time and so on. So we needed to deal with Slony, then Londiste. It was not fun, but still, this is my path. But at that time, I felt
clearly that Postgres was not considered the number one choice by most people. And I felt this clearly until about 2014. I was thinking, Postgres is like FreeBSD, you know: most people choose Linux, but you choose FreeBSD. Or maybe that's not a good comparison. It's like Linux compared to Windows for personal computers. You choose it, you know why, but most people still prefer Windows, right?
This was my feeling at that time.
But of course, I thought it's okay.
We have strong reasons.
We won't ever switch to MySQL.
We know why.
But it's okay that most people around prefer a different database system.
Well, poor those people, right?
This was our way of thinking, I mean "our" in terms of the community. I remember this, and sometimes, of course, we were even blamed for advertising Postgres too much, like it's almost a sect, you know. And it was so until 2013, 14, 15, when suddenly things changed. They changed earlier, I think, but around 2013, maybe 2012, we had some point when Postgres suddenly became a winner compared to MySQL. But why?
Right?
Yeah, I want to go back quite far to introduce when some things happened that I think we'll want to talk about in a moment. For people that don't know, I think some dates will be quite useful. I've done some research, and if we go right back to when a lot of this started, there were a lot of research projects in the 70s and 80s, and a lot of these projects came out of universities. Ingres and Oracle came out in the 70s. Those actually have similar roots. In the 80s, Postgres started as a post-Ingres project. And even from the very beginning, it had a lot of the foundations that I think have proved very important today, things like its extensibility and MVCC.
Then we've got a few other important dates here that I think we might come back to. We've got Microsoft SQL Server's first version in 89. Then we've got Postgres being released under an MIT-style license, which I believe is the Postgres license, in 94. I think that's been really important. Then Postgres95 added SQL support. That feels like a really important date. And then I've got a bit of a gap. In fact, actually, another important thing in 95 was that MySQL version 1 came out. So I think that lays a lot of...
95 was an absolutely amazing year, actually. The Internet started to work, actually, that year, I think. And it's not only Postgres and MySQL, which you mentioned, but if we look at languages, Java, JavaScript, PHP, and Ruby were all born in 95. Can you imagine?
Wow. And Windows 95 was a massive release as well, wasn't it?
So I think 95 and maybe 2014 were some years of change in many areas.
Yeah. So I was going to say that it's probably massively overgeneralizing. And I think your FreeBSD example is great because for any small system, there are always going to be some really passionate people, and there have been people that have been using Postgres...
I want to take my words back: not FreeBSD, but Linux versus Windows. This is a better comparison, actually.
Okay. Yes. I guess the thing here, though, is that with Postgres we've got a changing of the guard, I feel, a little bit, whereas I don't think that's happened with Linux and Windows for personal computers yet.
Maybe. Okay. Slowly, this technology is winning more and more hearts because it's just better, because it follows better principles. For example, a very strong CLI everywhere: Postgres has an excellent CLI and Linux has an excellent CLI. And principles like: don't write too much if everything is fine, better to keep silent, right? On success, just keep silent, give an exit code, a return code, and that's it, while other tools can be very chatty, right?
These are small things, and these principles... Linux is all about CLI tools, so many tools.
And Postgres has a very strong psql and a lot of things around it.
There are many common things here.
So FreeBSD, maybe it's too different compared to Linux.
Yeah, fair enough. So my understanding is that for a lot of these early years, if you had a serious workload, chances are you were going to be running on a paid-for product, probably something like Oracle, maybe even IBM. But then as time went on, Microsoft SQL Server became the main competitor to Oracle for these big serious workloads. And then for smaller workloads or for hobby projects, the LAMP stack became popular, with MySQL as the database. I think things like WordPress picking MySQL as its backend seemed really important for its adoption. So I think we had a phase where, if you had money and you were a serious company, you'd go for one of these paid-for proprietary systems. And if you didn't, or you were a startup, and you wanted a SQL database, you'd go with MySQL. Is that roughly your understanding, how you saw it as well?
Yeah, and availability is also important.
For those who don't want to pay, some websites and apps and so on, startups,
what's important is availability and the cost of maintenance when you start,
how easy it is to start and so on.
And Postgres, I remember very well,
was for a long time
blamed for a high barrier to entry.
It's very difficult for new users to start with it.
For example: I installed it and I cannot connect to it.
This was a problem for Postgres for quite a long time.
It's something for smart people, maybe,
but the barrier is so high,
why should we spend time on it?
MySQL is easy to start with
and easy to maintain in the beginning.
But I remember, actually, in 2009 or 2010,
I attended a MySQL user group in Moscow, Russia.
And I remember that it was about performance.
And at that time already, they compared themselves
with Postgres, and they admitted that Postgres was winning in performance on one node at
that time. They said, you know, Postgres is better here, here, here, here.
Because MySQL developers, I mean hackers, were also there at that group. And they
were discussing: we compare MySQL with Postgres and we are losing, what to do about it?
And it was interesting because Postgres also had the image of being slower than MySQL.
It still has it in many minds, actually, but it's not so.
And it's interesting.
I realized that Postgres was considered good even among MySQL users and fans.
So it's controversial, but I think for popularity,
it's super important how accessible it is. For example, does your operating system
already have it installed, or can it be installed in one line? And also providers: if you create
some website, you have some hosting. Is MySQL available there, or is Postgres available there?
If only MySQL, and it was so in many cases, you will prefer MySQL because it's available.
You don't need to install it, and the provider gives it to you very quickly.
It's important, right, for small projects.
Yeah, I think so.
I think developer experience is important.
But I also might argue that Postgres hasn't changed that much from a database point of view, but I think a lot of frameworks and cloud providers and things have really helped there in terms of making it easier to spin up or to connect to. So, yeah, I don't know, have you noticed a change in the tooling? For example, some of the issues you mentioned, like being able to connect to it, becoming easier?
Well, I don't know. I understand that availability is super important, but I remember when I started with Postgres, of course I compiled it from source. I could not install it from packages; maybe packages were not available. But over time it became much more available. All Linux distributions started to have up-to-date packaging with all the popular contrib modules, and so on.
Of course, the YUM and APT systems are most important here, and they both have Postgres available, so you don't need to compile it.
And this means that most admins at hosting providers just started to add it, because why not? It's there, right? So I think it was a slow revolution. And on how to connect, we have some small documentation.
For example, I remember that when explaining to people how to install and connect to Postgres on Linux, on Ubuntu, on CentOS, I always ended up finding DigitalOcean documentation. Already maybe up to 10 years ago, they had excellent how-to documentation, for example on how to upgrade to a new version. The Postgres documentation, the official documentation, is actually still lacking these how-to types of documents describing, for a particular Linux distribution, how to do particular tasks. But others, like DigitalOcean, fill this gap, and this is important, I think. And it was a slow process.
I think this takes us really nicely onto some reasons why Postgres did then succeed in spite of those things, and why it has become, and is becoming, more and more popular each year. I wanted to take us back time-wise to a few other important dates.
I think PostGIS has been really important for Postgres.
I think a lot of people describe it as,
even compared to the different commercial offerings out there,
the most advanced GIS system out there, and that's a Postgres extension.
And I think if it wasn't for Postgres's extensibility at a deep level, that wouldn't be possible. So I think that's been huge. It came out in 2005, which I couldn't believe, and I don't know when it started to become competitive with the commercial offerings, but that feels really important to me. And then the other date I thought was super interesting: I couldn't believe it was as early as 2007 that Heroku chose Postgres as its default, or the database that they offered.
But I would then point back at why they could do that.
Probably because of its MIT-style license.
It was extremely permissive, and they could do that without having to consult with anybody. They could choose it because of those decisions that Postgres made early on. It was reliable and very usable and all those things. And they could take away some of those developer experience problems of getting started with it, because they could make it easy for the people on Heroku.
Right. And this influenced a lot of Ruby developers, Ruby engineers, right?
Ruby support was excellent on Heroku.
And it's so natural for you to choose Postgres if you already chose Ruby and chose Heroku.
It was the number one choice, of course.
So this provider influence is very important.
I agree with you about PostGIS and extensibility and features, all these parts, but they are only small parts of the whole puzzle.
Right.
But what happened with MySQL is, I think, also important, because at some point MySQL could no longer be considered free software.
Right.
Well, I think this might be the biggest bit of luck that Postgres got in the whole story. I think everything else came from deliberate choices and being very, very good, but this, I think, was quite lucky. Of all the companies that could have bought... well, Oracle acquired Sun, didn't they, which included MySQL.
And that could have gone a few different ways, but people had a distrust of Oracle already, I think.
And crucially, one of the creators of MySQL, the main person, I believe, behind it, split off to create MariaDB. And my understanding is he couldn't choose as permissive a license as MySQL's. He didn't want to, to avoid MySQL taking all of the changes from his fork into theirs.
So that split in the community, in the development and in the trust of the community, I think really opened an opportunity for Postgres to become the default choice if you were a startup and you needed a free database, or maybe you couldn't afford SQL Server or Oracle, or maybe you really valued open source. I think suddenly people were questioning, should we go with that? And then when people were looking for options, Postgres was a really good alternative.
So we had a little internet outage, but we're back.
Where we left off was around...
We considered growing popularity as some well-known fact,
but let's discuss why we think it's obvious that Postgres is winning.
I have two sources.
One is db-engines.com, and their methodology is quite complex, maybe not clear to me.
But still, people consider this a reliable source of truth, and obviously Postgres's popularity is growing. Good.
But there is another source of information, which is very simple, and its methodology is quite straightforward.
It's Hacker News job postings.
Every month they have "Who is hiring?"
And usually it has more than 1000 replies, and it includes everything.
But it's mostly from startups. Sometimes it's a grown startup,
sometimes it's a new startup.
And there is HN Trends, hackernewstrends.com.
We will provide a link. From time to time, they analyze all those texts. It's just text, right?
And they extract technologies, everything from React to the word "remote", how many postings mention remote.
So, can you imagine, Postgres appears in 15% of all job postings, including those for marketing people,
support people, completely non-technical people.
And this is 15%.
It's very, very high.
I think it's in the top 10.
And it's on par with...
The word "remote" is probably still number one after COVID.
But React is also quite a popular word, right? GitHub stars are also some
reliable source of truth. And Supabase, which I consider the champion in terms of productizing
Postgres itself and many extensions of it, and products like PostgREST, they productize it very
well. And they have a lot of GitHub stars. I think it's, how many, tens of thousands?
And the growth is better than React's growth.
Yeah.
Right.
All these numbers are quite reliable.
I mean, these job postings and GitHub stars.
Yeah.
I think it's very hard to refute that.
And I think there's also a growing feeling on places like Hacker News.
I know it's a very biased source,
but the sentiment around Postgres feels like it really changed in that timeframe as well.
It wasn't just that these numbers went up. It also felt like there was almost, not quite a backlash against the NoSQL movement, but I think people realized there was a phase that people went through where the words "big data" got thrown around a lot and NoSQL gained a lot of momentum.
NoSQL is interesting.
Mongo is interesting.
Some people think that JSON is the main reason why Postgres is winning hearts of developers.
And I think it's a very important contributor, but maybe not the main one. We chatted a little bit before we started this episode, and if you check, JSON appeared in 9.2 and JSONB appeared in 9.4. But if you check when the spike of popularity happened, according to db-engines.com data, it happened when RDS released its Postgres version, right?
Yes. Now, I think this is a really interesting chicken-and-egg problem. Did RDS have to add Postgres because it was gaining popularity? Did Postgres's popularity spike because they chose it?
They chose it themselves, for their own use. They chose Postgres, actually the Aurora version of it, which is quite different. But they chose Postgres, not Aurora MySQL; they chose Aurora Postgres as the database for themselves instead of Oracle, right?
Well, I think this is a critical point. And yeah, you brought up Apple in our previous
chat as well. So big, huge companies choosing to migrate to Postgres around that time definitely
had a big effect. JSONB, I agree. I think the biggest impact JSONB had was it gave people an
answer. If they needed a document store, they didn't have to have MongoDB in addition to Postgres.
They had a really good answer for the team that wanted to do some document storage.
They could say, well, you can put it in Postgres,
and we have indexing support for it.
And it became an easy answer to anybody on that front.
But yeah, I agree with you, it's been important, but not the primary factor.
Postgres is good at reacting to challenges.
You mentioned 1995, when SQL support was added.
Without SQL, it wouldn't have survived, I guess, as a popular system.
Then there was some hype around object databases, object-relational databases.
Postgres also adopted some things, and it's still considered an object-relational database
and has some features
and behavior of an object-relational system.
But then the semi-structured hype started, and it started with XML, actually.
And some Russian developers, Oleg Bartunov and Teodor Sigaev, added hstore support in
2004, can you imagine?
Very long ago.
I personally participated in bringing XML support to Postgres, but both
hstore and XML may not be that relevant right now because JSON is already here and it's the de facto
standard for unstructured or semi-structured data. So Postgres is good at reacting. Postgres is doing
hard work in reacting to challenges.
I completely agree. And that leads me to... I've got two more things I wanted to make sure we covered. One is that I think a lot of the things we've discussed so far explain how Postgres was taking over from MySQL for the projects that were considering a free, open source database, but they don't yet explain how larger companies are choosing it instead of Oracle, instead of SQL Server.
And that, I would say, is another example of it reacting.
Larger companies don't care about wide popularity, right?
They care in terms of how many engineers they could hire.
But also, sometimes I see some companies prefer languages like Erlang.
For example, WhatsApp uses Erlang, and some big companies use Scala. They're not very popular, but these companies choose them and bring the expertise inside.
So they care less about hype, right?
Yes, but the things that did matter to them, I think Postgres started to ship around that same time. So in 2016, in 9.6, parallel queries came in. In 2017, we had partitioning and logical replication.
But by 2016, 17, the game was already done. The game was done in 2014, when RDS appeared.
Versus MySQL, yes, but I think versus Oracle and SQL Server, no. I think actually these larger companies...
I have a different opinion.
I observed Yandex in Russia, when I returned to Russia for about 10 months because of visa issues and economic issues after the Crimea invasion. I returned to Russia, and I was bored and decided to relaunch a Russian user group.
And I asked which big company could host us. And Mail.ru and Yandex both said yes. And I chose Yandex. But Yandex said, can we also squeeze in a small lightning talk, like 10 minutes? I said, of course. What's the topic? And they said, how we migrated Yandex Mail from Oracle to Postgres. And then at the very first event, they gave this talk.
It was amazing. And I remember there was also a guy from Avito in the first row, and they said Avito is running Postgres as well. These companies are big and they don't care a lot about hype.
And they are not like enterprises.
They can count money, like startups, right?
So they can count money and so on.
And in 2014, they also didn't care about replacing Oracle and SQL Server in Russia for political reasons.
They didn't care about it at all at that time.
It was before that, actually.
They had chosen it already. And they chose it for many reasons. And they were big already. And they made the decision in 2014, right? And I was
completely surprised.
Yeah. So it's kept growing since then, though, right? So I think there have been an increasing number of use cases that can switch to Postgres from Oracle or SQL Server, or maybe companies that switch for new projects because the performance is on par for their workloads, because they now have the features they need.
Features, performance. Let's agree on this. When Yandex was ready to discuss how they migrated from Oracle to Postgres (by the way, later they gave a talk at PGCon in Ottawa, and those were very good talks, the next year, maybe 2015), by that time they had already been working on it for a couple of years.
So they started in 2012.
This brings us back, maybe, to the acquisition of MySQL by Oracle, an indirect acquisition, a chain of acquisitions, in 2010.
And I also consider Apple's decision to migrate internal things from MySQL to Postgres in 2011 to be some turning point, right?
Some signal, a very strong signal, that some enterprise decided.
But we also have a case when some big company, a big startup, maybe one of the biggest ones, migrated from MySQL to Postgres and, a couple of years later, back, right? You know what I'm talking about.
The other direction. The Uber case, from MySQL to Postgres and then back. Back was the one that got a lot of attention.
Yes, and a lot of good criticism as well, by the way.
Yeah. I had one more point that I think kind of takes us into the future as well,
and that's that, for the last few years, Postgres has released major improvements every year.
I'm not just talking about the fact that they call them major releases.
There have been big improvements in every major version, every year.
I used to product manage products for SQL Server and Oracle, and they didn't come out with major versions every year.
They didn't.
It was every two or three years at a push,
and even then they didn't always include game-changing features.
So I think it's very interesting. It's sometimes considered slow, and I think that's unfair,
but the speed of improvement with Postgres, I think, is probably the highest of all the relational databases at the moment.
And it's also quite well structured. It's impossible to see some feature appear in a minor release. In MySQL, they had check constraints which did nothing. They didn't check anything until, a couple of years ago, they brought it in with a minor release. How come you add a feature in a minor release? It's impossible in Postgres, right?
It's very well structured, and principles are followed. It's good.
But there are many downsides as well.
Maybe we should talk about it at a different time, about the process of development, and I will explain why I cannot participate in it.
Yeah, that will be a fascinating discussion.
Is there anything else on the future that you want to talk about?
Like, why?
The nearest future is bright.
For the distant future, I have concerns.
One of those concerns is this development process.
It's outdated, through email.
Only the strongest, the biggest will survive, and I think it's not attractive to many, many people who could help.
I strongly disagree, and I'm looking forward to discussing that with you.
Well, we don't have time right now to dive into it, but let's dive in another time. I have my own doubts, but I'm still a big fan. For the last 18 years, I've chosen Postgres. I recommend choosing Postgres to everyone who deals with OLTP. For analytical workloads, there are questions, but for OLTP workloads, Postgres is the number one default choice. We have cases when a company grows to a multi-billion dollar valuation with just a single database. We have many such cases.
Yeah.
It's scale up.
Yeah, so easy.
And Postgres has so many features that others don't have.
So just choose it.
And as a bottom line, in my opinion, it's as with any startup which became a great company. There is, by the way, a good podcast called How I Built This, by NPR.
You know it, right?
Like the Starbucks story and many, many others, like the Instagram story.
When founders are asked why they succeeded, they usually answer:
a combination of hard work and a lot of luck. So I consider this unfortunate MySQL timeline
a big piece of luck for Postgres,
but we also have a lot of hard work.
We are 100% sure.
That's a really nice ending.
Well, thank you everybody for joining us.
Thank you again, Nikolay.
And see you next week.
Thank you.
Don't forget to follow, comment, like, and provide
feedback on Twitter or anywhere else. We'll listen to you. We
already made several episodes based on feedback. So we're
ready to continue this practice, right?
Yeah. Thank you so much.
Thank you. Bye bye.