Postgres FM - Why is Postgres popular?

Episode Date: September 30, 2022

This episode was badly affected by internet issues. Hopefully the edit came out ok, but the quality should be back to a better level from next week.

Here are links to a few things we mentioned:
Monthly blog event, PGSQL Phriday (blog post from Ryan Booz)
Who or what made Postgres cool? (tweet from Kenneth Cassel)
PostGIS
Acquisition of Sun by Oracle
DB-Engines trend
Hacker News hiring trends
Supabase on GitHub (nearly 40k stars)
How I Built This (podcast)

------------------------

What did you like or not like? What should we discuss next time? Let us know by tweeting us on @PostgresFM or by commenting on our topic ideas Google doc.

If you would like to share this episode, here's a good link (and thank you!)

Postgres FM is brought to you by:
Nikolay Samokhvalov, founder of Postgres.ai
Michael Christofides, founder of pgMustard

With special thanks to:
Jessie Draws for the amazing artwork

Transcript
Starting point is 00:00:00 Hello and welcome to Postgres FM, a weekly show about all things PostgreSQL. I am Michael, founder of pgMustard, and this is my co-host Nikolay, founder of Postgres.ai. Hey Nikolay, what are we talking about today? Hi Michael, why Postgres? And why Postgres in the past? Why Postgres had success, and obviously is still having success. Let's also talk a little bit about what we should expect in the future in terms of Postgres popularity or usage and so on. ... is going to be, I think, the first topic there. So maybe we can consider this our entry, our audio entry to that event. And also I saw a tweet not long ago, somebody asking who or what made Postgres cool, which had a lot of interesting answers. And I'm looking forward to hearing your takes
Starting point is 00:00:58 on this as well. But first, actually, quickly, I did want to apologize, because last week I said that amcheck doesn't guarantee no false negatives, and I had it completely the wrong way around: it's no false positives. A very huge mistake. Yeah, well, don't trust me. I'm joking. I also need to confess: I also made a mistake, and I actually made it in reality. I only recently understood a case with amcheck: when we check a lot of indexes, we usually do it in one single thread. And if it's some temporary server, not production, of course, it does make sense to parallelize it. And only in Postgres 14 was it actually implemented, in the CLI tool pg_amcheck. So -j appeared, so it's worth checking
Starting point is 00:01:45 and trying to use it even for older Postgres versions. I personally never tried, but I suspect it should work as well for older versions of amcheck and Postgres; you can use a newer client, right? Okay, enough confession. I also want to thank everyone again for the feedback,
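The parallel check mentioned above can be sketched as a small helper that only assembles a pg_amcheck command line. This is a minimal sketch, not something from the episode: the function name, the database name "appdb", and the default of 4 jobs are illustrative assumptions, while `--database` and `--jobs` are the long forms of pg_amcheck's `-d` and `-j` options.

```python
import shlex


def build_pg_amcheck_command(database: str, jobs: int = 4) -> list[str]:
    """Assemble (but do not run) a pg_amcheck invocation that checks in parallel.

    `database` and `jobs` are hypothetical inputs for illustration; --jobs is
    the long form of the -j option discussed in the episode.
    """
    if jobs < 1:
        raise ValueError("jobs must be at least 1")
    return ["pg_amcheck", "--database", database, "--jobs", str(jobs)]


# Print the command so it can be reviewed before running it on a
# non-production server, as suggested in the discussion.
print(shlex.join(build_pg_amcheck_command("appdb", jobs=8)))
```

Building the argument list separately from running it keeps the sketch safe to execute anywhere; passing the list to `subprocess.run` would be the natural next step on a machine where the Postgres 14+ client tools are installed.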
Starting point is 00:02:01 this is super important, and I also want to send everyone some words of support in hard times. I hope work distracts from bad news and so on. And I hope Postgres is in good condition everywhere, well maintained and so on. Absolutely. Well, so why Postgres? Should we start with a little bit of history? Maybe we could do a quick summary. Or do you want to start with where things started to change? Or where would you like to start with this?
Starting point is 00:02:31 Well, being a human, I always have some history, right? When I started, I needed to choose some free database system. I moved away from Oracle and SQL Server. In my mind, MySQL was very quickly considered bad, and I chose Postgres because it behaved much better and closer to what I learned at university, unlike MySQL. This was my reason why Postgres, but it was very long ago, almost 20 years, actually. So are we talking SQL standard compliance or are we talking around ACID compliance? Or how would you summarize that?
Starting point is 00:03:11 SQL standard compliance is one thing, and indeed a very good reason. And it was so, but also logical, predictable behavior. You expect something, but with MySQL, for example, this terrible thing we had with MySQL at that time: if you have February 28th and add one day, you end up having March 3 or something like this. Or the 0000-00-00 date, or the behavior of nulls, although Postgres behavior of nulls is also interesting. So in MySQL, I started to encounter too many things. And it's not only the SQL standard, but also, for example, ACID principles. In MySQL, if you needed full-text search, you needed to use MyISAM. And MyISAM required REPAIR TABLE all the time, because it didn't follow ACID principles in terms of WAL records and so on.
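As a point of reference for the "predictable behavior" described above, here is the calendar arithmetic one would expect from "February 28th plus one day". This is a minimal sketch using only Python's standard library; it does not reproduce the behavior of any particular MySQL or Postgres version, and the helper name `next_day` is an illustrative assumption.

```python
from datetime import date, timedelta


def next_day(d: date) -> date:
    """Return the calendar day after `d`."""
    return d + timedelta(days=1)


# In a non-leap year, the day after February 28th is March 1st.
print(next_day(date(2023, 2, 28)))  # 2023-03-01
# In a leap year, it is February 29th.
print(next_day(date(2024, 2, 28)))  # 2024-02-29
```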
Starting point is 00:04:01 So Postgres was much better, but it had limitations and lacked some functionality, of course, like replication at that time and so on. So we needed to deal with Slony, then Londiste. It was not fun, but still, this is my path. But at that time, I felt clearly that Postgres was not considered the number one choice by most people. And I felt it clearly until about 2014. I was thinking, Postgres is like FreeBSD, you know: most people choose Linux, but you choose FreeBSD. Or maybe that's not even a good comparison. It's like Linux compared to Windows for personal computers. You choose it, you know why, but most people still prefer Windows, right? This was my feeling at that time.
Starting point is 00:04:50 But of course, I thought it's okay. We have strong reasons. We won't ever switch to MySQL. We know why. But it's okay that most people around prefer a different database system. Well, poor those people, right? This was our way of thinking,
Starting point is 00:05:06 I mean "our" in terms of the community. I remember this, and sometimes, of course, we were even blamed for advertising Postgres too much. It's like a sect, almost a sect, you know. And it was like that until 2013, 14, 15, when suddenly things changed. They changed earlier, I think, but around 2013, maybe 12, we had some point when Postgres suddenly became a winner compared to MySQL. But why? Right? Yeah, I want to go back quite far to introduce when some things happened that I think we want to talk about in a moment. For people that don't know, I think some dates will be quite useful. I've done some research, and just going right back, if we go back to when a lot of this all started, there were a lot of research projects in the 70s and 80s, and a lot of these projects came out of universities. Ingres and Oracle came out in the
Starting point is 00:06:02 70s. Those actually have similar roots. In the 80s, Postgres started as a post-Ingres project. And even from the very beginning, it had a lot of the foundations that I think have proved very important today, things like its extensibility and MVCC. Then we've got a few other important dates here that I think we might come back to. We've got Microsoft SQL Server's first version in 89. Then we've got Postgres being released under an MIT-style license, which I believe is the Postgres license, in 94. I think that's been really important. Then Postgres 95 added SQL support. That feels like a really important date. And then I've got a bit of a gap. In fact, actually, another important thing in 95 was that MySQL version
Starting point is 00:06:48 1 came out. So I think that lays a lot of... 95 was an absolutely amazing year, actually. The Internet started to work, actually, that year, I think. And not only did you mention Postgres and MySQL, but if we look at languages, Java, JavaScript, PHP
Starting point is 00:07:04 and Ruby all were born in 95. Can you imagine? Wow. And Windows 95 was a massive release as well, wasn't it? So it was like, I think 95 and maybe 2014, some changing years, like in many areas. Yeah. So I was going to say that it's probably massively overgeneralizing. And I think your FreeBSD example is great because for any small system, there are always going to be some really passionate people that, and there have been people that have been using Postgres. I want to take my words back, not FreeBSD, but Linux versus Windows. This is a better feeling actually. Okay. Yes. I guess the thing here though is we've got with Postgres, we've got a changing of the guard, I feel a little bit, whereas I don't think that's happened with Linux and Windows for personal computers yet. Maybe. Okay.
Starting point is 00:07:56 Slowly, this technology is winning more and more hearts because it's just better, because it follows better principles. For example, a very strong CLI everywhere. Postgres has an excellent CLI and Linux has an excellent CLI, and principles like: don't write too much if everything is fine, better to keep silent, right? If it's a success, just keep silent, give an exit code, a return code, and that's it, while other tools can be very chatty, right? These are small things, and these principles... Linux is all about a lot of CLI tools, so many tools. And Postgres has a very strong psql and a lot of things around it.
Starting point is 00:08:38 There are many common things here. So FreeBSD, maybe it's too different compared to Linux. Yeah, fair enough. So my understanding is that for a lot of these early years, if you had a serious workload, chances are you were going to be running on a paid-for product, probably something like either Oracle, or maybe even IBM. But then as time went on, SQL Server, or Microsoft SQL Server, became the main competitor to Oracle for these big serious workloads. And then for smaller workloads or for hobby projects, the LAMP stack became popular, with MySQL as the database. I think things like WordPress picking MySQL as its backend seemed really
Starting point is 00:09:26 important for its adoption. So I think we had a phase where if you had money and you were a serious company, you'd go for one of these paid-for proprietary systems. And if you didn't, or you were a startup, maybe if you wanted a SQL database, you'd go with MySQL. Is that roughly your understanding, how you saw it as well? Yeah, and availability is also important. For those who don't want to pay, some websites and apps and so on, startups, for them what's important is availability and the cost of maintenance when you start, how easy it is to start and so on. And Postgres, I remember very well,
Starting point is 00:10:08 for a long time it was blamed for a high barrier to entry. It's very difficult for new users to start with it. For example, "I installed it and I cannot connect to it." This was a long-standing problem for Postgres. It's something for maybe smart people, but with the barrier so high, why should we spend time on it?
Starting point is 00:10:27 MySQL is easy to start and easy to maintain in the beginning. But I remember actually in 2009 or 2010, I attended MySQL user group in Moscow, Russia. And I remember that it was about performance. And they, at the time already, they compared themselves with Postgres and they admitted that Postgres is winning in performance for one node at that time very well. Like they said, you know, Postgres is better here, here, here, here.
Starting point is 00:10:56 Because at that group, MySQL developers, I mean, hackers, were also there. And they were discussing: we compare MySQL with Postgres and we are losing, what to do about it? And it was interesting, because Postgres also had the image that it's slower than MySQL. It still has it in many minds, actually, but it's not so. And it's interesting. I realized that Postgres is considered good even among MySQL users and fans. So it's controversial, but I think for popularity, it's super important how accessible it is. For example, does your operating system
Starting point is 00:11:31 already have it installed or it can be installed in one line? And also providers, if you create some website, you have some hosting, is MySQL available there or Postgres available there? If only MySQL, and it was so in many cases, you will prefer MySQL because it's available. You don't need to install it and the provider gives it to you very quickly. It's important, right, for small projects. Yeah, I think so. I think developer experience is important. But I also might argue that Postgres hasn't changed that much from a database point of view, but I think a lot
Starting point is 00:12:05 of frameworks and cloud providers and things have really helped there in terms of making it easier to spin up or to connect to. So, yeah, I don't know, have you noticed a change in the tooling, for example, with some of the issues you mentioned, like being able to connect to it, becoming easier? Well, I don't know, it's hard for me to say. I understand that availability is super important. But I remember when I started with Postgres, of course, I compiled it from source. I could not install it from packages; maybe packages were not available. But over time it became much more available. All Linux distributions started to have up-to-date packaging with all the popular contrib modules, of course, and so on.
Starting point is 00:12:49 Of course, YUM and APT systems are most important here, and they both have Postgres available there, so you don't need to compile it. And this means that most admins at hosting providers start to just add it, because why not? It's there, right? So somehow, I think, it was a slow revolution. And how to connect, we have some small documentation. For example, I remember that when explaining to people how to install and connect to Postgres on Linux, on Ubuntu, on CentOS, I always ended up finding the DigitalOcean documentation. Already maybe up to 10 years ago, they had excellent how-to documentation, for example how to upgrade to a new version. And these how-to documents... actually, the Postgres documentation, the official documentation, is still lacking these how-to types of
Starting point is 00:13:39 documents describing, for a particular Linux distribution, how to do particular tasks. But others, like DigitalOcean, fill this gap, and this is important, I think. And it was a slow process. I think this takes us really nicely onto some reasons why Postgres did then succeed in spite of those things, and where it has become and is becoming more and more popular each year. I wanted to take us back time-wise to a few other important dates. I think PostGIS has been really important for Postgres. I think a lot of people describe it as, even compared to the different commercial offerings out there,
Starting point is 00:14:17 the most advanced GIS system out there, and that's a Postgres extension. And I think if it wasn't for Postgres's extensibility at a deep level, that wouldn't be possible. So I think that's been huge, and it came out in 2005, which I couldn't believe. I don't know when it became, or when it started to become, competitive with the commercial offerings, but that feels really important to me. And then the other date I thought was super interesting: I couldn't believe it was as early as 2007 that Heroku chose Postgres as its default, or the database that they offered. In 2007, could you believe? But I would then point back at why they could do that. Probably because of its MIT-style license.
Starting point is 00:15:00 It was extremely permissive and they could do that without having to consult with anybody. They could choose it because of those decisions that Postgres made early on. It was reliable and very usable and all those things. And they could take away some of those developer experience problems of getting started with it because they could implement them, make them easy for the people on Heroku. Right. And this influenced a lot of Ruby developers, Ruby engineers, right? Ruby support was excellent with Heroku. And it's so natural for you to choose Postgres if you already chose Ruby and chose Heroku. It was the number one choice, of course.
Starting point is 00:15:38 So this provider's influence is very important. I agree with you about PostGIS and extensibility and features, all these parts, but they are only small parts of the whole puzzle. Right. But what happened with MySQL, I think, is also important, because MySQL couldn't be considered free software any more at some point. Right. Well, I think this might be, if it's not the biggest, I think
Starting point is 00:16:06 this might be the biggest bit of luck that Postgres got in the whole, you know... I think everything else is from deliberate choices and being very, very good, but this, I think, was quite lucky. Of all the companies that could have bought... well, so Oracle acquired Sun, didn't they, which included MySQL. And that could have gone a few different ways, but people had a distrust of Oracle already, I think. And crucially, one of the creators of MySQL, the main person, I believe, behind it, split off to create MariaDB. And my understanding is he couldn't choose as permissive a license as MySQL. He didn't want to, to avoid MySQL taking all of the changes from his fork onto theirs.
Starting point is 00:16:54 So that split in the community, in the development and in the trust of the community, I think really opened an opportunity for Postgres to become the default choice if you were a startup and you needed a free database, or maybe you couldn't afford SQL Server or Oracle, or maybe you really valued open source. I think suddenly people were questioning, should we go with that? And then when people were looking for options, Postgres was a really good alternative. So we had a little internet outage, but we're back.
Starting point is 00:17:27 Where we left off was around... We haven't... We considered growing popularity as some well-known fact, but let's discuss why we think it's obvious that Postgres is winning. I have two sources. One is db-engines.com, and their methodology is quite complex, maybe not clear to me. But still, people consider this a reliable source of truth, and obviously Postgres popularity is growing, good. But there is another source of information, which is very simple, and the methodology is quite straightforward.
Starting point is 00:18:06 It's Hacker News job postings. Every month they have "Who is hiring?". And usually it's more than 1000 replies and it includes everything, but mostly from startups. Sometimes it's a grown startup, sometimes it's a new startup. And there is Hacker News hiring trends, hackernewstrends.com. We will provide a link. From time to time, they analyze all those texts. It's just text, right? And they extract technologies, everything from React to the word "remote", how many postings mention remote.
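The straightforward methodology described above, counting how many postings mention a given term, could look roughly like the sketch below. It is only an illustration under stated assumptions: the sample postings are invented, the function name and the regular expressions are hypothetical, and the real hiring-trends site may process the text differently.

```python
import re


def share_mentioning(postings: list[str], keyword_pattern: str) -> float:
    """Return the fraction of postings whose text matches `keyword_pattern`."""
    if not postings:
        return 0.0
    pattern = re.compile(keyword_pattern, re.IGNORECASE)
    hits = sum(1 for text in postings if pattern.search(text))
    return hits / len(postings)


# Invented sample data, standing in for a month of "Who is hiring?" replies.
SAMPLE_POSTINGS = [
    "Acme | Backend Engineer | Remote | Python, Postgres, AWS",
    "Initech | Support Specialist | Onsite",
    "Globex | Full-stack | React, Node, PostgreSQL | Remote (EU)",
    "Hooli | Marketing Lead | San Francisco",
]

# Match both "Postgres" and "PostgreSQL", case-insensitively.
POSTGRES = r"\bpostgres(?:ql)?\b"

print(share_mentioning(SAMPLE_POSTINGS, POSTGRES))       # 0.5
print(share_mentioning(SAMPLE_POSTINGS, r"\bremote\b"))  # 0.5
```

Counting each posting at most once, rather than counting every occurrence of the word, matches the "share of all job postings" framing used in the discussion.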
Starting point is 00:18:42 So can you imagine, Postgres is in 15% of all job postings, including marketing people, support people, non-technical people. And this is 15. It's very, very high. I think it's top 10. And it's on par with... The word "remote" is probably still number one after COVID. But React is also quite a popular word, right? GitHub stars is also some
Starting point is 00:19:06 reliable source of truth, and Supabase, which I consider the champion in terms of productizing Postgres itself and many extensions of it. And products like Postgres, they productize it very well. And they have a lot of GitHub stars. I think it's how many, dozens of thousands? And the growth is better than React's growth. Yeah. Right. All these numbers are quite reliable. I mean, these job postings and GitHub stars.
Starting point is 00:19:36 Yeah. I think it's very hard to refute that. And I think there's also a growing feeling on places like Hacker News. But I know it's a very biased source, but the sentiment around Postgres feels like it really changed in that timeframe as well. It wasn't just that these numbers went up. It also felt like there was almost a, not quite a backlash against the NoSQL movement, but I think people realized there was like a phase that people went through where the words big data got thrown around a lot and NoSQL gained a lot of momentum. NoSQL is interesting.
Starting point is 00:20:10 Mongo is interesting. Some people think that JSON is the main reason why Postgres is winning the hearts of developers. And I think it's a very important contributor, but maybe not the main one. We chatted a little bit before we started this episode, and it looks like, if you check, JSON appeared in 9.2 and JSONB appeared in 9.4. But if you check when the spike of popularity happened, according to db-engines.com data, it happened when RDS released its Postgres version, right? Yes. Now, I think this is a really interesting chicken-and-egg problem. Did RDS have to add Postgres because it was gaining popularity, or did Postgres' popularity spike because they chose it?
Starting point is 00:21:01 They chose it themselves, for their own use. They chose Postgres, and actually the Aurora version of it, which is quite different. But they chose Postgres; not Aurora MySQL, they chose Aurora Postgres as the database for themselves instead of Oracle, right? Well, I think this is a critical point. And yeah, you brought up Apple in our previous chat as well. So big, huge companies choosing to migrate to Postgres around that time definitely had a big effect. JSONB, I agree. I think the biggest impact JSONB had was that it gave people an answer. If they needed a document store, they didn't have to have MongoDB in addition to Postgres. They had a really good answer for the team that wanted to do some document storage. They could say, well, you can put it in Postgres, and we have indexing support for it. And it became an easy answer to anybody on that front.
Starting point is 00:21:58 But yeah, I agree with you, it's been important, but not the primary factor. Postgres is good at reacting to challenges. You mentioned 1995, when SQL support was added. Without SQL, it wouldn't have survived, I guess, as a popular system. Then there was some hype around object databases, object-relational databases. Postgres also adopted some things, and it's still considered an object-relational database and has some features
Starting point is 00:22:25 and behavior of an object-relational system. But then the semi-structured hype started, and it started with XML, actually. And some Russian developers, Oleg Bartunov and Teodor Sigaev, added support for hstore in 2004, can you imagine? Very long ago. I personally participated in bringing XML support to Postgres, but both hstore and XML right now may be not that relevant, because JSON is here already and it's the de facto standard for unstructured or semi-structured data. So Postgres is good at reacting. Postgres is doing
Starting point is 00:22:59 hard work in reacting to challenges. I completely agree. And I think that leads me to my, I've got two more things I wanted to make sure we covered. One is that I think a lot of things we've discussed so far explain how Postgres was taking over from MySQL for the projects that were considering a free open source database, but it doesn't
Starting point is 00:23:19 yet explain how larger companies are choosing it instead of Oracle, instead of SQL Server. And that, I would say, is another example of it reacting. Larger companies don't care about wide popularity, right? They care in terms of how many engineers they could hire. But also sometimes I will see some companies prefer languages like Erlang. For example, WhatsApp, Erlang, and they still keep, or Scala, some big companies use Scala or Erlang, not very popular, but they choose it and they bring expertise inside.
Starting point is 00:23:55 So they care less about hype, right? Yes, but things that did matter to them, I think PostgreSQL started to ship around that same time. So 2016, 9.6, parallel queries came in. 2017, we had partitioning, logical replication. But by 2016, 17, the game already was done.
Starting point is 00:24:17 The game was done in 2014 when RDS appeared. Versus MySQL, yes, but I think versus Oracle and SQL Server, no. I think actually these larger companies... I have a different opinion. I observed in Russia, Yandex,
Starting point is 00:24:33 when I returned to Russia for like 10 months or so because of visa issues and economy issues after the Crimea invasion. I returned to Russia and I was bored and decided to relaunch a Russian user group. And I asked which big company could host us. And Mail.ru and Yandex both said yes. And
Starting point is 00:24:54 I chose Yandex. But Yandex said, can we also squeeze a couple of like 10 minutes of small lightning talk? I said, of course. What's the topic? And they said, how we migrated Yandex Mail from Oracle to Postgres. And then on the very first event, they gave this talk. It was amazing. And I remember in the first row also a guy from Avito was, and they said, Avito is running Postgres as well. These companies are big and they don't care a lot about hype. And they also, they are not like enterprise. They can count money, startups, right? So they can count money and so on. And in 2014, they already, they also don't care about this replacement for political reasons, replacement of Oracle and SQL Server in Russia.
Starting point is 00:25:39 They didn't care about it at all at that time. It was before, actually. They chose it already. And they chose it due to many time. It was before, actually. They chose it already. And they chose it due to many reasons. And they were big already. And they made decision in 2014, right? And I was completely surprised. Yeah. So it's kept growing since then, though, right? So I think there have been an increasing number of use cases that can switch to Postgres from Oracle or SQL Server, or maybe companies that switch for new projects
Starting point is 00:26:08 because the performance is on par for their workloads, because they now have the features they need. Features, performance. Let's agree with this. When Yandex was ready to discuss how they migrated from Oracle to Postgres, by the way, later they gave a talk at
Starting point is 00:26:23 PGCon in Ottawa, and those were very good talks, like next year, maybe 2015. But by that time, they were already working on it a couple of years. So they started in 2012. This brings us back maybe to this acquisition of MySQL by Oracle, like indirect acquisition, like chain of acquisitions in 2010. And also I consider Apple's decision to migrate internal things from MySQL to Postgres in 2011 is also some turning point, right?
Starting point is 00:26:56 Some signal, a very strong signal, that some enterprise decided. But we also have a case when some big company, a big startup, maybe one of the biggest ones, migrated from MySQL to Postgres and a couple of years later back, right? You know what I'm talking about? The other direction, the Uber case, from MySQL to Postgres and then back. Back was the one that got a lot of attention. Yes, and a lot of good criticism as well, by the way. Yeah. I had one more point that I think kind of takes us into the future as well, and that's that I think Postgres has, the last few years, released major improvements every year.
Starting point is 00:27:34 I'm not just talking about the fact they call them major releases. There have been big improvements every major version every year. I used to product manage products for SQL Server and Oracle, and they didn't come out with major versions every year. I used to product manage products for SQL Server and Oracle, and they didn't come out with major versions every year. They didn't. It was every two or three years at a push, and even then they didn't always include game-changing features. So I think it's very interesting the speed of the –
Starting point is 00:27:59 it's sometimes considered slow, and I think that's unfair, but the speed of improvement with Postgres, I think, is probably the highest of all the relational databases at the moment. And also quite well structured as well. It's impossible to see some feature. In MySQL, they had check constraints, which did nothing. They didn't check anything until a couple of years ago, they brought it in minor release. How come you add feature to minor release? It's impossible in Postgres, right? It's very well structured and principles
Starting point is 00:28:31 are followed. It's good. But there are many downsides as well. Maybe we should talk about it at a different time, about the process of development, and I will explain why I cannot participate in it. Yeah, that will be a fascinating discussion. Is there anything else on the future that you want to talk about?
Starting point is 00:28:49 Like, why? The future, the nearest future, is bright. About the distant future, I have concerns. And one of those concerns is this development process. It's outdated, through email. Only the strongest, the biggest will survive, and I think it's not attractive to many, many people who could help. I strongly disagree, and I'm looking forward to discussing that with you. Well, we don't have time right now to dive into it, but let's dive in
Starting point is 00:29:17 another time. But I have my own doubts. I'm still a big fan. For the last 18 years, I have chosen Postgres. I recommend choosing Postgres for everyone who deals with OLTP. For analytical workloads, there are questions, but for OLTP workloads, Postgres is the number one default choice. We have cases when a company grows to a multi-billion dollar valuation, we have many such cases, with just a single database. Yeah. It's scale up. Yeah, so easy.
Starting point is 00:29:53 And Postgres has so many features that others don't have. So just choose it. And as a bottom line, in my opinion, it's as with any startup which became a great company, when we listen to their founders. There is, by the way, a good podcast called How I Built This by NPR. You know it, right? Like the Starbucks story and many, many others, like the Instagram story. When asked why they succeeded, people usually answer: a combination of hard work and a lot of luck. So I consider this unfortunate MySQL timeline a big piece of luck for Postgres,
Starting point is 00:30:31 but also we have a lot of hard work. We are 100% sure. That's a really nice ending. Well, thank you everybody for joining us. Thank you again, Nikolay. And see you next week. Thank you. Don't forget to follow, comment, like, and provide
Starting point is 00:30:47 feedback on Twitter or anywhere else. We'll listen to you. We already made several episodes based on feedback. So we're ready to continue this practice, right? Yeah. Thank you so much. Thank you. Bye bye.
