Postgres FM - Postgres year in review 2022

Starting point is 00:00:00 Hello and welcome to Postgres FM, a weekly show about all things PostgreSQL. I'm Michael, founder of pgMustard, and this is my co-host Nikolai, founder of Postgres.ai. Hey Nikolai, what are we talking about this week? Hi Michael, this is episode number 26, and that means, since we didn't miss any week and since each year has roughly 52 weeks, that it's exactly half a year of doing podcasts every week. How good is that? Yeah, it's pretty good, right? When we started, of course, my goal was not to miss any weeks, because when you start missing one week, then you start missing two weeks, and so on. But we have good steady progress in this sense, right? So we started in June, I guess, or in July.

Starting point is 00:00:54 I think it was the end of... was it the beginning of July? I'm not 100% sure. Well, congratulations to you. Thank you, and thanks to everybody who has been giving us encouragement and feedback to keep going. Yeah, this is our fuel, definitely. And we receive it every week, which is very encouraging. So this was just a side note, but since it's the end of the year, let's wrap up with a summary of the year in terms of the Postgres ecosystem and the broader community. It will probably be a very opinionated list, not ideal. We don't have the goal to have an ideal list. And it's just something which, for example, I remember

Starting point is 00:01:34 and consider important, interesting, entertaining, and so on. So how will we name it? Postgres 2022: most interesting facts and events. Right. Okay, I have a list of five items. You also have a few items to add. Let's start from my list, because it's my turn to define the topic. And I will start from the observation that the Postgres ecosystem has very strong startups. And despite issues with the global market, I mean fundraising and so on, this year showed very, very good results in terms of money raised by Postgres-related startups, and new startups also appeared. And I would start from Aiven here. They are not strictly a Postgres-only company, of course, but they

Starting point is 00:02:34 started, as far as I know, from delivering services, first of all Postgres and Kafka, and then extended, and so on. So they are quite strong in Postgres. And previously, in late 2021, they achieved a $2 billion valuation, and their total raise this year reached $420 million. It's quite impressive. And in February, two more companies reported that they had new rounds and achieved a $1 billion valuation. These two companies were Hasura and Timescale, if I'm not mistaken, right? Yeah. And we also have a bunch of other companies who also raised a lot. For example, Supabase.

Starting point is 00:03:16 And we have new startups. A very recent one is Hydra, right? Hydra. How to pronounce it? I say the latter. I say Hydra. Yeah. And it's interesting that Neon, of course, also had a release this year. They didn't appear this year, but they had a release this year, and we will talk about Neon a little bit in a different item of mine. And I like how these three

Starting point is 00:03:46 startups, Supabase, Neon, and Hydra, chose their style of value proposition: we are this, but open source, right? So Supabase is an open source Firebase alternative. Neon is an open source Aurora alternative. And Hydra is an open source Snowflake alternative. And all of them are built on Postgres. And this is

Starting point is 00:04:14 super great. And this, I think, like in my opinion, this item in my list, this is the strongest, biggest, and most interesting because these achievements bring a lot of energy, not only money, but a lot of minds and a lot of new small projects, open source projects and so on and so on. This is great. This shows how Postgres community grows and Postgres ecosystem grows.

Starting point is 00:04:41 And these are new heights for the Postgres ecosystem. But, in my personal opinion, this also means that the Postgres community is much bigger than the Postgres project itself, because there are different companies, different organizations, different people. And Postgres is just in the center, right? But there are many things around it, and they are growing in terms of business, in terms of user base, and so on. What do you think? Yeah, it's really cool. It's an exciting time to be in the Postgres space for sure.

Starting point is 00:05:18 Something I don't think I realized coming in that a lot of this money would be around, a lot of them are hosting providers, which is super interesting. That seems to be where a lot of the or at least that's how they're monetizing. It's kind of that's the cloud offering that they're charging money for. Hence, they can be open source in a lot of the cases. But it's really cool for Postgres core as well. I think a lot of them are hiring maintainers or trying to hire people to work on Postgres Core. I think it's really healthy for that group to be gradually populated by people

Starting point is 00:05:54 from different backgrounds. I think in the past, it's been a mixture of people in the community, but a lot of consultancies and a lot of industry, but not so much actual hosting providers contributing back to Postgres. And I think this year with the companies you mentioned, but also even with the likes of Amazon hiring Postgres core team members, I think it's an exciting time and hopefully that group will be able to achieve more

Starting point is 00:06:23 with more resources behind them, with more money involved. Yeah, it's good that you mentioned. I think this is just the model that works best today to provide cloud offering, like managed Postgres, but with some additions or very modified Postgres version. But it's just the model that works best today. And indeed, RDS is, I think, still, of course, obviously, the leader here, but others try to be somehow different, to add something and so on,

Starting point is 00:06:57 and they are interesting in various senses. Not all of them are just hosting. If you talk about Supabase or Timescale or Hasura, they're not just Postgres. They provide Postgres plus something, like a time series extension. It's an extension, but it feels like a different database, maybe. I mean Timescale. Or Hasura and Supabase, they provide an API or GraphQL out, like

Starting point is 00:07:33 immediately you have speeding up development. That's interesting. But indeed, yes, the model works best now. It's cloud offering. So we have a lot, a lot, a lot of options to choose from. If you want just Postgres database plus something, maybe.

Starting point is 00:07:53 It's a headache right now already. So dozens of options to choose from. You need to compare and it will take time. But it's good. I think it also emphasizes my idea that Postgres is not just a single something. It's very big and consisting of very different parts of Postgres, I mean community ecosystem. It consists of many different things. And I see it as a huge bazaar. It's indeed open source.

Starting point is 00:08:25 Like if you recall this Bazaar vs. Cathedral from Richard Stallman, right? So Postgres is indeed a huge bazaar and have a lot of players. It's great. And also I wanted to mention, if we mention RDS, for example,

Starting point is 00:08:41 I also wanted to mention that this year I changed my mind about Google Cloud SQL, because previously they had only about eight knobs, and my feeling about their Postgres offering was that it was quite weak. I could not run a serious project on it. This year I completely changed my mind. Maybe because Hannu Krosing visited our Postgres TV episode. It was a great episode about vacuum. I can't recommend it enough. But also, obviously, it's improving in terms of product, which is also interesting.

Starting point is 00:09:18 Okay, this was item number one in my list. The second item is actually that our podcast started this year, right? So this is super important news for the Postgres community and ecosystem, I guess. And Postgres TV also started to become more active. So, first of all, for those who maybe don't know, we also publish our episodes on YouTube. Sometimes it's more convenient, sometimes less, it depends. But besides our wonderful podcast episodes, Postgres TV is a YouTube channel. It has existed for a few years, but this year it started to be much more active. We invite various guests. Sometimes

Starting point is 00:10:02 it's interview, sometimes it's to redo the talk, to record it and distribute it. It's called Postgres Open Talks to provide good talks to wider audience. Also, we have a good collection of playlists, like for example Postgres Backups or Postgres Replication

Starting point is 00:10:20 and so on. And those are materials from other channels. And I think it's already approaching 500 videos about Postgres. So go check out postgres.tv. It will redirect you to YouTube, and you will see a lot of interesting stuff. There is always something to learn about. Actually, when I invite guests, I learn a lot. So it's also a selfish goal. It's not only for community reasons, to share, but also like, okay, I have one hour

Starting point is 00:10:51 and I can dive into some topic as deep as possible, because the guest who is working in this area may be one of the best experts in this area in the world, right? So it's really great. So TV, FM, and maybe others also publish materials online these days. Maybe COVID helped people realize that online distribution and online events also matter, because not everyone can come to a conference offline. And also you can stop listening, pause, and then return the next day if you need to

Starting point is 00:11:27 interrupt, right? So it's also much more convenient to consume information in such a way. Of course, offline events give benefits from live communication, but online events are also good, and recordings are also good. So this is item number two. What do you think about it? Yeah, I like it a lot. I think it's not just us. There have been a lot of people providing a lot of good Postgres content this year. And I think the Postgres TV thing is worth a special shout out, in that you do quite a few of the sessions live. They're quite advanced topics, but if people have questions, if people want to be able to ask

Starting point is 00:12:09 questions of those experts as well, they can join live and ask them in the chat, and you'll pass those on. So that's quite a unique opportunity, I think, that you often don't get unless you can go in person to a conference. So that's really great. A couple of others I wanted to give a shout out to were Tobias Petry, who's doing more beginner-friendly tips on Twitter and on his website, SQL for Devs. That's really gained a lot of momentum this year. He's doing great. And Hussein Nasser, who has a really popular YouTube channel for back-end developers. I'll link that up as well. I think he does fantastic work and explains concepts really simply. And then I also wanted to give a shout out to the Postgres Weekly newsletter. I think still, every week... I know it's been going quite a few years now,

Starting point is 00:12:55 but every week they provide a really good newsletter on all kinds of topics around Postgres. Yep, agree. Okay, item number three: sharding. So at the beginning of the year, I had a strong impression that we definitely need sharding. MySQL has Vitess. A lot of big companies who use MySQL use Vitess. Postgres lacks it. And it was like, we have Citus. I mean, the Postgres ecosystem has Citus,

Starting point is 00:13:25 but it was only partially available, because the open source community edition didn't provide, for example, online resharding. That was before. I mean, I'm discussing what it was like at the beginning of the year. But this year changed it. First of all, Citus published everything as open source. I mean, Microsoft published it, because it's part of Microsoft now.

Starting point is 00:13:47 And this is good news. So it's fully available as open source. Great. And a couple of young projects in this area started. First is SPQR, from guys who I know, Russian-speaking guys with very good experience. They are trying to build a very simple yet powerful sharding system, and it exists on GitHub. And then also some guy from San Francisco, I think, I don't remember the name, sorry, has a system called PgCat. I'm subscribed to both projects,

Starting point is 00:14:23 and they are very active in terms of development, so it's interesting to check them. SPQR is in Go and PgCat is in Rust, and they are adding something almost every day. This is super great to see. And PgCat, actually, as I understand, was originally created as some proxy, a replacement for PgBouncer probably, a connection pooler. But sharding was there originally only as

Starting point is 00:14:54 some very simple explicit sharding. You say, I need to route to that node, and that's it. But as I understand, later it became more complex and comprehensive and so on. So now it's already something interesting to look at if you need sharding. But of course, only a few projects need sharding, because many companies showed you can grow to a billion-dollar valuation, having SaaS or e-commerce, and still have one or a few monolith

Starting point is 00:15:28 big Postgres databases, or split into services, big services. But at some point any company, if it needs to grow a lot, will need sharding. Some service will need to be sharded. So sharding is needed. It's needed only by a few users, but it's still needed. And we have some progress this year. It's not perfect, of course. There's no obvious default solution answering all questions, no.
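
The "explicit sharding" described above, where the client says which node to route to, and the key-based routing that a proxy typically adds later, can be sketched in a few lines of Python. This is only an illustration under stated assumptions and not how PgCat, SPQR, Citus, or Vitess are implemented: the `ShardRouter` class, its method names, and the connection strings are all hypothetical.

```python
import hashlib


class ShardRouter:
    """Hypothetical application-level router; not the API of any real pooler."""

    def __init__(self, shard_dsns):
        if not shard_dsns:
            raise ValueError("at least one shard is required")
        # Hypothetical connection strings, one per shard.
        self.shard_dsns = list(shard_dsns)

    def explicit(self, shard_index):
        """Explicit sharding: the caller names the target node itself."""
        return self.shard_dsns[shard_index]

    def shard_for(self, key):
        """Key-based sharding: a stable hash of the sharding key picks the node."""
        digest = hashlib.sha256(str(key).encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.shard_dsns)

    def dsn_for(self, key):
        """Return the connection string of the shard that owns this key."""
        return self.shard_dsns[self.shard_for(key)]


# Three made-up shards; the host names are placeholders.
router = ShardRouter([
    "postgresql://shard0.example/app",
    "postgresql://shard1.example/app",
    "postgresql://shard2.example/app",
])
```

With this sketch, `router.explicit(1)` always returns the second shard, while `router.dsn_for(customer_id)` returns the same shard for the same key on every call. A plain modulo like this moves most keys when a shard is added, which is one reason online resharding is the hard part.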

Starting point is 00:15:54 But there is a very good, promising movement this year: Citus plus these two projects. Agreed? Yeah, I think PgCat sounds particularly exciting, and it's really cool that the idea is putting it in a pooler with, I think, load balancing and failover support as well. That seems really smart to me, that you could put it in place and benefit from those features before you need sharding, and then make use of sharding later, if and when you need it. That sounds very sensible to me. Right. And this middleware should be very light, in my opinion, if you aim to

Starting point is 00:16:32 work in an OLTP context. Because if you use, for example, Postgres there, it's also possible, but it's quite heavy and it adds latency overhead. So, of course, it's worth testing in each particular case, but having some... And MySQL Vitess also has a proxy layer. It's called VTGate, and it's good if it's very light. The problem is how to support all Postgres syntax in this case, because when you start from zero, Postgres syntax is quite rich.

Starting point is 00:17:08 Yeah. A couple more things I want to mention on this front, actually. There was a really good Notion blog post. I'm not quite sure if it was the beginning of this year or a little bit earlier, but they implemented, I believe people refer to it as application side sharding. Not a great acronym.

Starting point is 00:17:25 Yeah, yeah, yeah. Well, I promote this approach, this acronym. Application-level sharding is less offensive, right? But yeah, the Vitess stuff is super interesting. And I think this, for me, is actually a real threat to Postgres, not necessarily on a technical level, but on a marketing level. I'm getting kind of flashbacks to the MongoDB days, where people would be offered kind of web scale. You know, you're only a startup, but you'd never have to worry about scaling in the future. And suddenly we've actually got…

Starting point is 00:18:04 And it doesn't matter. It doesn't matter that one node behaves 10 times worse than one node in Postgres. It doesn't matter, right? Yeah. But people don't take that into account when they're starting up. And actually, they don't notice

Starting point is 00:18:17 because the performance when they have very few users is fine. And it is fine. The solution is fine. But I'm seeing startups that last year I would have expected to be on Postgres picking Vitess and MySQL, because PlanetScale have really nailed their marketing, and I think they have also got some really good developer-friendly features. But mostly I think it's that same marketing angle that Mongo had. And Postgres, I think, had a really good answer to Mongo with JSONB support a little while later, and that, I think, was an excellent move. I'm really looking forward to seeing

Starting point is 00:18:58 what we can do in the Postgres ecosystem this time, because I think Postgres is built on better fundamentals than MySQL and has a lot of features that I love. So I'm really interested to see how we address that. Yeah, and when you talk about new startups which choose Vitess and MySQL, is it very, very new startups, started this year or so? Yes. Well, interesting. Okay. I don't see such startups. I see constant discussions like, oh, it would be good to have Vitess. And Vitess, I think late last year, like a year ago, announced that they don't have plans to work on Postgres support themselves, but are open to community contributions.

Starting point is 00:19:48 But in my head, I don't see it. Maybe I have some filters. I don't see companies who these days would choose MySQL without strong pressure from I don't know who. It's strange. But I can imagine it. If, for example, some CTO wants to be protected in terms of growth risks and how to handle growth, and for this CTO it doesn't matter that you can handle tens of thousands of transactions per second

Starting point is 00:20:16 using a single simple cluster on modern hardware, actually already hundreds of thousands of transactions. And when I say transactions, I don't mean read-only transactions. Of course, I mean social-media-like traffic, where 90% are selects, right, or 80%, but the others are writes. And you can handle it perfectly. You can scale up to 100,000 TPS or a few hundred thousand TPS, especially if you take, for example, modern EPYC processors from AMD, EPYC Milan, EPYC Rome as well. Rome is the previous generation to Milan. A lot of CPU power, a lot of memory, and you run like 10 nodes in a cluster, a lot of replicas. It can be very, very, very powerful. So you can postpone the moment when you need to be sharded. And also

Starting point is 00:21:20 microservices. People sometimes choose microservices and postpone the need to shard their databases, right? Because databases become smaller. So, yeah, I can imagine the reasons behind the choice of Vitess. But is it Vitess or PlanetScale, by the way, speaking of managed services? The startups I'm seeing are choosing PlanetScale. I don't even think they would know. They wouldn't necessarily know what Vitess was, I think. It's different because, as I see,

Starting point is 00:21:53 PlanetScale now has two value propositions combined. One is web scale, planet-scale sharding, and another is: we will handle your changes without problems. They call it database branching, but it is not. But they provide you the ability to change schema and then have something similar to the pull request or merge request concept, a deploy request. So your colleagues review the change, and then they run the change online without downtime, and you don't need to spend effort. This may also be a big motivator, because in Postgres it's a big headache to avoid locking issues and so on. This is what I meant by their developer-friendly features, and I think

Starting point is 00:22:38 that's a similar value proposition to what Mongo were offering back in the day as well. Yeah, interesting. I think many Postgres-related companies are also already working on this, because, of course, yeah. So in this case, let's proceed to the next item, number four, and this is: database branching is coming, right? Originally, I think last year, it was PlanetScale who started to use this term. And in my opinion, in the wrong direction, because they talked about branching schema only. So their branching, database branching, called data branching, which is something strange to me, but it's another story. But also several other companies in the Postgres ecosystem, because we are mostly interested in the Postgres ecosystem, started to talk about database branching. And for me, of course, this is very close to home, right?

Starting point is 00:23:39 Because my company develops the Database Lab Engine, which provides thin cloning. And actually, yesterday we released the first alpha which supports database branching. And branching is not cloning. There are differences. That's another story, but I'm glad that many companies already think in this direction. The original need is to be able to work with databases in non-production, with bigger-size databases. Same as with Git. So you have an independent database.

Starting point is 00:24:14 You can have multiple independent databases and you can experiment, develop, test, and so on. And cloning, which we had for a couple of years already, supported it. But branching allows you also to have some progress and commit it, like to have snapshot and then share with your colleagues. They can branch from there. So it's like nested cloning already. Nested cloning with commits, it's branching. This is what happens. If you look at Git, it's very close to it.
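
The idea of "nested cloning with commits" can be illustrated with a toy copy-on-write model in Python. This is a sketch under loud assumptions, not how Database Lab Engine, Neon, or ZFS work internally: the `Snapshot` and `Branch` classes are hypothetical, a "database" here is just a dictionary of keys, and deletions are not modeled.

```python
class Snapshot:
    """Immutable committed state: only the changes relative to its parent are stored."""

    def __init__(self, parent=None, changes=None):
        self.parent = parent
        self.changes = dict(changes or {})

    def read(self, key):
        # Walk up the snapshot chain until some ancestor has the key.
        node = self
        while node is not None:
            if key in node.changes:
                return node.changes[key]
            node = node.parent
        return None


class Branch:
    """Writable thin clone on top of a snapshot (hypothetical, in-memory only)."""

    def __init__(self, base):
        self.base = base
        self.pending = {}

    def write(self, key, value):
        # Copy-on-write: the change stays private to this branch.
        self.pending[key] = value

    def read(self, key):
        if key in self.pending:
            return self.pending[key]
        return self.base.read(key)

    def commit(self):
        # Freeze pending changes into a new snapshot that others can branch from.
        self.base = Snapshot(parent=self.base, changes=self.pending)
        self.pending = {}
        return self.base


main = Branch(Snapshot(changes={"users": 100}))
dev = Branch(main.base)            # thin clone: nothing is copied
dev.write("users", 101)            # experiment in isolation
shared = dev.commit()              # snapshot of the progress
colleague = Branch(shared)         # nested clone from that commit
```

Here `main.read("users")` still returns 100 after the commit, while `colleague.read("users")` returns 101: the clone alone gives isolation, and the commit is what makes the progress shareable, which is the difference between cloning and branching described above.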

Starting point is 00:24:52 And I would like to mention Neon, which released branching a few weeks ago. And it's also already publicly available. It works very well. I think they don't have commits yet, but I'm sure they are already thinking or working on it. It will be natural. But overall, this year, I think something like big movement in this direction started. So I predict in future few years, we will see a lot of progress in this area. So a lot of roadblocks in development and testing when we don't have big database to play with when we need to develop or test it will be like available to most teams i think in future thanks to several companies who work on this area

Starting point is 00:25:34 including mine, of course. So I've seen Supabase also mention branching, I think, maybe in one of their funding announcements. It exists in their roadmap, yes. And as for other Postgres-related things in this area, I would have expected, if Heroku had kept investing in that or if they'd kept innovating along the lines that they were going, I think they would have done this. They have a very good concept, so-called preview apps. It's basically environments which are deployed by request, an explicit request or inside a CI/CD pipeline.

Starting point is 00:26:19 So imagine for each branch, when you have Git push, CICD pipeline is running, and Heroku can deploy preview app on specific URL, and it has adjusted code from that branch. But there is always a problem what to do about databases, and naive approach is let's have one database for all. But there will be conflicts, of course. And if you want to delete something

Starting point is 00:26:47 or to change schema, especially, there are a lot of conflicts. And database branching and thin cloning are exactly what can bring this idea to a complete state. So you can have preview apps or environments as a service, not only in terms of code, but also in terms of data. And that's exciting. It's super cool. Yeah. And then it seems like the natural successor to Heroku Postgres is going to be Crunchy Bridge. They've got a lot of the same team involved. So I'd be interested to see if they're planning to do anything on this next year.

Starting point is 00:27:27 Yeah, it's interesting. I didn't see anything about it in this direction yet, but maybe there's something. I would like to know, to learn about it. But anyway, I see obviously several companies looking in this direction. I hope finally this problem will be solved in the next five years because it's something that should be solved. And it will unlock problems with, for example, some product manager wants to try it with data, right?

Starting point is 00:27:58 Or you develop multiple things simultaneously, or you have multiple developers. Even if you want to optimize SQL to check performance it's also a way to go because without data performance is different so you cannot check your performance

Starting point is 00:28:16 if you don't have enough data and this unlocks many many things like that and eventually it shifts a lot of stuff to left in terms of DevOps infinite loop. So it's definitely shift left testing, shift left activities.

Starting point is 00:28:33 I'm excited about this. Yeah, if anybody's new to our podcast, we do have an episode specifically on database branching. So we can link to that as well. We have two episodes, I think, right? So from time to time we touch this topic. This is my favorite topic. So I think we will continue doing this, I hope.

Starting point is 00:28:51 Because there are interesting things about performance testing, for example. Okay. And the last item from my top five list is actually Postgres. We mentioned that in the beginning: Postgres is everywhere. All cloud vendors provide Postgres.

Starting point is 00:29:10 And even Oracle started to provide a managed Postgres service. I knew about SAP, for example, it was a few years ago. So everyone, literally everyone. And for example, also interesting news: Google released a new cloud database called AlloyDB, which has something from Postgres. Of course, it can be considered as being from the Postgres family of database systems. And it's also super

Starting point is 00:29:38 interesting. It has very interesting concepts to handle HTAP, hybrid transactional and analytical processing. So it's analytical and operational processing combined. They have a combination of row store, column store, memory. So it's interesting. I haven't tried it yet. It's still in my to-do list, but it sounds super interesting. And overall, as I've said already, we have dozens of options to choose from. Okay, you need Postgres, but which one?

Starting point is 00:30:08 So many, right? So choosing the right option is not simple. It's not a trivial task. It's not like before you download binaries or you download source code, compile, make, make, install, and you're done, right? No. These days we have so many options. It's not only like managed versus self-managed.

Starting point is 00:30:29 If you say managed, there are so many options. And self-managed as well. There are many Kubernetes operators already, like maybe five quite active projects, all of which are also interesting. More than five, actually. More than five. There are others.

Starting point is 00:30:46 So if you want to self-manage, the question will be: old school self-managed or Kubernetes, right? Because there are many operators, and they bring things like backups, replication, and monitoring out of the box. They will eventually compete with managed services. And managed services should be afraid of these operators, of course. So this is it with my list. Do you have something to add? No, I think that was great. A couple of smaller things that probably don't warrant a big discussion,

Starting point is 00:31:23 but I thought it was worth an extra shout-out to the team behind the Postgres 15 features, all of the people that contributed to that. I know it was more a case of lots and lots of smaller features this year, but I really liked some of them. Who mentions this? We all know every year we have a new Postgres version. It's not news already, right? I'm joking, I'm joking.

Starting point is 00:31:45 There's MERGE there, MERGE and other things, right? Yeah, but I don't want to take it for granted either, right? Like, it is every year that we do have a major version, but that's not necessarily guaranteed to happen forever. It's really cool that we do. Also, there's probably not so much of this in the past year, but next year I think we'll see some really interesting serverless, or I don't really like that word,

Starting point is 00:32:08 but that seems to be the standard word, use cases. If people can use Neon to send Postgres queries in an almost serverless nature, I don't understand how it works, but that could be really cool, I think. And then the other interesting project I'm keeping an eye on is OrioleDB, which looks like it... OrioleDB also has branching in the roadmap,

Starting point is 00:32:31 and in GitHub there is a Markdown file describing the concepts, and they already have copy-on-write checkpoints. So if they implement database branching, it will be Postgres-native database branching. So you don't need to think about separation of storage from compute. You don't need ZFS like we do. You have it inside Postgres.

Starting point is 00:32:57 I'm super excited about this as well. Yeah, very exciting. But yeah, I think you've done a great job of wrapping it up. I think it's been a really good year for Postgres, and I'm looking forward to another one next year. I agree, I agree. Thank you. Well, everyone have a good New Year, much better than the last one, and see you soon. Thank you so much. Absolutely. Happy New Year. Bye now.
