Postgres FM - Rails + Postgres

Starting point is 00:00:00 Hello and welcome to PostgresFM, a week's show about all things PostgresQL. I am Michael, founder of PgMusted. As usual, I'm joined by my co-host Nikolai, founder of PostgresAI. Hello, Nikolai. Hi, Mike. And today we are also joined by a special guest, Andy Atkinson, Andrew Atkinson, as we were saying earlier, who has been a software engineer for 15 years at the likes of Groupon, Microsoft, and Fountain, a user of PostgreSQL for the past 10, speaker at several conferences, and now a published author with the brand new book, High Performance PostgreSQL for Rails. It's a pleasure to have you on, Andy. Welcome. It's a pleasure to be here. Thanks for having me, guys. Really fun.

Starting point is 00:00:40 Nice. Well, I'm going to let you in on a little secret here that I didn't tell you beforehand. You are actually our first guest who has been requested by a listener. So somebody specifically asked for you to come on, which is awesome. Wow. And with the topic Postgres plus Rails. So that's what we're going to be talking about. I guess asking my mom to do that really worked out. No, that's great to hear. Yeah.

Starting point is 00:01:08 Awesome. Well, yeah. Who better at the moment? moment you know you've got this stuff fresh in your mind so yeah in terms of where to start i thought it'd be really interesting to hear from from your side how popular choice is postgres for rails like what's it competing with and yeah how's that been changing over time? Yeah, good questions. Well, the ORM, the object relational mapper in Ruby on Rails is called ActiveRecord, and it supports Postgres and MySQL slash MariaDB and SQLite. So whenever you're generating a new Rails app, that's one of the first decisions you'd make, I believe, by default at SQLite. But most folks that have apps in production, I still think it's usually they're using Ruby on Rails. They're usually working with MySQL or Postgres. And kind of prepping for this,

Starting point is 00:01:51 there are a couple of surveys where folks have been responding to surveys about how they deploy their apps. And there was a 2022 survey from Planet Argonne, which I had pulled up here. And there was about 2,600 responses. And from 2014 onward, Postgres has been the most popular relational database with Rails apps. And I guess what I've seen is kind of this shift, you know, starting in maybe the early 2010s, where I think in large part, thanks to Heroku having Postgres support and Heroku being a really popular choice. EasyPlace, as you guys have talked about it before, made deploying your Rails app onto the Internet much easier. Pretty much a git push command and took away a lot of operational toil that a lot of folks might otherwise take on themselves.

Starting point is 00:02:40 And I think a lot of folks may have switched to Postgres just because it was part of the package there. And maybe they weren't, you know, deeply using relational database features. And so I think there was a lot that had to do with Heroku. And then, of course, also as you guys have talked about, Postgres is gaining a lot of features, a lot of the big Ruby on Rails, the companies that famously use Ruby on Rails, like GitHub, Shopify, Basecamp, that kind of started in the 2000s. They tended to choose MySQL at the time, and they've stuck with it for the most part. They maybe are using clustering solutions and things, but companies that started more in the 2010s or mid 2010s, I feel like a lot of times they're, they're running Postgres. And so I'm certainly, you know, now I'm doing independent consulting with Rails teams that use Rails and Postgres. And I'm certainly, you know, I'm clients are

Starting point is 00:03:41 coming to me that are using Postgres. So for whatever that's worth, there's definitely companies out there using it. Yeah. I was going gonna ask nicolai as well and then i realized that all of us have a super biased sample but that does that definitely does make sense that data you mentioned that survey is really cool as well i saw the the question below was about which one would you like to be using in production? And it changes just two years earlier. So you can see that there's that desire and people wanted to switch or wanted to use it in production. And then it did eventually take over a couple of years later and hasn't lost first spot since, which is nice. You mentioned the RRM already. Should we dive into that? How are folks generally

Starting point is 00:04:24 sending queries across to Postgres from Rails? And where's Postgres? If Ruby on Rails, then Postgres is under Rails or where? Yeah, so with ActiveRecord Oh yeah, where is it? Let's see.

Starting point is 00:04:39 Where's the place for it? On again, isn't it? Ruby on Rails on Postgres. Yep. The basis for Rails. Okay. Modweb. Yeah. As a Postgres enthusiast,

Starting point is 00:04:54 I tend to think a lot about the database operations and the relational data model and almost like Rails is kind of wrapping. Developers are writing Ruby code that's doing database interaction and other, other things. But I almost tend to think about things now more from a database first perspective. So it's almost like Postgres with Ruby on rails or something. But, uh, I think, um, you know, depending on how you tend to most, most developers, I think though, it's, it's the other way way around they're mostly most rails teams i've worked on they're writing active record code in the early days

Starting point is 00:05:28 i've been around in ruby on rails for a long time almost 15 years with some divergences but in the early days there actually was a little more competition around the orm some of these projects are still around but sql sequel and data mapper some other ORMs that were written in Ruby and allowed developers to create queries and evolve their schema with an alternative to ActiveRecord, different pros and cons, performance benefits, maybe that kind of thing. But ActiveRecord sort of steamrolled everything or consolidated everything over time, as can happen with open source projects and so most developers then in my experience you know they're thinking you know they're working more like with objects and interactions and algorithms and they're building you know background jobs and

Starting point is 00:06:21 working with message queues and things like that. So they're not necessarily writing a lot of SQL. And as I was saying, most Rails teams, they're going to write ActiveRecord. So ActiveRecord then generates SQL and it can do things for us as queries are generated, like annotate them and say where they're coming from in the app, which is nice. If we look at queries within Postgres and we want to kind of go backwards and say, well, where did this come from in the app? And then another big way ActiveRecord is used, of course, is it is the de facto schema evolution tool as well, which is interesting because I think Rails developers just kind of take that for granted. Like, of course I control the schema

Starting point is 00:06:59 and of course I could ruin everything with a bad migration or whatever, you know, like an incremental schema change. But that's not always the case when I've worked on other teams. Like I've worked at, when I was at Groupon, actually the main applications I was working on were Java and there were DBAs and we used Postgres. But sometimes due to the scale there, I mean, usually developers, unless you're working on a small microservice, maybe that you might have ownership of, if you're working on any

Starting point is 00:07:31 of the core services there because of the scale of the operation or how the company was set up, usually there were DBAs that would do riskier database changes and that sort of thing. So you might actually just kind of provide them an example of the change that you want, and then it might just be done and they let you know, maybe by a ticketing system or something. So I think, you know, it's interesting, like if you come to, you know, your background might be where as a Rails developer, you may at a large company, you may not do as much of the direct schema control, but certainly for a lot of small to medium companies, that is what developers do as well. So they do need to be informed about good schema design, of course, creating indexes, constraints, all those sorts of things.

Starting point is 00:08:15 And that's where it kind of, that's where then the ORM can start to be limited in its scope, you know, and you got to kind of go beyond and, like, okay, well, what are the capabilities I have at my disposal within Postgres? And at some point, you need to switch from schema RB to structure SQL, right? Yeah, Michael asked about that. I saw several companies switching. Like, why do we have schema RB by default at all? Because everyone is switching at some point while growing, right? Yeah, well, and I kind of agree with you, but I didn't always feel that way.

Starting point is 00:08:50 It's actually an interesting little microcosm of the whole spectrum, I think, of being more of an application developer, working within your programming language, which is Ruby in this case, and kind of thinking of the relational database as just a thing that you don't really directly work with, but it's just there to like store your data and access your data. And then on the other extreme, kind of like I was saying, like you're thinking of completely in the database operations, you're thinking about the schema, the queries, the indexes, you're running query plans in your head, you're thinking about, you know, how do we have this high growth table. Should we use partitioning? Should we split it out? Like that's the opposite end of the spectrum. And

Starting point is 00:09:29 yeah, I think so the, to briefly for anyone that's not familiar with the schema RB and the structure file by default in Ruby on rails, as you make changes, let's say you add a table or you add an index, et cetera, you generate what Rails calls a migration. And that would be that incremental change. So here's the new table definition. Here's the index definition. And it's expressed in Ruby, but of course it generates SQL statements to run and they run against your local database. And then what happens is your local database is then dumped. It's a schema definition or its database definition is then dumped into a file. And by default, that's a Ruby file. So it kind of like translates it back to Ruby and

Starting point is 00:10:11 represents it as Ruby. And it kind of insulates you from the SQL, but really in Postgres, what's happening is it's just running, well, I should say, if you move to the SQL form, what it's doing is it's running PG dump. And it's basically just taking the raw PG dump output and putting that into a SQL form of the file with a little bit of extra stuff at the end, which are those migrations that you're creating. Each of them have a version. So it dumps those versions as insert statements into the end. But otherwise, it's basically just the PG dump output. So, yeah, what happens a lot of times for teams is they start out with the Ruby schema file. They start to use things that are beyond what it covers. And so they might use like a materialized view or like a triggers.

Starting point is 00:11:00 Triggers. Maybe, right? Yeah, so stuff like that. And then what can happen, too, is the open source community can, can spring in and can say, Oh, we can fix this. We can, we can actually extend the Ruby form of the file so you can keep using it, but you got to also run this Ruby gem with your app. And now we can express triggers in the Ruby file, but I kind of tend to just encourage like, okay, at that point, just switch away or, you. Or maybe even just start that way.

Starting point is 00:11:26 But just switch to the SQL file because it's going to give you the highest fidelity information. It's essentially what pgdump is. And if some database guy is performing code review, it's much easier to see the changes in structure SQL than in this language you don't fully understand. And then you, like, basically, if you do it in the project with SchemeRB, you end up asking, provide me full log or full dump or everything. Right.

Starting point is 00:11:57 Yeah, not to get too philosophical, but as we're on a team building an application, the code is kind of our method of communicating with the computer and with the team building an application, the code is kind of our method of communicating with the computer and with the team members as well, you know, as we express a domain concept, or in this case, as we're expressing the database design. So if your team has people that are more, you know, database only people, then giving them a Ruby file is, I mean, they would have to learn it. And it doesn't actually

Starting point is 00:12:25 have all of the information in it often. So yeah, it's a little, it's a little weird. Um, however, as I was writing the book, I was thinking about a lot of these kinds of things too, cause I have been on a lot of different teams and I was kind of trying to not be too preachy about one thing or the other, just kind of saying like, well, if your team is mostly Rails developers, you know, just know that there are limitations to the Ruby schema dump, that there may be some information that you're not getting in here. However, you know, and then I sort of made the pitch for the structure file maybe. And, but yeah, that's a good point. Like if you're working with a DBA, you know, you can hand them a dump file and they'd have a clear understanding of what's in

Starting point is 00:13:01 the database. Yeah. I'm curious if things would be changed with LLM and so on, which can easily translate or something. But it's a different story. Yeah, that's an interesting idea. Do you have understanding why do we need to keep two tracks of changes? So first is each migration is kind of deep. And if we keep all of them, we always can build from zero, from ground. We can build our final schema, right?

Starting point is 00:13:28 Just in steps. But also we keep this structure SQL or schema RB doesn't matter, which is a snapshot of the latest version. Why do we need both? I was very curious. I was trying to understand this all the time. Why do we need both? Yeah, well, some people even say, so as your

Starting point is 00:13:47 incremental changes contribute to the single file representation, schema RB, like if we're talking about the Ruby one, some people say that the schema RB, it is basically the same as the incremental changes. It's just everything at once. It's kind of this intermediary Ruby representation of your database's structure. And you don't actually need both, really. The incremental changes do serve as the log, which you'd otherwise have to, you could get through Git. Like if you looked at, you know, if you pulled changes as a developer on a team and you noticed that the schema RB changed, but there wasn't the incremental file that represented the change you could look into the git history and try to discern what happened but i guess the file kind of makes it easy and also when ruby on rails was started

Starting point is 00:14:35 we didn't really have git so we were stuck with like subversion or something else so you know i i think uh git made doing that kind of version-to-version investigation easier, in my experience anyways, like being, I think, a better version control tool. But yeah, I mean, technically you don't need it. And so what some teams do after a while is they just throw away the incremental ones. And so Ruby on Rails also allows you to just load the schema. If you're setting up a brand new machine, you're a new developer on the team, you can just load directly from that single file, the schema RB. So that would be like a structure load command.

Starting point is 00:15:10 And that would work the same regardless of which type. So Michael, you were asking about like a beef. I think there's, it's like not a real beef. It's like a, it's like a faux beef that programmers invented, I think, but it's kind of like, do you want to preserve the beauty or elegance of the Ruby code? Or do you want to just like, you know, throw it away for the ugliness of a SQL file? Maybe what some Ruby programmers might say, or what I might say is like, I actually think the SQL file is pretty, it's elegant. You know, it's got, like I said, it's like high fidelity. It's all of the information, you know, and you can even customize it. Active record allows you to pass flags through to PG dump. So if you need even more information or you want to change the output,

Starting point is 00:15:51 actually, I did think of, there is one reason maybe the beef emerged is if you have two developers running slightly different versions of Postgres. So maybe the same minor major version, but slightly different minor versions. Postgres changes the PG dump format over time, different, like it could be the ordering or I think it's usually the ordering. And what can happen is you get these annoying diffs as a developer or two

Starting point is 00:16:16 developers are essentially it's noise. It's not a real meaningful change, but it's like, Oh, the triggers we added now they're on top of the constraints. And before they were after the constraints or something like that, I'm just making that up. But I have a solution for that though, too. Lucas fiddle created, this is several years old, but it's a Ruby gem that basically does some post-processing on that

Starting point is 00:16:39 dump process and does like an explicit ordering of all the content. So that if you have a team that uses that tool, then you should have consistent ordering amongst each other. And this was for the.SQL? Yep, that's for the SQL version. Makes sense. Yep. Cool.

Starting point is 00:16:55 I could see Nikolai exploding as you were talking about how beautiful the Ruby was. So for our podcast listeners, I felt like that couldn't go unheard. There are two of the ugliest languages in the world, JavaScript and SQL. They also happen to be the most popular ones. That is interesting, isn't it? Yeah. Or even C, right?

Starting point is 00:17:18 Like C is still, you know. Another thing is that Ruby was created, 1995, same as Java, JavaScript, and Postgres 95, and what else, right? So many things, same year. Just random fact. Did you know? Yeah. So you mentioned way back about some limitations of the ORM. I think it's worth talking about. ORMs have a bad reputation if you talk to database folk.

Starting point is 00:17:50 Is it worth talking about some of the more common issues there or ways around those issues or limitations when you need to break out, that kind of thing? Sure, Yeah. Well, that was actually a pretty big premise I wanted to cover in the book is to show people, show readers like there are these other things in Postgres that you may not be aware of if you've limited your kind of research area to just what's supported in ActiveRecord. I mean, one that comes to mind right away is table partitioning. I mean, there's not really any support in active record for table partitioning. It doesn't mean you can't do it with the Rails app, but you might run into a couple of small issues, especially prior to recently composite primary keys became supported in active record. But I was performing a table partitioning project on a Rails, older Rails project about a year ago. And there were some issues with assumptions code would make about primary keys, for example, just like there's only one column that is the primary key definition.

Starting point is 00:18:52 But yeah, so it's worth noting that if you have a high growth table that you want to look at table partitioning for, you would be likely doing that a bit on your own with writing SQL commands or kind of maybe researching like a Ruby gem that you would add into your Rails project that has done some of that work for you around creating the table structure or making sure your queries have the partition key column or things like that. Yeah. I mean, definitely active record. I think ORMs generally try to bring some of what the database does into the application. I think that's fair to say. For example, triggers in Postgres that have different trigger types that fire at different times and that have different scopes. lifecycle callbacks that are similar in their purpose where you might want to persist an object, which would be taking an in-memory Ruby object, turning it into an insert or update statement,

Starting point is 00:19:53 basically. But you could intercept that before that event happens and do something within the application code with an active record callback. So you could have a before save or a before commit. And some developers might not, then they might try to design things within that scope where maybe a trigger could be a solution where like if two applications were sharing the same database, not a great idea sometimes, but maybe you'd want to put the trigger in instead of having it at the application levels, you have to duplicate code or that kind of thing. What about transaction control in this case? Is it inside one transaction and Rails controls this or you can do trigger outside of transactions so it's not guaranteed that it will be consistent? Yeah, ActiveRecord supports

Starting point is 00:20:42 a transaction concept that maps pretty much straight up to Postgres begin, commit, or rollback transaction. And I actually recently learned you can do transaction control if you want to change the transaction. The isolation level. Yeah, the isolation level. Yes, thanks. If you want to change the isolation level, I actually just recently realized ActiveRecord supports that. You can pass it in as an option, and then I kind of verified it myself and made sure that the SQL statements were generating those things.

Starting point is 00:21:19 Yeah, I guess if you do want to have some kind of nested transactions or if you want a little more control, then I think part of what, you know, you certainly might want to take that on yourself. And, but I wanted to actually then tie that into, I think part of why active record has been successful is it doesn't, at a certain point you might just say, well, I just want to write SQL,

Starting point is 00:21:42 but I still want maybe some active record objects to work with. And active record doesn't prevent you from just writing SQL within your active record code. And so that can be beneficial if you want to just say, well, I'm just want to write my own SQL statement here for a query, or I want to even just use it as kind of an interface to run some commands, like maybe, you know, opening a transaction, although that is supported directly. But you can write SQL commands within Active Record as a string that then get invoked or get sent through the Active Record connection pool, et cetera. And then what you can do is you can leverage then taking a result set and taking advantage of mapping all of those database types into your Ruby object types and have an object to work with. Or you can even use primitive types like having a simple lists and strings and that sort of

Starting point is 00:22:31 thing. So it's kind of like it allows you to, has these nice helpers to do things like, you know, perform joins and limit fields and things like that. But it also doesn't prevent you from just saying like, I want to, I'm'm going to take over here i'm just going to kind of write my own sequel within this huge recursive city i want to write right yeah you could do that yeah there's yeah what proportion of the time do you tend to see for yourself and for others do you find that you're yeah you're using one versus the other yeah it's interesting because like 10 years ago, I did see more of, I'd say, writing plain SQL within ActiveRecord. And I think it was because a lot of folks that were using Ruby on Rails then, they had more that were more senior engineers.

Starting point is 00:23:16 They had experience with working with SQL and other databases, and it's just how they worked, you know. And ActiveRecord was also more limited in its capabilities. And then I think there's been this blend maybe over time where this interesting, like parallel tracks of things happening. One could be active record has gained more. It's continually adding more and more helpers. They, the documentation refers to it as helpers, but like, you know, as I was mentioning before, recently, common table expressions or CTEs gain support, which I would have thought maybe they would have been around for a lot earlier. But ActiveRecord has a first class method, helper method for CTEs now. And then composite primary keys, I mentioned. So ActiveRecord has gained more support even when you move beyond one database.

Starting point is 00:24:02 If you want to work with multiple databases, you can configure that in your application. There's even the ability to take advantage of automatically sending read-only queries identified by the HTTP verb for your web app to a read replica. If you have that configured, ActiveRecord lets you set up what they call a writer and a reader role. And then they even added that to the sharding capabilities too, where if you have, if you take advantage of horizontal sharding in ActiveRecord, you can do some automatic shard distribution.

Starting point is 00:24:35 And so I think because there are more capabilities that ActiveRecord supports, they're also, I tend to see less writing of SQL, but again, I think it kind of comes back to the team's composition too. If the team's very familiar with SQL and also myself, as I've gotten more experienced with SQL, I think my patience for like, okay,

Starting point is 00:24:54 is this supported an active record? Like, okay, fine. I'll do it an active record because that will work well for the rest of the team or whatever. But if like, if I can't find it pretty quickly,

Starting point is 00:25:03 like I'll just write it as a SQL statement. Like it's no big deal to me. Makes sense. I've worked with Rails projects a lot, and one of them is GitLab. Shameless plugin of ads for their migration helpers,.rb, is great, and documentation is open and great but someone in rails community should finally rename disabled ddl transaction uh exclamation mark disabled to do a transaction because people just because of this weird um say word or literal right A lot of people tend to say, we are going to deploy this without transactions,

Starting point is 00:25:50 in non-transactional mode. Postgres cannot work without transactions. So it's just becoming like single transaction. Each statement becomes single transaction. And just like, I don't like this part of Rails at all. For many, many years, it should be renamed. But at the same point, levels of understanding. I also understand at some point, understood, create index concurrently,

Starting point is 00:26:15 and you need disabled DDL transaction to run create index concurrently. It's not transactional, right? Right. Because it can leave leftovers. So, yeah, things are difficult. But I see people just like Ruby developers, they say we will execute this without transactions.

Starting point is 00:26:33 And it just breaks my ear, Postgres here. It's not right. Is it possible to change at all, to create pull requests or something? Oh, you definitely could. I've actually seen... I mean, yeah, we could after this call. We could create a pull request to Rails

Starting point is 00:26:48 and try to get that approved, but it'd be an uphill battle, I think. It's probably a huge battle. But I already have one battle won, and I remember the 37signals team or Basecamp or who is that? Yeah. They added support of the logic when we have replicas, asynchronous replicas

Starting point is 00:27:08 and a write happened in a session. So you need to stick to the primary for some period of time because otherwise if replica is lagging, you will see, you won't see your own write immediately after. So they implemented

Starting point is 00:27:24 this. Before that, many companies, and I also implemented in several languages this logic in PHP and Java in my past. But I see this path finally Rails is going through. And it's obvious to me that they implemented

Starting point is 00:27:40 this lag as a constant written in code. It should be configurable. So there was a big battle and DHH supported my proposal. I was happy to see, like, let's make this configurable. So maybe we can win one more battle, right?

Starting point is 00:27:56 Maybe. It's not a question, sorry. It's just my story with Rails. I like that the design of this replication lag, it is actually a good, I like that the design of, um, so this replication lag, there's like a resolver class concept and the documentation talks about, actually, I want to get back to your question to Michael about, are there any other, about other benefits we didn't really cover, but, um, I like how the documentation says, Hey, we purposefully

Starting point is 00:28:21 have this simple resolver class. And if you have greater needs, then we've, they've built in an extension point. They've said like, okay, just create your own resolver class, implement this method, and then you can do whatever you want, you know? And if I remember correctly, that's how the automatic replica switching and the shard switching works is they both have kind of a official extension point so that you do kind of get some default behavior. They took a guess at what a reasonable replica lag would be, I guess. And then, but they said like, if you want to do something else or omit some certain,

Starting point is 00:28:54 you don't want to make this type automatic, or you do want to make this type automatic, et cetera, you could do that in your own implementation class. Just so we could cover this quickly too, Michael, you were asking before about other benefits, like if you do stay kind of within the Rails world. I think because I wanted to mention that a lot of these, what I would call enhancements for higher scale operations are coming from developers at GitHub and Shopify and companies that do have, you know, internet scale operations, and they're still using Ruby on Rails with their relational database. And they're building a lot of things into ActiveRecord. And then I think those are, you know, great gifts that we receive as users then of the framework. Even if we don't have that scale, we have some pathways that we could grow into that if we need.

Starting point is 00:29:43 And we also get them the benefit of the ORM there where it's like, well, multiple database adapters are supported by the framework. So you need to make sure that this works in both MySQL, even though they don't use Postgres at GitHub, per my understanding, you know, the active record capabilities support multiple relational databases. So, you know, we kind of get that even though they're not using Postgres like within the framework. Yeah, that's really cool. And it used to be like, I'm old enough to remember

Starting point is 00:30:12 when people used to ask, does Rails scale? And, you know, like those old questions. And it just feels like we have so many good examples now that doesn't seem to come up as often anymore, which is nice and refreshing. I did actually almost want to ask from the other perspective quickly, though,

Starting point is 00:30:29 as a Postgres community, what can we keep doing or what can we do better to make sure Postgres stays the number one choice for Rails? Or how can we make things easier for developers at both large and small organizations? Yeah, that's a good question. I do think that there are some, there's a little bit of a recent popularity with SQLite within Rails. It's gained some features that I think make it more scalable. And then also depending on your deployment setup, you know, it may offer you a

Starting point is 00:31:03 simpler deployment configuration. And then on the MySQL side, I think that there setup, you know, it may offer you a simpler deployment configuration. And then on the MySQL side, I think that there's, you know, there's certainly I hear a lot about PlanetScale. They've been doing great with, they're the only, I actually tweeted this a while ago, but like, they somehow made foreign key constraints look cool. I don't know if you saw that video or not. They have this like, they have this well-produced video where they're like, boom, we added foreign key constraints.

Starting point is 00:31:25 And I was like, it was actually kind of cool. And I mean, I think that databases are, you know, it's hard to make, to generate a lot of enthusiasm maybe around it. But I would say that some of the companies offering Postgres as a service these days, there's a number of new startup companies. They're kind of either advertising more like we can help you with multi-tenancy or we can help you with full-text search, or we can help you with these other sorts of like capabilities or, you know, usages that Postgres supports. It just might be a lot that you need to kind of build yourself. And so I do think Postgres has like it just might be a lot that you need to kind of build yourself. And so I do think Postgres has like the raw ingredients for a lot of stuff. But I do think

Starting point is 00:32:10 that if you're not going to use a managed service and you want to take this on, you got to do a lot of work to build skills. And, you know, like if you want to build in your own full-text search at a good scale with good performance, Postgres has a lot of built-in capabilities and is extensible. You can add extensions. But I think just continuing to maybe think about the developer experience, I guess, at least for Postgres users that are web applications. Anywhere the Postgres community can contribute guides and tutorials on, or if companies that use it successfully with some of these use cases, if they can publish on their engineering blogs, like I always love to read those, like DoorDash is a company I know that is a Rails and Postgres

Starting point is 00:32:56 company and has a lot of great engineering blog posts. And I think those are ways that show pathways to leverage some of these capabilities. Yeah, and then I guess I've been thinking about, I think the open source solution still around end-to-end query observability is still like, it can sometimes be limited in my experience where I'll like, you know, we know of tools like PG stat statements, but then we want to collect samples and it can be a little bit difficult. You kind of go back and forth between relying on logs, using system catalogs, but Postgres keeps investing in that. There's new catalogs and there's new things. And so I think like anytime, any of those sorts of

Starting point is 00:33:35 things that make it easier to see what my query workload is like, where are the costs coming from, especially if they're connected to things I can take action on. Like Nikolai, you mentioned recently, like how PG stat statements shows the number of rows returned. You could go through and look at queries where you could say, well, we should really probably add limits to these queries.

Starting point is 00:33:55 Things like that can make really meaningful, you can draw a real clear line between like, here's information that's being provided to you within Postgres that helps you out on, as you're building and scaling and operating your system. But this approach is reactive. If we see PgStore statements on production and it's already happening, it would be great to try to guess based on Rails code. It's not about migrations already. Just regular, some code, RB file, which serves some page or API endpoint.

Starting point is 00:34:31 And we can just look at it, we can say, oh, limit is missing here, right? Yeah. People at GitLab, for example, they build a lot of additional things in CI, which help them. Yeah, that's right. I think something can be found on public. So basically, for example, a select plus one thing, which is very common for ORM, it can be automatically detected before you deploy. And even if you test it on a very small database, not on a full-size database.

Starting point is 00:35:05 Or probably workload generated is not yet in production, but we already can see something is dangerous here. I'm trying to say this is probably not Postgres' job to detect such workloads. Yeah, it's probably not Postgres' job. I agree with you. I was kind of thinking there's the Ruby open source community. There's some great command line tools. I'll shout out rails-pg-extras is one.

Starting point is 00:35:30 And a common interception point where you might want to take a more of a proactive approach would be when you're creating a migration. Like, oh, do you realize now you've created an inconsistency between your schema and your application? Like maybe you were checking over here, it was a string type. And over here, you've created an integer type, that kind of thing. Or that's not the most exciting example. But that command line time, like when the developer is working, is kind of one touchpoint. Or like you said on CI, you shipped off some code to a system running test,

Starting point is 00:36:00 and it starts to detect things and gives the developer feedback before it goes to production. That's always a good thing. Yeah. Well, I'm trying to reflect my experience with Rails. I can show you another one from 2017 I found. So Shopify is on MySQL, right? Yep.

Starting point is 00:36:17 You mentioned this. So let me describe it. There is a gem or library, I don't know, called DelayedJobs. Yep. DelayedJob, called Delayed Jobs. Yeah. Delayed Job, just a single one. Delayed Job. And in 2017, I saw the issue. I had a client in 2017, which was on Rails and on RDS already.

Starting point is 00:36:41 It was maybe around the first time I had a good production experience with RDS, which I actually found liking because it provided some good automation for cloning, right? For experiments. Experiments matter a lot. So I found this problem. It was not performing well in delay jobs under scale. So you have a lot of jobs, a lot of entries, also like high insert rate. You need to have high processing rate. It was super easy for me to find what was the problem. And I also found this issue, it was discussed on GitHub since 2013, like saying with half

Starting point is 00:37:21 a million jobs, it gets really slow. Half a million is not a huge number at all for Postgres. It's a small number. So I popped up saying it's super easy. Like you just create additional index, this one, and you have already select for update. Just add two words, keep locked. And index plus this, you have massive boost for your library.

Starting point is 00:37:45 You don't need to migrate to Sidekick or Kafka or something. That's it. You don't need to learn PGQ from Skype. And they didn't do it. Now I understand because it's from Shopify and from MySQL world. They started discussing some Postgres versions. And I see that I got a bunch of likes. And I keep having email notifications, people thanking me,

Starting point is 00:38:11 because this saved their life, right? Because this improved. But it looks like not only we talk about connection of Ruby and Postgres worlds and the problem of how to connect better, but also this idea of ORM, let's be abstract, let's keep agnostic from database. This makes life worse actually. We cannot use what Postgres can offer. What do you think about this?

Starting point is 00:38:40 So again, not a question, reflection of my experience, but maybe it's a good discussion, I think, right? Well, you wouldn't be able to, I don't think you'd be able to use Postgres if Rails only supported one relational database. It would not be Postgres, I don't think, based on who runs the Rails project. Yeah, it certainly wasn't the most popular in 2005 when Ruby on Rails was kind of really getting out there for the first time, which now is a long time ago. So it does provide you that indirection point where you can say, well, we have

Starting point is 00:39:11 a generic adapter, and then we have the MySQL adapter. And maybe it doesn't support skip-locked. So maybe we can't do skip-locked at all because it's not supported across all three of our databases, which is a trade-off. You can write if if it's postgres add two words yeah and actually that's there's been um i should have had a couple examples ready to go but there's actually a couple of things that

Starting point is 00:39:38 are actually only supported in postgres that active record supports which is pretty exciting because it's like a little bit of a philosophical switch to me where you say like, okay, well, we have these really useful capabilities, but we're just going to support them in Postgres only, you know, sorry, my SQL users. And that way then you get the best of both of those categories. But of course, like then, then we aren't really getting into this, but there's a whole discussion about SQL standards and stuff like that. So I know like, for example, the returning clause is not supported in... My understanding is it's not supported in MySQL. I've just checked the docs for MySQL, like the latest version,

Starting point is 00:40:12 and they already started to support skip locked. Okay. Yeah. Well, and it could be that a framework that chooses to offer conditional support, as long as it's, of course, well-tested and supported, then that could actually apply pressure to another open-source database to add support for a SQL standard command, right?

Starting point is 00:40:32 It's like, if this gains support, probably not one framework with Rails, but if everything Rails and Laravel and Prisma and all the language communities are all clamoring for additional support. You mentioned SQL standard, but the problem here is that SQL standard doesn't care about performance at all. Nothing about indexes and skip lock is not there. So they don't care about performance.

Starting point is 00:40:59 And here we talk about performance, right? So it's not easy sometimes. There should be some addition to standard talking about performance, right? So it's not easy sometimes. There should be some addition to standard talking about performance, but I don't see how it can happen. Yeah. I think it's great that you've provided that feedback Nikolai into the community, you know, and I think there is to the earlier question too, I think like Postgres is going to, you know, has its core set of objectives with each release, you know, where the development effort investment is going and that sort of thing, which is going

Starting point is 00:41:29 to be a completely different track than Ruby on Rails. But like, if we have some overlap, you know, the folks that tend to maintain the Postgres adapter code are Postgres fans in the Ruby on Rails community. They happen to also write Ruby. And it's great to have those kind of overlapping folks. There's kind of this discussion around, I know we don't have, we're out of time to get into this, but there's an example that comes to mind too of an in-clause SQL query that has a large list of values. And then some folks noticing that from Ruby on Rails and on Postgres. And that was one where I felt like... I think Nikolai, we might have both tweeted about this at some point, but there was

Starting point is 00:42:11 kind of this crossing the streams for me where I saw Postgres people I follow and Ruby on Rails people were like, we need to do something about this and improve query performance for these types of queries. Nikolai Winters Right. Also,.plug, or how is it called? Plug, right? This is like when you retrieve everything, then you do something like Postgres can do it much more efficiently. Don't do

Starting point is 00:42:36 work instead of database on application side, right? I guess it's a small thing. Any application language can be used for this not efficient approach. What do you think about the future of Ruby in general? I think it's achieved that kind of cockroach status where it just never goes away now.

Starting point is 00:42:58 I don't know. There just seems to be enough, not cockroach DB, but the actual insect. There's such a large amount of companies that there's just a lot of work to maintain the applications if they're successful that use Ruby on Rails. Of course, the growth is not what it was in the early days, but there's a huge community of Rails developers and there are new people entering it. So if someone enters it, because maybe it's like their bootcamp program, or they joined a company that's using Rails, despite being 20 years old as a framework for Rails anyways. First of all, I think Ruby is a great language outside of Rails.

Starting point is 00:43:32 But despite being a 20-year-old framework, there's a very steady development clip. There's new features being added that are exciting. I just think they tend to be more front-end oriented. Ruby on Rails these days is trying to capture more attention from what might have been JavaScript-only apps before and offer a lot of capabilities as you're building your web application screens, giving you a lot of interactivity and low latency, but doing it within Rails and getting back to some more of the full stack kind of application building capabilities. But there are things happening too

Starting point is 00:44:09 on every release related to Postgres and I certainly track those and it still feels active, just not where everyone's flocking to. I thought the active might be a Rails joke. Oh yeah, it could be, yeah. On a serious note though, and thanks so much for your time.

Starting point is 00:44:26 Where can folks go to learn more? Yeah, if anyone's interested in this topic area and they'd like to explore the book that I wrote, it's at pragprag.com. That's the Publishers Pragmatic Programmers. And I do a fair amount of blogging on Postgres and Rails topics. My blog is andyatkinson.com and I'm also on most of the social media apps. And this year I'm going to be at Sin City Ruby

Starting point is 00:44:54 and PG Day Chicago as well. So probably more for Postgres people. If anyone's at PG Day Chicago, I hope to make it to some more Postgres events, but I love the in-person events too and being able to meet more community members and learn from other people. And the book is called High-Performance Postgres QL for Rails, right?

Starting point is 00:45:12 Yep, that's right. Awesome work. Awesome publisher as well. And thanks again. Thanks for having me. A lot of fun, guys. Thank you. Thank you.

Postgres FM - Rails + Postgres

There aren't comments yet for this episode. Click on any sentence in the transcript to leave a comment.