Software Huddle - High Performance Postgres with Andrew Atkinson

Episode Date: July 2, 2024

Database performance is likely the biggest factor in whether your application is slow or not, and yet too many developers don't take the time to properly understand how their database works. In today's episode, we have Andrew Atkinson, who just released a new book, High Performance PostgreSQL for Rails. Andrew is one of those "Accidental DBAs" that fell into learning about database optimization at his job and ended up getting pretty dang good at it. In this episode, we hit Andrew with a bunch of questions about using Postgres properly, including how tight your schema should be, how to handle Postgres operations, and how to think about performance testing with Postgres. We also got into some aspects about the process of writing a book and what he recommends for others. If you're building an application with a relational database, this book will help you. It has real code repositories so you can walk through the steps yourself, and it teaches the concepts in addition to the practical aspects so you understand what's happening.

Transcript
Starting point is 00:00:00 fundamentally, I do enjoy the process. Like I write on my blog a lot and I always have, I'm sure like you, I have a huge list of things I always want to investigate. And for me, like writing about them is how I do a proper investigation. How long did it take you to write the book? I'd say the first draft was about like nine months, I think, to a year. And then surprisingly, I thought it would be much faster. But that period from when the first draft was done to when actually it went through all the technical editing and then later the layout and that sort of thing and the additional edits, that was actually almost a whole nother year, which is really wild. In terms of consulting, are most of your engagements with Rails and Postgres specifically, or is it even like Postgres more broadly for things like that? Yeah, well, I've
Starting point is 00:00:45 just started to sign some client work up that is outside of rails like they're not using ruby on rails at all and um but they're using postgres and um yeah i think that's kind of where i want to get to i feel like there's this giant ecosystem of extensions what do you think of that whole um ecosystem we're seeing there hey folks, this is Alex, got another great Postgres episode for you today. Got Andrew Atkinson on who just released High Performance Postgres for Rails. He's great. I think he's super useful if you're like building in some sort of application framework using Postgres a lot and just want to know more about performance optimization and scaling and operational stuff, all those sorts of things. He calls himself an accidental DBA or an application DBA. And I just think it's really fascinating.
Starting point is 00:01:30 So we talked through a lot of just like different Postgres features, what you should be using, what you shouldn't be using, performance tips, all sorts of things like that. Really great. Hey, check out the book. I bought the book. It's awesome. It just came out. It's really great if you're sort of in that world. So as always know if you have any comments suggestions questions feel free to hit up me or sean if you have any guests that you want on feel free to reach out and with that let's get to the show andrew welcome to the show hi alex glad to be here yeah so i'm excited to have you on kind of continuing our postgres series here and you are the author of High Performance Postgres for Rails, which is printing as we speak. By the time this episode is released, it'll be available, you know, probably next day
Starting point is 00:02:13 shipping or two day shipping from Amazon, whatever. So get that if you want it. But you do some Postgres consulting, but maybe for everyone, everyone else here, give us a little bit of background on who you are and what you're up to. Sure, yeah, I'd be happy to. Well, I've been a Ruby on Rails web application developer for more than a decade and done some other things as well within web applications like other languages and frameworks. I also did some mobile development. But the last kind of handful of years, I've been more of a backend developer. And then I've been really enamored with Postgres for a number of years I've been more of a backend developer and then I've been you know
Starting point is 00:02:45 really enamored with Postgres for a number of years having the chance to use it at higher scale running into a lot of common issues and I decided at some point to sort of specialize in that within the backend engineering realm and specifically for web applications and just kind of understanding how to maximize our use of Postgres. And now I'm doing that as a consultant for companies that want to hire me to help them out. Yeah, absolutely. Tell me, can you tell me more about like the origin there? And like, how did you get into it?
Starting point is 00:03:15 Like, did you raise your hand when it came up like of, hey, database issues are sort of our problem here at the company right now? Or how did you get into the deeper Postgres stuff? Yeah, I mean, I think it's, it's, it was definitely a little hand raising, it was having opportunities, of course, like that, I guess, preceded the hand raising. It, you know, it was kind of like trying to look at challenges as opportunities, technically, but also to be helpful on the team and sort of compliment a team that, you know, sort of where I cut my teeth mainly was a small team. And we had, we could not everyone on the team could do everything. So I was sort of like, well, I'd love to actually help more in terms of our scalability at the time, one of the past teams I was on with the database operations
Starting point is 00:04:02 in particular. And it sort of scratched my personal itch on, you know, learning to read query plans, learning to design indexes and that sort of thing. But then it was immediately useful and helpful for the team. So that was a big part of it. And then I think also just like philosophically, as I've worked at, you know, like there's things that sort of line up with my, my kind of perspective in life. And that happened to be how databases work too. Like they're, you know, they, they sort of hold the company's most valuable asset in data. And I like the idea of working close to that, but then there's this interesting challenge too, where as an application skills, you also need to have great performance to hand the data back out to clients, to client applications.
Starting point is 00:04:51 And that's really an interesting challenge to both make sure that it's consistent and preserved properly, backups and all that, but also can be served back. And so I just think it's really interesting. It's really practical. Every company I just think it's really interesting. It's really practical. Every company I've worked at faces these challenges. And so, yeah, I just sort of felt like, you know, I really want to do this better as an engineer. And what do I need to do to get there? Yeah, yeah, for sure.
Starting point is 00:05:17 I empathize with that. I have like a similar road, you know, background stuff, but just got more and more interested into the database stuff. Because I think it is really fascinating. Like the end stuff gets a little repetitive after a while but the database stuff i think is just like so fascinating like how many different areas there are and it sort of like makes sense logically but you just need to like you know understand all these different pieces and then and then think how they they go together and i like how you called it uh on a different podcast like the accidental dba or like an application dba of
Starting point is 00:05:45 like hey you sort of accidentally fell into it and now you're sort of like the person but you're still like very application focused as well right like you're you're you're a dba but you're not just like you know ivory tower like sitting in in in the you know in the back office like writing the queries and all that stuff you're like doing it in the context of of shipping some stuff and and you know um understanding user needs and things like that, which I think is great. Thank you. Yeah. Yeah, for sure. Like, um, I learned that term from charity majors, the accidental DBA term. And then I also learned this term from hockey Benita, um, app application DBA. And that was kind of like a light bulb moment for me, both of those things where it's like, like yeah you can kind of accidentally fumble your way into this from real needs from an application that's scaling and needs to write and you know
Starting point is 00:06:30 retrieve data quickly and and then you also can say you know i really like working with this part of the tech stack happens to be a database maybe you don't tend to do so much like you know security and backups and restores and stuff but certainly the applications usages of the app of the database like mostly around indexes and schema design and uh data modeling and that sort of thing like that's that's critically important for the success of the the growth and and performance of the application as well yep yep absolutely well on the book i want to say like the book is great. I got the early access version
Starting point is 00:07:07 and went through it. I think it's great. I want to talk about it more, but like I thought like it was useful for lots of different types of people. And I would say like three ways, like, hey, if you're like where you were maybe seven,
Starting point is 00:07:18 eight years ago, where I was like five, six years ago, and it's like, hey, you're starting to get interested in the database and want to know more about how that works. Just reading it straight through is just great, great right because it has this like sample application and you're like actually running these commands and understanding
Starting point is 00:07:31 these concepts but like applying them well i think that's that's super useful number one i think there's like a second group where it's like hey if you're having operational stuff or scaling stuff you see that on the future and you want to know some of the patterns for thinking about that better. I think that's like a great reference material for that. And then just like, if you're having specific performance issues, man, appendix one, I think it's appendix one. It's just like this amazing resource to like point you in the right direction. It's like, Hey, if you're having slow queries, go read like these four chapters, right? If you're having a high rate, right latency, or you're having invalid data, right? Cause your scheme is not tight enough. Like go read these specific chapters like i thought that was that was super
Starting point is 00:08:07 useful as just like i'm drowning right now and i need something to help me out i don't have time to read through the entire thing today but like if i could just like fix that problem that'd be great as well so i think i think like on all those things it's it's really a great book thank you yeah that's that's really great to hear and the field, we called it the field guide at the end, the appendix one that you're mentioning. That was actually like a late edition. And it was based on some reader feedback that was like, hey, what if I don't really want to read the book front to back? Which I get, I mean, a lot of tech books. So they were like, what if I have specific issues I want to dive into?
Starting point is 00:08:43 And that made so much sense to me. And I was like, why, why did we not do this earlier? Like, but yeah, I mean a lot of, of course, like a lot of, um, you know, manuals and things you pick up or they have like a troubleshooting guide and, um, yeah, it's great if it gets more people into the book that way, you know, even if they don't read the whole book, like, but they can get some value out of starting from a specific issue that that would be a great outcome yeah yeah and the one thing i would say too is like this book is high performance postgres for rails i have never written rails in my life i've written like 10 lines of ruby in like some chef scripts once that i inherited for
Starting point is 00:09:17 someone and that's it and it was still like super used it's not like super rails heavy to where it's not going to be useful to anybody else so if you're like an application dev you want to be an application dba it's somewhere in that realm like definitely check it out um like in terms of consulting are most of your engagements with rails and postgres specifically or is it even like postgres more broadly for for things like that yeah well i've just started to sign some client work up that is outside of rails like they're not using ruby on rails at all and um but they're using postgres and um yeah i think that's kind of where i want to get to and i i guess like both in writing the book and then in launching consulting which i've really just been doing now for not quite six months um you know i just felt like ruby on rails is my my homeland so to speak like it's what I know, so I would start there.
Starting point is 00:10:05 But I do think that a web application, especially if there's an MVC type of framework with an ORM, whether it's PHP and Laravel or Django or it's Node and Prisma or that sort of thing, I'm sure that there's going to be some of the same problematic query patterns because of this kind of object relational sometimes mismatch and that sort of thing. And so I do think I could help teams out that use Postgres, that use other web application frameworks. And I'd be happy to do that, of course, both on the database side
Starting point is 00:10:44 and then also, if needed, hopping into the code base and that sort of thing. Yep, sure. For most of these engagements, what are the developers like and what's their relationship with the database like? Are they curious but just need a guide? Are they indifferent and just like, hey, somebody come fix this for me? Where are they sort of at with that? Yeah, I've worked with teams that have um you know somewhat to some extent i get it like i've also been on the other side and have hired
Starting point is 00:11:10 uh i was part of a team at a past company where i was an employee a full-time employee and we hired a database person outside to come in and take a look at our stuff and help us out and whatnot and um we they had sort of an a narrow insulated perspective into the company and i i think like practically speaking that is usually how i start with a company is like i do have somewhat of a narrow um perspective so i don't necessarily know a lot about the dev team beyond people i'm initially or i'm i'm meeting one-on-one or might be part of a kickoff or that sort of thing but i certainly have had you, it's been very fulfilling to have sometimes, um, like I recently started helping out a company where someone told me right away, they're like,
Starting point is 00:11:53 Oh, Hey, I bought your book and I'm a big fan of some of your stuff. And I'm, I'm really excited that you're, you're here. They weren't the person that the, um, they weren't the person that directly hired me, but they were part of the team. And I felt very good. Like this is going to be a positive engagement for all of us, right? Like I'm going to, this person's kind of primed to want to learn more like they're, you know, they, they have, um, you know, they have observability in place.
Starting point is 00:12:19 They have some query issues. They have some indexes on their tables, but they're still slow queries. They're not sure why they're wondering about different schema designs like should they should they add their tenant identifier column everywhere like you know different things that can lend themselves to opening up improved performance and so i think like in those, those are, those are great. I think I also though have been, um, hired where, um, on some teams where they, they're pretty good. And I think they're sort of just looking for, in terms of their skills and their experience.
Starting point is 00:12:58 And I try to call that out right away. Like I, you know, in some cases, like one client, like I didn't really actually after a bit of time, like I didn't really think I was super needed by them and i think that's fine too like there could be a i know another consultant i follow kind of mentioned like the project that he was on arrived at a natural conclusion like because he was helping out with performance you know performance was generally improved it's not you know there will be ongoing opportunities however you know if you make a big improvement for a company like it might sort of just reach a conclusion so yeah i'd say i have kind of a spectrum of experiences there yeah yeah absolutely what do you have any like common
Starting point is 00:13:35 mistakes that you see um kind of over and over is it the same couple things or is it is a pretty wide variety um yeah i guess like a few things would just be um over indexing you know i i know we have this field in a where clause in a sql query somewhere so i added an index on it but it's not used at all by any queries um under indexing you know like these queries are slow we've got two indexes on the table but you know that it might actually benefit from having around 10 based on the queries that how the table's accessed how the row data is accessed um you know never tuning any parameters which you know i i think isn't the very first thing you should do in my experience but i think having awareness of what the tunable parameters are there's there's
Starting point is 00:14:22 about a dozen or so that are in my kind of core list that I would recommend. Never archiving any data or really talking about any kind of data retention policy. I think it's hard because you got to usually have like a product discussion with kind of sometimes with a lot of stakeholders about like, you know, changing a system that has access to data forever, but it's years and years old and there's years and years of data that's never actually accessed, but it could be. And having that conversation, I'd say those are some of the patterns that come to mind. Yeah. Yeah. Okay. Let's dive in. I want to dive in. I like a bunch of questions,
Starting point is 00:15:00 but these are all some good ones. So you mentioned over-indexing. And so if I have too many indexes, is that mostly going to show up with, with right performance, just worse, right performance? Cause it's, it's hitting all these different indexes. Yeah, it's right performance, but I actually think it's more about, it's, it's a little bit of a symptom of, you're not really able to tell that your indexes are not being used, you know? And, um, the, and you know, if it's, it's unlikely that sometimes i think folks think that they might be adding a benefit to having an index that could be used but it's if if a certain amount of time has passed and the application is pretty well stabilized um and you know you're not i'm not talking about a situation where you're like in a rapid scaling period but
Starting point is 00:15:43 if you have a pretty stable application that's mature it's a likely a b2b kind of sas app and you added an index years ago that's never been used like i think it's very unlikely it's ever going to be used and you know and i think sometimes there's a you know a little bit of a mindset shift in this this actually is worsening right performance a bit, but it is providing no value. And the other thing that can happen is for heavily updated tables is there can be very high bloat in indexes, and they can grow. And it's not uncommon, and it can be very surprising to people that indexes can actually exceed the size of the table if they're never rebuilt. And so what can happen is besides the right performance, which is an ongoing concern, as you start to do, you create backups of your database and you want to restore those
Starting point is 00:16:35 for test environments, or if you start to become more concerned about your ability to restore from a backup and how long that might take those are things where you know you have this unused index that is possibly very bloated and very heavy in terms of how much space it's consuming like let's let's get rid of it and i'm talking about like for clients i've met with it's not uncommon to have dozens or even hundreds of unused indexes oh wow yeah you know if they've got a database with 500 indexes, it's not uncommon to have 200 that are unused. And usually it speaks to just more of like,
Starting point is 00:17:11 you know, we're just kind of spraying and praying indexes here. And we don't really have the observability or the skills to know that they're not being used. Yeah, yeah. Do you, you sort of mentioned like, hey, if you've had an index around and haven't written to it,
Starting point is 00:17:24 it's probably, you know, in the last couple of years, it's probably, or haven't used it in the last couple of years, it's probably time to drop it. Do you ever like do you recommend having sort of currently unused but potentially used in the future for some feature index? Or are you just like, hey, let's create that index once we specifically know we need it? Or is that like too expensive an operation to do even concurrently that you'd rather like sort of build it early on and keep it updated up front? Yeah. I mean, you hit on all the points really. I mean, I think that the, it's, um, it's tricky to answer because I'd say my, I have like my really quick answer and then my like Cadillac you know deluxe answer my really quick answer is um if you know I'd say just if if you're really concerned that the index may be useful and um and be used by
Starting point is 00:18:15 queries but you're not yet able to determine that concretely and you don't currently have a use case for it but maybe you have a feature in development that you expect it will be useful. It's a big table. I mean, you know, you can keep the index. What I would try to do is get the observability in place that will show that it's unused. And then on an ongoing basis that it's unused, even after your feature launches. That's my like attempt at a short answer. My Cadillac answer though, is it's, it's more of a holistic problem. We want to get like better observability in place that shows all of our indexes that are used and the ones that are unused and then we want to like develop kind of a hygiene approach where it's like we're always kind of removing the well i mean usually the unused indexes is kind of a big one-time change but then if we have the monitoring in place then as as, you know, application code changes and evolves over time,
Starting point is 00:19:06 we also can, you know, do a periodic maintenance kind of check and make sure we don't have a lot of unused indexes. Yep. For monitoring unused indexes and things like that. Is that like, you know, a script I'm running every once in a while? Is that PG hero or something like that? Or like, how are you monitoring those? indexes? Yeah, you can DIY it, and you can certainly check within the Postgres system catalogs when indexes are scanned. That information is tracked. So you can check periodically and even capture that data you want
Starting point is 00:19:38 if you want to store your own scan data. You could say, go through all my indexes now, check their scan data. You could say, go through all my indexes now, check their scan counts. And then what you could do is you can build a metric or a timeline view as to a particular index or set of indexes that haven't been scanned over a particular set of time. And you can do that both on your primary instance and any replicas that you have. And then like pghero is an open source tool that does some of that for you so it essentially comes with a bunch of those queries baked into it that are checking against your system catalogs or another tool that i've used with a couple clients that i'm a big fan of
Starting point is 00:20:18 it is a commercial tool though is pg analyze and pg analyze will you set up an agent and it will consume your log files it'll perform some of those types of queries and it'll actually turn turn that into sort of like work units and it'll say here's an unused index and it's actually an alert and then you can have a little bit of workflow around that and that kind of thing so i i'd say, and of course that tool is going to cost money on a monthly basis. So it makes sense for companies that are heavily using Postgres and looking to squeeze as much as they can from it. But if you're just kind of getting started, I think running those queries by hand is perfectly reasonable. Yep. Yep. Okay. You mentioned rebuilding indexes. Is this similar to vacuum?
Starting point is 00:21:05 Is it contained within a vacuum? How does sort of rebuild and vacuum, how do those work together? Yeah, it's similar to vacuum in that the idea is to efficiently store your physical row data or to store it physically efficiently. So what are like logical rows in a table you want those to be stored physically as efficiently as possible without gaps and that sort of thing which vacuum doesn't do but it makes space available for reuse vacuum without any additional commands there is a vacuum full command that will actually rewrite your table and that would be the closest that would be the closest corollary but we'll talk about why it's not easy to do but
Starting point is 00:21:49 vacuum full would be the closest to a re-index operation on your index and that it's going to rebuild your index and any references any index entries that would have pointed to row versions that no longer exist or are reachable those are going to be gone because it's just going to rebuild your index from the current point in time and it's it's pretty if you have a table like it is like it's always worth mentioning to people because um first what i recommend is you get rid of all of your unused indexes but then i do recommend that you you know if you have a database that's been around for a number of years and you have tables that especially are heavily updated or there's frequent batch deletions from them to go and take a look at
Starting point is 00:22:31 your indexes and you can also check out we've got queries in the book and you can find these easily online that help determine the kind of bloat percentage or the estimated percentage of your indexes how much of them are non-useful content, basically. And it's pretty wild how sometimes they can actually be cut in half in terms of size. And then that helps both with, again, with space consumption and access to the indexes. You're just accessing what you need from the index more efficiently.
Starting point is 00:23:01 Yep. Yep. Cool. Okay, we dove right into the deep stuff right away. I'm going to go back to some warm-up questions i had um so one tip you had in the book was around hey single column indexes and you said like hey single column indexes often is the right choice to just use like hey just index that one column for me like my dynamo brain i always think of like partition key and then sorting. So I think of that. So it's weird not to have like, hey, some sort parameter on that. I guess like does this advice, is that single column advice mostly for if you are getting just a single row at a time, like maybe getting a user with this particular email address?
Starting point is 00:23:37 Or does that also apply when I have some sorting in there? And I'm thinking like, hey, an e-commerce store where they want to get the orders for a customer, right? And so like i'm indexing on that customer id but in that case should i be using a multi-column with like a sort on the created ad or something like um date yeah it's it's a i mean you're thinking about it in the right way and um i think like having it's like what I found is I start to develop a nose for, you know, sort of like what a good candidate multi-column index would be like based on a query or a situation where it's not going to be so beneficial.
Starting point is 00:24:16 Like a single column index will provide the same value maybe because there's a lot more columns that need to be scanned or that sort of thing. Or maybe there isn't an ordering and the query or that kind of thing. And so there's going to be that second heap fetch phase after the index scan that's going to go and get all the other column data that we need. But yeah, I mean, for single columns, if you have an order by operation, it's possible that you are in and a filter. Like if you're filtering on all the rows with a date greater than a certain value and your query is ordering by that, a single column index on that column might work out great. And you can even control the column direction in the index too.
Starting point is 00:24:59 So if you're sorting by order by descending, you can set your index entries up that way as well in your index definition but if you do have like a where clause that is filtering by an attribute on the table and and then you have an order by for a second attribute that could be a good spot to have your leading column be your filter column and your second column be your ordering column and um the what i try to help clients do what i always do myself is try to get as close to production in terms of row counts and instance resources that i can if i'm if i'm not able to kind of validate a design on production which you can do but uh you know like most companies don't want to do that um but if we can get like a pre-prod postgres instance then we can if we can get the
Starting point is 00:25:52 sql query with representative params then we can actually just try both indexes and run the explain plan and and see what postgres likes better yep yep on that same note okay i'm gonna i'm gonna skip a little ahead a little bit like but how do you think about performance testing and things like that with with a relational database when you have there are just so many factors like you like you're saying you can replicate the row counts and things like that but replicating the concurrent queries that are going on against it maybe even like the bloat stuff depending on yeah how much is happening yeah yeah like how how do you do sort of accurate performance testing? Or is it a lot of looking at the query plan and understanding that?
Starting point is 00:26:30 Or how do you think about some of those things? Yeah, I mean, it is really, really hard. I don't think there is a great solution to have a really super 100% high fidelity pre-production performance testing environment. So I think like with databases, it's helpful or with like Postgres anyways, it's helpful to know the kinds of things that influence outcomes and then basically kind of zero in on those, you know, like a lot of times what I look at is the, you know, the cardinality of a column, like the unique set of values and then how selective a query might be
Starting point is 00:27:06 and how selective an index definition might be. And there's this kind of rule of thumb that I always kind of kick around in my head. I heard from somebody, it's just, this is just like not a low, not a very sophisticated rule of thumb, but it's like, generally when we access the table, even if we have not the same sort of row counts, but depending on, well, we do generally want to have similar row counts. But I tend to think of this 30% figure as a proportion of what we're trying to access and what the total row count is for the table so if we're trying to access something that's like 30 or less it's likely the query planner is going to choose an index scan over scanning the whole table and you know in terms of like in terms of performance problems i mean commonly the the main problem is indexes that we're hoping are used or we're not able to design a good index are not being used.
Starting point is 00:28:06 And so the full table is being read and, um, you know, that's where we might start from. And if we can do, even if we don't have a perfect fidelity pre-prod environment to test in, but we know that the problem is in production that the full table is being read and then it's filtered down to five rows or something from you know millions um then as long as we can simulate that as a starting point uh and we can get an index that is you know matches kind of like a narrow set then what we can do is typically what we can do is be pretty confident that if we can recreate the slow query in the pre-production environment, like i.e. the full table scan, and then we can design either sometimes through a combination of query changes, but a highly targeted index using tactics like,
Starting point is 00:28:58 multi-column indexes, partial indexes, indexes on expressions expressions different tactics that could be very specialized for a particular query but like as long as we validate that that query is then used and lessening the cost of that in the pre-prod environment then usually it's like almost always the case like let's add the index and prod and the the problem goes away but we can also we can also of course add the index in production and be super confident that the plan is going to be set. And then, of course, we want to have observability, too, that tracks what the query plans were for a query over time, which we can talk about as well. Yeah, yeah. If I'm adding an index in production and doing it concurrently, is that going to take up a lot of resources?
Starting point is 00:29:44 Like if it's a big table, is it going to take up a lot of resource? Like if it's a, Hey, it's a big table. Is it going to take up a lot of resources or is it just going to use a smaller amount of resources and just take longer? Or like how worried should I be about adding a new index to a, you know, a decent size table? Yeah, there are some caps in there. Like you'll have, um, you can only do one per table concurrently. I think if you start a second one from another session, it'll get canceled. I haven't tried it lately. But you can only do one. And then you have maintenance work memory as one of the Postgres parameters that will control memory for maintenance operations. I believe that is
Starting point is 00:30:19 maintenance that's used for, or that memory allocation is used for create index concurrently but i i tend to actually see it elsewhere like for vacuum operations and that sort of thing so i'm not 100 on that i might be wrong about that um but generally no i mean that is what we need to do for big tables like for example i've got a client with a 600 million row table it does take like can take like a couple hours to build an index. So we might set up a session with Screen or Tmux and manually kick off a crate index concurrently in the background, sort of monitor things.
Starting point is 00:30:51 There's going to, of course, be IO and CPU that's going to be used during that period. Yeah, absolutely. All right, let's talk about schemas a little bit and just basically how strict should I make my schema is the, is the overarching view here. But, um, some specific questions like, should I allow Knowles Knowles on my, on my columns or should I try to avoid that as, I mean, obviously sometimes you
Starting point is 00:31:14 need it, but like, should I, how do you think about Knowles? Well, yeah, I mean, I generally try to default to being restrictive, I guess. I mean, I think that's like the, the, the DBA mindset is as many restrictions as possible, because restrictions breed better performance typically, and actually sometimes more information about your data and sometimes better quality data and that sort important to the application's goals, then from a read perspective, then that's a good spot to think about partial indexes where we could, if we don't need to, like if maybe it's information we don't yet have, but we want to capture a row, but that particular field is not set yet.
Starting point is 00:31:57 If we're not querying on that, then we could exclude those using a partial index and say, give me all the rows, but where this value is not null. Yep, absolutely. What about foreign key references should i always use those yeah i i think this is funny i guess because it it seems like for some people it's or it seems like it's controversial that foreign keys are onerous and there's certainly people recommending not using them i mean typically what i see is i I'm an advocate of foreign keys. I think they should be used. I think if you truly are experiencing,
Starting point is 00:32:32 you know, if you have the type of scale where you're moving beyond one instance and you can't use them, or if you can, or if you have the type of scale where you're concerned about the impact of the foreign key constraint enforcement, and you've done the things like add indexes to foreign key columns and that sort of thing, or make sure that cascading deletes are supported by indexes. You probably also are at the level of sophistication where you know how to monitor their impact and what the benefit is in removing them. But I think most people that I've worked with, they're not there.
Starting point is 00:33:11 It's more like there's maybe some fear, uncertainty, and doubt around using them. And so what I try to do is encourage the use of foreign key constraints, check constraints, with the emphasis being on having the best quality data that you can
Starting point is 00:33:25 get at the front end, helping describe your data for other use cases, like data analysts that want to use it later from your relational database. And they're working in, you know, BigQuery or Redshift or somewhere else. And they're like, I don't, this data is a mess. Like, we're sort of like setting ourselves up for success if we have good quality data on the front side. So yeah, I generally recommend them. Yep, yep. What what you mentioned you know um cascade deletes and things like that is that something i should be worried about um you know the downstream impacts of deleting some top level items and then just like deletes 12 000 items elsewhere or um is that overblown as an issue um well i guess like it is something that should be on your radar
Starting point is 00:34:05 if you do take advantage of cascading deletes. I mean, deletes can catch people off guard that deletes can be a somewhat expensive operation in Postgres on big tables if you're deleting large ranges of values. Because of the multiversion concurrency control, there'll be a new row version that's created. So your old row version needs to
Starting point is 00:34:25 be cleaned up and that can create the fragmentation or the bloat problem and that sort of thing and if you do have like you said if you touch a bunch of tables and um yeah like it's a it's a situation where you want to know how to look at the queries that will be generated and use the explain, look at an explain plan and make sure that they're going to be when they're accessed via the cascading deletes. So that's going to be an index supportive query if it's a big table. So yeah, it's not, it's not like, I think if, if you're dealing with tables like in the sub 1 million to 10 million row range,
Starting point is 00:35:02 and you got a few that's doing cascade deletes on a reasonably sized modern instance you're probably fine but i think if you get into the you know if you're doing big delete jobs on 100 plus million row tables it should be on your radar as something to be aware of yep yep what about check constraints um how often do you use those do you use them for like all kinds of application level validation or do you say like hey rails can handle a lot of that um yeah how often do you use check constraints yeah i would say that you know as i've learned more about postgres over the last few years like it's it's something that i'm trying to advocate for more usage of because
Starting point is 00:35:40 i'd say in my real world experience i don't really see a lot of usage of them. And, you know, like I first learned about them more as like a intermediary use case with where you want to introduce a constraint that doesn't let you add it only for new row changes. And you can do that with check constraints. So you can use check constraints kind of as a transitional element, which is cool. But you can also use check constraints as a data quality check mechanism by writing a rule that kind of describes what you expect. And so as a fan of them, I kind of think they should be everywhere on all the tables, like anywhere you have an application rule. Like why not throw a database level check constraint in as long as you're capturing those post what will be postgres errors when they
Starting point is 00:36:30 violate the constraint as long as you're capturing that in the application and handling that gracefully which is you know does take some work i i think they're a great tool yep yep yeah i've sort of um have you heard of tiger beetle in that database? Uh, just very recently, but I don't really know anything about it. Okay. Yeah. I would say like they're unique in a lot of ways. And one of their, their thing, like, you know, they have this thing called tiger style about how they write code and things like that. And one of the things they do is, is they use asserts very heavily in their code. Like all they're, they're like asserting all these conditions, you know, and this is actually in their application code, but it sort of makes me like think about that with postgres too it's just like
Starting point is 00:37:07 you know why wouldn't you just sort of validate that um you know the i think an example in in the book of like hey the the ride completed at is after when the ride started out right so you just don't have some wonky data that way and things like it's like why wouldn't you just make sure your data doesn't get in a bad bad state and in some of those different ways even if you have the application level checks just making sure elsewhere um so i've come around on like hey more more asserts more checks more just ways to like validate that you're not putting garbage into your database and causing a problem later yeah and i think if you that that's a thanks for bringing that example up because it's an interesting one because it's it, it's, you know, it's, I think that like for newer,
Starting point is 00:37:49 newer folks that don't think about data quality as much, or maybe haven't really been burned by that as much in their career experience, it might be a little odd. It's like, well, of course, like, can't we expect that? But like, it's, it's a little wild when you think about how it's like, no, you can't expect anything. Like you don't know what kind of data you're going to get on the front end from the application. Right. And, and sometimes it's not like it's malicious, it's just errors, you know, and that sort of thing with like forms or just, just human data entry, you know, whether it's even outside your company or inside your company. And so it's, it's almost like sending a message in a bottle to the future, you know, to me, it's like, here's what we expect. And I do think like, if you know, those tools are there, it might not make sense to add them all right away, you know, because maybe it does a little heavy handed, but if you start to see some issues like, Oh, we got our timestamps out of order, like they're nonsensical. It's like, well, let's fix it once. And then let's put the check
Starting point is 00:38:43 constraint in. So it never happens again. That's kind of my mindset. Yep. Yep. Yep. Okay. Another feature that's kind of interesting, and I think you see mixed opinions on it, are like functions and triggers. Right.
Starting point is 00:38:55 Where like stored procedures kind of out of pretty out of favor for a while. When should I use functions and triggers? When should I have? Should I avoid them? Yeah, it's hard to provide generic evidence or advice on this, I guess, to be frank, because, yeah, I mean, I think that sometimes what I tend to experience is
Starting point is 00:39:17 if I'm bringing more database awareness of different capabilities, I'm probably not going to go with functions and triggers first it's more just even like check constraints or something like that that's a little bit softer you know if you're already a team that's you know using check constraints and has a nice schema design and has good performance and can read explain plans and stuff then i think like that might be the circumstances for saying okay well, well, do we have some background
Starting point is 00:39:46 jobs that are client code? Like maybe it's written in a programming language called from Quran. That's like doing a lot of back and forth between the client and maybe doing some processing on the client and then making another call to Postgres, doing some processing, writing files out. Like it might be interesting to think about like, well well what if we did this as a procedure as transactional control it's maybe it's relatively simple it's like bulk updating some rows or generating refreshing a materialized view or i don't doing something that's relatively simple but being able to save kind of the client round trips and then in the book like we looked at how you can still manage that as source code in a sense there's a couple open source tools that um like allow you to write you know just like you might write a method in ruby or another programming language like put your function into version
Starting point is 00:40:34 control you can version it you can have prs you know you can treat it like first class code and i think you know i think if you can show a benefit to like the execution time for that approach over an application code approach, like it, it makes sense to me, like, why not? You know, you, I didn't really get into this in the book, but there's even like some unit testing capabilities, PG tap, for example, in Postgres, um, where if you want to write some kind of like branching logic unit tests, you can do that too. Yeah. Yep. Very cool. You mentioned explain plans a few times like what's the best way to tell what i could never parse the darn things so like what's
Starting point is 00:41:11 the best way to learn how to you know make sense of what's going on there yeah it's they're they're super dense and complex you know it's really cool that you know it's databases are a bit weird you know where you have this declarative model and it's a runtime decision like what's going to happen with your query is like post is going to choose that runtime and so like the best we can do is basically say like what are what are you planning to do postgres with this query you know using explain and then what i've recommended to people is like, it's, um, or I've shared my experience, which is that, you know, for me, it's taken a long time to learn bits of a plan, you know, like learning it sort of not necessarily trying to learn everything right away, but, you know,
Starting point is 00:41:59 the basics like is my index used, you know, so making sure I can see where it's mentioned in the explain plan and then the plan is composed of plan nodes and then it's kind of like i don't you know depending on how much time and interest you have and how important the query is you can just go deep on on more plan nodes so understanding what a bitmap heap scan and bitmap index scan are and how that's different from an index scan or a sequential scan you know if you have a need to understand that it's there's some great resources out there like um the postgres docs are a bit challenging there when you're getting started but like i'm a big fan of pg mustard which is a explain plan visualization tool it is a commercial tool but they do have a free
Starting point is 00:42:46 glossary that they maintain with a lot of great terms and then their kind of help information is is kind of a public service too i think where they'll describe what a bitmap heap scan is for example and um a little bit of information about like when you might see that and what you might want to do with that um and so yeah i like, and then the other thing too is, is I'm not beyond admitting that. I mean, I use chat GPT a ton. I mean, like if, if it's like, if I have a specific targeted question, it's an amazing tool that we have available now where I can say like,
Starting point is 00:43:22 I'm getting a bitmap heap scan, I'm filtering, you know'm accessing this many rows the column is this type like how can I access this like you know you can the more you know about how to read explain plans the better prompt you can you can supply and it's not like I typically it's kind of my research assistant you know like I I'll like get some ideas jot them down design a little experiment play around a bit it's not like I typically it's kind of my research assistant, you know, like I I'll like get some ideas, jot them down, design a little experiment, play around a bit. It's not like it's like I type it in and boom, it's the answer. And I copy and paste. It's it's more like it's a starting point. And then I can validate it, try out some other things and kind of iteratively approach it.
Starting point is 00:44:00 Yeah. Oh, man, that's genius. I hadn't even thought about that. That's a that's a great one. Cool. OK. Common table expressions. CTEs, see these a lot. Are these purely for humans and query structure designs that might involve more left joins or outer joins as opposed to subquerying. And then usually what I... Sometimes I've done it more as a human where I've read a complex query with a lot of left joins and a lot of subqueries.
Starting point is 00:44:46 And I've taken out an inner part. And I've essentially written it as a complex query with a lot of left joins and a lot of sub queries and i've taken out an inner part and i've essentially written it as a simple query and and understood its performance characteristics like whether it needs another index or whatever or you know we can structure it and then you can just kind of wrap it in a ct and you can say you know with my simple query as sql and then you can kind of like take the results from that and feed it to the next query. And sometimes I might not actually end up shipping that, but it helps me understand what the query is doing. And,
Starting point is 00:45:16 and then I could maybe apply the tactics from that simple version to the former query and maybe just more easily identify what it needs to lessen its cost but yeah i guess like as far as benefits to the planner there is i'm not so familiar with this but you can materialize cte results and um i don't think we actually covered that in the book but from a past presentation someone mentioned that to me they're like hey do you ever play around with materialize which is an option in the, when I believe when you create the CTE. And so I, I don't, I also don't want to say that there aren't concrete benefits to the planner
Starting point is 00:45:55 as well. I think that there are, I, I don't know that I have that in my bag of tricks yet, but yeah. Yep. Do you, do you feel like you're fighting the planner very often or is it is it not that like do you feel like you have to use hints very often or just like write queries different ways to to force the planner more down a different route or or not too much of that i mean not really like i i don't i i know that there's a lot of narrative around that like um and if you look at a lot of stack overflow posts or you ask people like why why is the planner doing this or that sort of thing like
Starting point is 00:46:28 i don't actually identify with that so much it's more like it's to me it's like um there are there has been a occasional really challenging uh kind of hard to explain result but usually it's like i think of it as there's kind of a deterministic result we can get to as if we can make a narrow enough selection which might actually involve changing the queries or pulling in some tricks like we might actually add a redundant filter column just so that we could like pick it up on an index and lessen the cost. Or recently, I just did a trick of breaking up a query that's just scanning a huge range of data. We could get it to a certain point with indexes.
Starting point is 00:47:15 That was better, but it wasn't dramatic. But by actually rethinking it as, what would this look like as a view that would be materialized that we got indexes to i could take that big range and broke it into two queries with materialized views that were indexed and then union those results together and it was actually dramatically faster to do that because each individual query was fast so it's kind of like doing two queries and combining them basically and um yeah i guess that's that's generally how i go it's it's more like if something's maybe it's because like if
Starting point is 00:47:51 something's not working i just try to go like a different direction a little bit you know yep absolutely um what about okay so now we've seen in the last i don't know five ten years json json b columns um do you i guess like how do you think about json data do you recommend I don't know, five, 10 years, JSON, JSON B columns. Do you, I guess, like, how do you think about JSON data? Do you recommend using it? Do you not recommend using it? Yeah. What do you think about there?
Starting point is 00:48:12 Yeah. Well, I think if I recommend it in that, if like the alternative is, you know, we do 95% of our stuff in Postgres, but we're going to add like MongoDB because we want to do the schemaless thing over here. And now we're running Mongo or, you know, or like there's other examples too, but just kind of using Mongo as an example of, we want to store like data that we don't really know the, um, totality of what we want to store. We don't really want to maybe do schema changes all the time to introduce new schema definition. Um, we expect this to be like maybe a high right table. I guess in situations where you could stay in Postgres, I think it's worth considering.
Starting point is 00:48:54 Like, what would this look like as a JSON B column? We can store JSON data and we can index it later. We can access it pretty quickly still. So, yeah, I tend to think about it more as like, this is a capability we have in Postgres, but there are some operational trade-offs like storing larger amounts of text in Postgres. There's this whole toast mechanism that can add latency.
Starting point is 00:49:17 It's more kind of like your data is kind of more split up behind the scenes and those tables, just like other tables need to be vacuumed and that kind of thing. So depending on the scale and stuff, there can be some significant trade-offs but it's also really amazing like the amount of json capabilities postgres has both for storing and generating json and i also wrote about this in the book like i think um json doesn't mean i didn't actually know this when i started writing the book but like json b doesn't mean that you're kind of schema-less or you have no structure
Starting point is 00:49:51 that can be supported forever you can always add that in later with there's a couple tools like json schema uh validation i believe i forgot the name of it now but what you can do is you can come in later and add a JSON schema compliant schema definition and then those check constraints we talked about before and actually add some kind of constraint validation to your data later, which is cool. Interesting. Yep, that is cool. Okay. You know, one of my favorite posts is from this guy named Nelson L. Hodge. he used to work at stripe um and he's talking about sql and relational databases generally and and you know they're
Starting point is 00:50:31 Okay, you know, one of my favorite posts is from this guy named Nelson Elhage — he used to work at Stripe — and he's talking about SQL and relational databases generally. They're awesome for a lot of things, but his biggest concern is that there are so many dang features, and for some of them it's hard to know the operational implications. I'd point to full-text search, aggregations, things like that, which work well at some scale — I'm not going to say where that line is — but the performance profile changes as you get larger, compared to, say, a single-item lookup on an indexed value, which is going to be pretty consistent as you grow. How do you think about those? Are there features you give caveats on, or avoid, or things like that?
Starting point is 00:51:16 Yeah, for sure. Another one would be partitioned tables. I kind of like the terminology of a 'sharp knife' for some of those things: if you know what you're doing and you know what the trade-offs are — you don't want to cut yourself, but you can also cut your vegetables much faster because your knife is very sharp and you're a skilled operator. Now, that being said, I think a lot of people do learn from experience; I do as well. Something like full-text search or partitioned tables — those are capabilities Postgres supports natively, and that means they have better documentation than they might if they were not natively supported.
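For readers who haven't touched the native full-text search capability being referred to, a minimal sketch (with a hypothetical `articles` table) looks roughly like this:

```sql
-- Hypothetical articles table.
CREATE TABLE articles (
    id    bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    title text NOT NULL,
    body  text NOT NULL
);

-- Expression-based GIN index over the text-search vector.
CREATE INDEX articles_fts ON articles
    USING gin (to_tsvector('english', title || ' ' || body));

-- Query with the same expression so the planner can use the index.
SELECT id, title
FROM articles
WHERE to_tsvector('english', title || ' ' || body)
      @@ to_tsquery('english', 'postgres & performance');
```

The sharp-knife part shows up as data and query volume grow: index size, update cost, and language configuration are the usual caveats the documentation chapter walks through.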
Starting point is 00:51:59 I know in the Postgres documentation, for sure, there are caveats. So I think it's important not to just willy-nilly adopt advanced features, but to familiarize yourself with the caveats as well, and maybe design in some observability or metrics that might help you avoid the painful parts so you can receive the benefits. And then, like a lot of things with databases, try to start small if you can — a real use, but maybe in a lower-criticality area; maybe when we're doing the huge index creation, we do it at an off-peak time. Start small and then expand. That's how I've done a lot of this work: not only does that limit your
Starting point is 00:52:45 blast radius if things do go wrong, but the other way to look at it is that it builds your confidence that you can become an operator of these advanced capabilities, and you have real, tangible experience to speak to. Then you're just incrementally growing from that, as opposed to going from zero to a hundred.
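Two of the sharp knives mentioned here have fairly small starting points. A rough sketch, with hypothetical table names, of declarative range partitioning, plus the kind of non-blocking index build you would schedule off-peak:

```sql
-- Declarative range partitioning by month (hypothetical measurements table).
CREATE TABLE measurements (
    device_id   bigint NOT NULL,
    recorded_at timestamptz NOT NULL,
    reading     numeric
) PARTITION BY RANGE (recorded_at);

CREATE TABLE measurements_2024_06 PARTITION OF measurements
    FOR VALUES FROM ('2024-06-01') TO ('2024-07-01');

CREATE TABLE measurements_2024_07 PARTITION OF measurements
    FOR VALUES FROM ('2024-07-01') TO ('2024-08-01');

-- "Huge index creation at an off-peak time": CONCURRENTLY builds the index
-- without taking a lock that blocks writes on the table.
CREATE INDEX CONCURRENTLY measurements_2024_06_recorded_at_idx
    ON measurements_2024_06 (recorded_at);
```

The caveats are real, though: `CREATE INDEX CONCURRENTLY` takes longer, can't run inside a transaction, and leaves an invalid index behind if it fails — exactly the kind of thing the documentation warns about.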
Starting point is 00:53:30 Yeah, absolutely. On that same vein — kind of in the 'Postgres for everything' camp, right? We talked about JSON and full-text search and things like that — what do you think about foreign data wrappers? Yeah, I've used them in a workplace setting for helping us glue different Postgres databases together. I just checked the time and realized we've been talking for a long time — I just love talking about Postgres. Yeah, good. And your questions are great; they're obviously from a place of experience, so you're hitting on the main points. So, foreign data wrappers: I haven't really used them in an ongoing context, although people do use them that way. But certainly for a case like: we have this multi-services architecture at the company, and we have Postgres A over here and Postgres B over here — it's a really cool facility to say we can actually treat Postgres B over there, in our A database, as if it were a local
Starting point is 00:54:05 table, by connecting to it. But I've done that more as a one-time sort of thing, like moving some data, or going the other direction where we want to split out some data. That's one way to look at it; the other way is that it's a general mechanism where you can have foreign data wrappers for all kinds of things — it doesn't have to be a Postgres database. I think the only other one we did in the book was a file: treating a file as a read-only table. I think that can be a really cool component of a solution for tiering your data, where you have old data that you may want to keep accessible, but you might want it out of your main Postgres database and into a file.
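Both wrappers described here — another Postgres database and a flat file — ship with Postgres as postgres_fdw and file_fdw. A minimal sketch; the server names, credentials, paths, and column definitions are all hypothetical:

```sql
-- Treat a table in "Postgres B" as if it were a local table (postgres_fdw).
CREATE EXTENSION IF NOT EXISTS postgres_fdw;

CREATE SERVER postgres_b
    FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'db-b.internal', dbname 'app_b', port '5432');

CREATE USER MAPPING FOR CURRENT_USER
    SERVER postgres_b
    OPTIONS (user 'report_reader', password 'secret');

CREATE FOREIGN TABLE remote_orders (
    id        bigint,
    total     numeric,
    placed_at timestamptz
) SERVER postgres_b
  OPTIONS (schema_name 'public', table_name 'orders');

-- Treat a CSV file as a read-only table (file_fdw), e.g. for tiered, older data.
CREATE EXTENSION IF NOT EXISTS file_fdw;

CREATE SERVER archive_files FOREIGN DATA WRAPPER file_fdw;

CREATE FOREIGN TABLE archived_orders (
    id        bigint,
    total     numeric,
    placed_at timestamptz
) SERVER archive_files
  OPTIONS (filename '/var/lib/postgresql/archive/orders_2019.csv',
           format 'csv', header 'true');
```

For the one-off migrations mentioned above, `IMPORT FOREIGN SCHEMA` can pull in many remote table definitions at once instead of declaring them one by one.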
Starting point is 00:54:50 But I know there are other companies — Tembo is big on this. Tembo's a hosted Postgres provider, and they have these different kinds of stacks they make available, and they've written about some of their foreign data wrappers for external services and technologies. Yeah, that pattern always kind of surprises me, because I think one of the big benefits of 'just use Postgres' is: hey, I have one piece of technology, it's operationally sound, I know how it works, and things like that. But if you have a foreign data wrapper, well, you're routing everything through Postgres,
Starting point is 00:55:29 but you still have your Redis or your Elasticsearch or whatever. Sometimes people are piping it through, and you still have to manage that somewhere, so it's not like you've really saved a lot there. Yeah — a file is obviously much less concerning operationally than a whole system, like you said. But if someone was going to be doing that, like connecting to Redis or whatever, I would definitely say: you're still going to need observability on that side, and access to it; you're going to need to connect to it and check things out from time to time. Yeah, absolutely. What about, on that same vein — I feel like there's this giant ecosystem of extensions that are not quite forks, but they change Postgres pretty significantly. I'm thinking of Citus for distributed, or Timescale for time series, or now ParadeDB, who I talked to last week, for search and analytics.
Starting point is 00:56:25 What do you think of that whole ecosystem we're seeing there? Well, it's cool — with Postgres's extensibility, there seems to be energy picking up around really leveraging it, and there are companies trying to build on that. Extensions can be as simple as a function — a string-formatting function could be an extension — but, like you said, some folks use the term 'super extension,' where it kind of changes the operational model, or brings a whole set of capabilities to core Postgres, community Postgres. I just see it as favorable, as positive: it's broadening the possible usages of Postgres,
Starting point is 00:57:14 and that's going to be good for some people. Other people may not use it, but it's a great way to reach folks that might not otherwise be able to use it. Take the ParadeDB example: maybe they run Postgres for their main OLTP database, they want to do some full-text search, and they're comfortable operating Postgres — they know how to add roles, they know how to size their instances, and they kind of know index design. And then it's like, well, what if we used it in this more narrow use case, and we run a separate, dedicated instance just for full-text search? To me, that's not any more complex than saying we're going to start with Elasticsearch — that's a whole operationally complex
Starting point is 00:57:57 beast that has a lot of concepts to learn, and a lot of challenges that can happen once you're in production with big indexes and bigger search volume. So going into something like 'let's take core Postgres and extend it' — if I'm an engineering leader at a company, those two paths are kind of in the same realm to me in terms of complexity, and that's the thing we're going to take on. You could argue there's actually less on the Postgres path if you're
Starting point is 00:58:24 a comfortable operator there. So yeah, I think those things are great. And then, on the broader extensions ecosystem: there are a couple of extension repositories, or websites like that — I can send you a list if you're not aware of them — that list and categorize extensions and give you a peek into how well maintained they are, what their usage is like, and that sort of thing
Starting point is 00:58:54 is a good thing. And then, you may be aware, AWS at re:Invent — I think two years ago — launched this Trusted Language Extensions capability. That's another way to get some types of extensions onto cloud providers, where you might not otherwise have as much free rein to add extensions as you would if you were self-hosting Postgres. So yeah, I think all that kind of stuff is good. Postgres has these extension points, and it can make sense to use them in a lot of different ways.
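For anyone who hasn't worked with extensions directly, the mechanics on a single instance are small; what varies is which extensions your provider allows. A sketch, using pg_trgm purely as an example of a commonly available contrib extension:

```sql
-- What does this installation (or managed provider) make available?
SELECT name, default_version, comment
FROM pg_available_extensions
ORDER BY name;

-- Enable one in the current database.
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- What is already installed, and at which version?
SELECT extname, extversion FROM pg_extension;
```

The Trusted Language Extensions capability mentioned above is aimed at that managed-provider gap: adding certain kinds of extensions on hosted Postgres where you can't install arbitrary ones yourself.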
Starting point is 00:59:49 Yep, absolutely. Okay, let's shift gears a little bit and talk about the book. What inspired you to take on this labor of love — to write a 450-, 500-page book? Yeah, well, I guess I had kind of felt like I'd reached this staff-engineer level across a lot of Ruby on Rails work, a lot of different companies and a lot of different teams, and part of it is I did want to do something maybe a little more meaningful or impactful, even outside of the individual companies I was working for. I have this energy around learning these topics, and I feel like I have the perspective of an application developer, so I try to make them more appealing or accessible to a developer who's probably never going to pick up a databases-only book, like a Postgres DBA sort of book. I also have this background where I studied the Spanish language in university, had the chance to live in Spain for a while, and lived
Starting point is 01:00:37 with a family, and I guess part of my brain likes the idea of bringing two different communities together. I like meeting people from different cultures who speak different languages, that sort of thing. And it does sort of feel like two different cultures: application developers, and database administrators — or data enthusiasts who want to have high-quality data. And then there's this middle ground of performance enthusiasts, people who want to make sure their application can scale well.
Starting point is 01:01:12 And they want to understand: why is this API endpoint slow? Is it a database issue, not a front-end browser issue? So yeah, I guess it was kind of an amalgamation of a lot of different factors: the spot I was at in my career, having this opportunity, having put in a lot of work to improve our operations, and feeling like I had something to share. Yep, yep. How long did it take you to write the book? Well, as of today it's in print, which is exciting. The official start — I'll have to look back — I think it was April of 2022, and today is in June of 2024, so for sure more than two years, two to two and a half years in total. I'd say the first draft was about nine months, I think, to a year. And then, surprisingly — I thought it would be much faster —
Starting point is 01:02:07 that period from when the first draft was done to when it actually went through all the technical editing, and then later the layout and that sort of thing and the additional edits, that was actually almost a whole other year, which is really wild. And I was working full time for most of it; I think with a dedicated push it could have been much less time. Also, as a first-time author, I had a lot of inefficiencies in my process, though I feel like I got more efficient as I went. But it is a very time-intensive task, and I couldn't really afford to work on it while not working. So I'm really, really fortunate to have had some supportive managers, and a supportive spouse, and yeah, I just feel really happy that I've been able to get it done. Yep, yep. If you could tell yourself back in April 2022 —
Starting point is 01:03:03 would you still say, go do it? It was like two years you were going to invest in it. Did you enjoy the process and that whole thing? I mean, I think I would, because fundamentally, I do enjoy the process. I write on my blog a lot, and I always have. I'm sure, like you, I have a huge list of things I always want to investigate, and for me, writing about them is how I do a proper investigation: I'll create
Starting point is 01:03:26 a little sample and do something a little hands-on. A lot of my blog was just sort of unfocused and all over the place — whatever tech stuff I was looking at, or even just going to conferences and that sort of thing. But lately it's actually felt kind of comforting to have more of a special area that I focus within. So I'm trying to really know exactly what's coming in the next version of Postgres, what's coming in Ruby on Rails, and I'm putting weight on things that haven't been covered that well — maybe Postgres capabilities that app developers aren't using, or even under-covered sorts of features and
Starting point is 01:04:05 stuff. So the truth is, I always want to write and do more; it's just a matter of trying to balance that with income-generating activities, since I haven't really cracked that nut as much, and just keeping things in balance. Yeah. Any recommendations for folks who are interested in writing a book? Yeah — most publishers, if you go the publisher route — of course there's self-publishing, which I don't really know a lot about; I looked really briefly into, I think it's called Leanpub, is that what it's called? Yeah — Leanpub, as one of the options that might help you. But for the publishers — I worked with a couple — they each have a proposal process. So if you've ever worked on a conference proposal, it's going to be somewhat similar: they'd like
Starting point is 01:04:51 you to know who you're going to market the book to, what kind of ballpark estimates you have for the size of the market, and then you're trying to make a pitch, basically: here's why I think this book should exist, here are some books that are similar but don't quite overlap, you know? And then, yeah, be prepared for a very time-intensive journey. It's really fulfilling — it feels like a privilege, it did for me, to have the purpose to go deep on certain topics and really turn over all the stones — but it also
Starting point is 01:05:28 needs to be balanced. It comes at a cost: it takes away time from other activities too, whether it's working or, you know, your family.
Starting point is 01:05:34 Like, I had to cut out some things temporarily, and just kind of had the mindset that this will come back eventually; I've got to just get through this. So yeah. Yep, absolutely, I totally agree. That's great. Cool. Let's do some quick-fire questions that we always end with here. First of all, if you could
Starting point is 01:05:50 master one skill that you don't have right now, what would it be? Can these be non-technology, or — yeah, absolutely, anything you want. Yeah, I know you always ask these; from watching past ones I even wrote out some answers, but now I'm blanking on all of them. I started to learn the guitar in COVID and learned a few chords, and I'm like, this is really fun, it's relaxing. I wish I could play the guitar, but I can't — I can just play a few chords. Yeah, cool, that's a good one. What wastes the most time in your day? Probably myself, just using social media, and not necessarily being disciplined and having a good game plan. I like to have meetings and structure and goals and that sort of thing, and if I don't, I'm pretty good at wasting time.
Starting point is 01:06:43 Yeah — has that been hard in the shift to independent consulting, dealing with some of that time-management stuff? Yeah, I think so, in that it's not actually that I'm just mindlessly browsing. It's more that I do think I'm a creative person, in that I always have ideas I want to write about and explore, and I can easily just work on a blog post for three hours that doesn't go anywhere, you know? So I think having a purpose — whether it's, I'm just actually starting to dabble in paid writing, so stay tuned on that, but if it's writing, having a purpose like 'I'm writing this for my blog' or 'I'm writing this for a third party,' or if it's conference stuff, 'I'm working on a presentation' — having some structure. I think I'm getting a little better at trying to even just interrogate myself.
Starting point is 01:07:31 Like, why am I doing this? Yeah, exactly. Cool. If you could invest in one company — not a public company, some sort of private company — who would it be? My wife would not go for me putting a large amount of money in NVIDIA — not even a long time ago, but even just a few months ago — but I kept thinking the whole time, there's no way it's going to keep going up, and then it's just so painful to watch it go higher and higher and higher, like I wanted to put money in,
Starting point is 01:07:58 but, let's see — well, maybe this is an unsurprising answer, but also a little surprising. As far as private companies: I would consider myself friends with the founder, Lukas Fittl of pganalyze — and I have no financial stake in the company or anything — but we've caught up at a couple of conferences, we kind of stay in touch pretty regularly through the Postgres community, and now, as a consultant, I've been using the product with clients and I really see the value. I think it's a great product. I think it's really cool that they've decided to really go deep on Postgres-specific observability — the kinds of things that,
Starting point is 01:08:39 if you're really operating Postgres and you have the real need, it gives great visibility into, like autovacuum runs and stuff that other tools might just say is too much in the weeds, 'we care more about the instance.' So I think they're great, and they're a bootstrapped company — Lukas has talked about that. I'd love to see them just continue to be more and more successful. Yeah, cool, that's great to hear. Which person influenced you the most in your career? Oh man, I know this is a standard question too, and I had a really hard time thinking about a specific person. I guess I've had engineering managers or senior engineers at companies in my career who have been influential. Like, one person I've talked about
Starting point is 01:09:26 is this guy named Paul Berry, who — I'm not sure exactly what he's up to now; I believe he's still a CTO at this startup he's part of — but earlier in my career, as far as the database stuff, he's someone who had really good mastery of writing advanced SQL, of being an operator, being able to go and terminate queries that were out of control, things like that. I'd say people who have that kind of hands-on attitude — somebody who's going to say, let's just hop on a computer and figure this out together — they want to be hands-on but share their knowledge. Some of those engineering folks in my career, those have been the people that have stuck with me. Okay: five years from now, will people be writing
Starting point is 01:10:14 more code or less? Well, it's an interesting question, because the idea is that with a lot of code-generation tools and AI, you might write less, but it also could create new opportunities with new types of businesses and products and companies that didn't exist before. I guess I feel like I've got to say it's a wash, because I can see a little A and a little B. That's true. Yep, yep. All right, I have a special quick-fire question for you as a fellow Timberwolves fan. Oh, yes. Yes.
Starting point is 01:10:49 Should the Timberwolves trade KAT or keep him? Okay — it's amazing that you asked me that, by the way, because even amongst my fellow Timberwolves fans... I grew up in Minnesota. Like, I had a Pooh Richardson poster — probably no one knows him — above my bed as a 10-year-old. Yeah, that's good. And I was around for the 2004 Western Conference Finals era,
Starting point is 01:11:14 the KG sort of team, so I've been a long-time fan. I did take a long break because we moved away from Minnesota, but I've been back since. That's a good time to take a break. Yeah, I'm like the worst. But we moved back to Minnesota in 2015, which was KAT's rookie year, and — hopefully he never watches this podcast — I do sort of feel like it might be good to trade KAT. I think with the structure of the team now, he's an extremely valuable offensive player who will also be extremely valuable to another team. I just think that, possibly for the composition of the team,
Starting point is 01:11:51 because the NBA is all about matchups — and he does generate a lot of offense, so we would need a lot of offense to replace what he generates — I just think it might be a good time to make a change. He's been with the team for, I believe, eight years. Yeah. Yeah, I know, I go back and forth. Even in the playoffs,
Starting point is 01:12:09 he'll have those crazy frustrating games — I'm like, oh my gosh — where he'll have a 60-point game and then a 10-point game with six fouls, some of the dumbest fouls you've ever seen. But yeah, he seems like an amazing person, he's had a lot of tough breaks in his personal life, and he's been a huge Minnesota advocate, for being a relatively smaller city within the NBA and within the country, and that's great. But I'm slightly in the 'let's explore the right trade' camp, which is controversial even amongst local Timberwolves fans. So it's really funny you asked. Yeah, it is tough.
Starting point is 01:12:48 Andrew, it's been great to have you on. Tell us a little bit more about the book — where we can find it, where people can find you, things like that. Yeah, thank you. So, High Performance PostgreSQL for Rails: it's available from all the major places, Amazon and whatnot. The publisher is Pragmatic Bookshelf, and their website is pragprog.com, where you can buy the book as well.
Starting point is 01:13:11 It's also on Bookshop, which is more of an independent bookseller. I blog at andyatkinson.com, and these days I mostly use Twitter/X — I have social media accounts in most places, but I mostly tend to hang out there and post about Postgres and Rails things, and conferences, and stuff like that. Yep. Cool. Well, as I said,
Starting point is 01:13:33 I totally recommend the book, even if you've never written any Rails. I'd say if you're in that application DBA or accidental DBA camp — hey, you want to ship some features, but you also want to understand what's happening under the hood —
Starting point is 01:13:44 I think it's really, really good for that. So, yeah. We'll have a link in the show notes and all that stuff. Andrew, thanks for coming on. Thank you, my pleasure. Thanks a lot for the opportunity.
