Software Huddle - High Performance Postgres with Andrew Atkinson
Episode Date: July 2, 2024
Database performance is likely the biggest factor in whether your application is slow or not, and yet too many developers don't take the time to properly understand how their database works. In today's episode, we have Andrew Atkinson, who just released a new book, High Performance PostgreSQL for Rails. Andrew is one of those "accidental DBAs" who fell into learning about database optimization at his job and ended up getting pretty dang good at it. In this episode, we hit Andrew with a bunch of questions about using Postgres properly, including how tight your schema should be, how to handle Postgres operations, and how to think about performance testing with Postgres. We also got into some aspects of the process of writing a book and what he recommends for others. If you're building an application with a relational database, this book will help you. It has real code repositories so you can walk through the steps yourself, and it teaches the concepts in addition to the practical aspects so you understand what's happening.
Transcript
fundamentally, I do enjoy the process. Like I write on my blog a lot and I always have,
I'm sure like you, I have a huge list of things I always want to investigate.
And for me, like writing about them is how I do a proper investigation.
How long did it take you to write the book?
I'd say the first draft was about like nine months, I think, to a year. And then surprisingly,
I thought it would be much faster. But that period from when the first draft was done to when actually it went through all the technical editing and then later the layout and that sort of thing and the additional edits, that was actually almost a whole nother year, which is really wild.
In terms of consulting, are most of your engagements with Rails and Postgres specifically, or is it even, like, Postgres more broadly, for things like that?
Yeah, well, I've just started to sign some client work that is outside of Rails. Like, they're not using Ruby on Rails at all, but they're using Postgres. And, yeah, I think that's kind of where I want to get to.
I feel like there's this giant ecosystem of extensions. What do you think of that whole ecosystem we're seeing there?
Hey folks, this is Alex. Got another great Postgres
episode for you today. Got Andrew Atkinson on, who just released High Performance PostgreSQL for Rails.
He's great. I think he's super useful if you're like building in some sort of application framework
using Postgres a lot and just want to know more about performance optimization and scaling and
operational stuff, all those sorts of things. He calls himself an accidental DBA, or an application DBA. And I just think it's really fascinating.
So we talked through a lot of just like different Postgres features, what you should be using,
what you shouldn't be using, performance tips, all sorts of things like that. Really great.
Hey, check out the book. I bought the book. It's awesome. It just came out. It's really great if
you're sort of in that world. So, as always, if you have any comments, suggestions, questions, feel free to hit up me or Sean. If you have any guests that you want on, feel free to reach out. And with that, let's get to the show.
Andrew, welcome to the show. Hi, Alex. Glad to be here. Yeah, so I'm excited to have you on, kind of continuing our Postgres series here. And you are the author of High Performance PostgreSQL for Rails, which is printing as we
speak. By the time this episode is released, it'll be available, you know, probably next day
shipping or two day shipping from Amazon, whatever. So get that if you want it. But
you do some Postgres consulting. But maybe, for everyone else here, give us a little
bit of background on who you are and what you're up to. Sure, yeah, I'd be happy to.
Well, I've been a Ruby on Rails web application developer for more than a decade
and done some other things as well within web applications like other languages and frameworks.
I also did some mobile development.
But the last kind of handful of years, I've been more of a backend developer.
And then I've been, you know, really enamored with Postgres for a number of years, having the chance to use it at higher scale,
running into a lot of common issues and I decided at some point to sort of specialize in that
within the backend engineering realm and specifically for web applications and just
kind of understanding how to maximize our use of Postgres.
And now I'm doing that as a consultant for companies that want to hire me to help them out.
Yeah, absolutely.
Tell me, can you tell me more about like the origin there?
And like, how did you get into it?
Like, did you raise your hand when it came up like of, hey, database issues are sort of our problem here at the company right now?
Or how did you get into the deeper Postgres stuff?
Yeah, I mean, it was definitely a little hand-raising, and it was having opportunities, of course, that I guess preceded the hand-raising. It was kind of like trying to look at challenges as opportunities, technically, but also to be helpful on the team, and sort of complement a team. Where I cut my teeth mainly was a small team, and not everyone on the team could do everything. So I was sort of like, well, I'd love to actually help more in terms of our scalability, at the time, on one of the past teams I was on, with the database operations in particular. And it sort of scratched my personal
itch on, you know, learning to read query plans, learning to design indexes and that sort of thing.
But then it was immediately useful and helpful for the team. So that was a big part of it. And
then I think also, just, like, philosophically, you know, there's things that sort of line up with my kind of perspective in life. And that happened to be how databases work, too. Like, they sort of hold the company's most valuable asset in data, and I like the idea of working close to that. But then there's this interesting challenge, too, where as an application scales, you also need to have great performance to hand the data back out to clients, to client applications.
And that's really an interesting challenge to both make sure that it's consistent and preserved properly, backups and all that, but also can be served back.
And so I just think it's really interesting.
It's really practical.
Every company I've worked at faces these challenges.
And so, yeah, I just sort of felt like, you know,
I really want to do this better as an engineer.
And what do I need to do to get there?
Yeah, yeah, for sure.
I empathize with that.
I have like a similar road, you know, background stuff,
but just got more and more interested into the database stuff.
Because I think it is really fascinating. Like, the front-end stuff gets a little repetitive after a while, but the database stuff, I think, is just so fascinating. Like, how many different areas there are. And it sort of makes sense logically, but you just need to, like, you know, understand all these different pieces, and then think how they go together. And I like how you called it, on a different podcast, like, the accidental DBA, or, like, an application DBA. Like, hey, you sort of accidentally fell into it, and now you're sort of, like, the person, but you're still, like, very application-focused as well, right? Like, you're a DBA, but you're not just, you know, ivory tower, sitting in the back office, writing the queries and all that stuff. You're doing it in the context of shipping some stuff, and, you know, understanding user needs and things like that, which I think is great. Thank you. Yeah. Yeah, for sure.
Like, um, I learned that term from Charity Majors, the accidental DBA term. And then I also learned this term from Haki Benita, um, application DBA. And that was kind of, like, a light bulb moment for me, both of those things. Where it's like, yeah, you can kind of accidentally fumble your way into this from real needs, from an application that's scaling and needs to write and, you know, retrieve data quickly. And then you also can say, you know, I really like working with this part of the tech stack, and it happens to be a database. Maybe you don't tend to do so much, like, you know, security and backups and restores and stuff, but certainly the application's usage of the database, like, mostly around indexes and schema design and data modeling and that sort of thing, like, that's critically important for the success of the growth and performance of the application as well. Yep, yep, absolutely. Well, on the book, I want to say, like, the book is great.
I got the early access version
and went through it.
I think it's great.
I want to talk about it more,
but like I thought like it was useful
for lots of different types of people.
And I would say, like, three ways. Like, hey, if you're where you were maybe seven, eight years ago, where I was like five, six years ago, and it's like, hey, you're starting to get interested in the database and want to know more about how that works: just reading it straight through is just great, right? Because it has this, like, sample application, and you're, like, actually running these commands and understanding these concepts, but, like, applying them. Well, I think that's super useful. Number one. I think there's, like, a second group where it's like, hey, if you're having operational stuff or scaling stuff, or you see that in the future, and you want to know some of the patterns for thinking about that better: I think that's, like, a great reference material for that. And then, just, like, if you're having specific performance issues, man, Appendix One, I think it's Appendix One, is just this amazing resource to point you in the right direction. It's like, hey, if you're having slow queries, go read, like, these four chapters, right? If you're having high write latency, or you're having invalid data, right, because your schema is not tight enough, like, go read these specific chapters. I thought that was super useful as, just, like: I'm drowning right now and I need something to help me out, I don't have time to read through the entire thing today, but if I could just fix that problem, that'd be great as well. So I think, on all those things, it's really a great book. Thank you.
Yeah, that's really great to hear. And the field guide, we called it the field guide at the end, the Appendix One that you're mentioning. That was actually, like, a late addition.
And it was based on some reader feedback that was like, hey, what if I don't really want to read the book front to back?
Which I get, I mean, a lot of tech books.
So they were like, what if I have specific issues I want to dive into?
And that made so much sense to me.
And I was like, why, why did we not do this earlier?
Like, but yeah, I mean a lot of, of course, like a lot of, um, you know, manuals and things
you pick up or they have like a troubleshooting guide and, um, yeah, it's great if it gets
more people into the book that way, you know, even if they don't read the whole book, like,
but they can get some value out of starting from a specific issue, that would be a great outcome. Yeah, yeah. And the one thing I would say, too, is, like, this book is High Performance PostgreSQL for Rails. I have never written Rails in my life. I've written, like, ten lines of Ruby in some Chef scripts once that I inherited from someone, and that's it. And it was still, like, super useful. It's not, like, super Rails-heavy to where it's not going to be useful to anybody else. So if you're, like, an application dev, or you want to be an application DBA, it's somewhere in that realm, like, definitely check it out. Um, in terms of consulting, are most of your engagements with Rails and Postgres specifically, or is it even, like, Postgres more broadly, for things like that? Yeah, well, I've just started to sign some client work that is outside of Rails. Like, they're not using Ruby on Rails at all, but they're using Postgres. And, yeah, I think that's kind of where I want to get to.
And I guess, like, both in writing the book and then in launching consulting, which I've really just been doing now for not quite six months, you know, I just felt like Ruby on Rails is my homeland, so to speak. Like, it's what I know, so I would start there.
But I do think that a web application, especially if there's an MVC type of framework with an ORM,
whether it's PHP and Laravel or Django or it's Node and Prisma or that sort of thing,
I'm sure that there's going to be some of the same problematic query patterns
because of this kind of object relational sometimes mismatch
and that sort of thing.
And so I do think I could help teams out that use Postgres,
that use other web application frameworks.
And I'd be happy to do that, of course, both on the database side
and then also, if needed, hopping into the code base and that sort of thing.
Yep, sure.
For most of these engagements, what are the developers like and what's their relationship with the database like?
Are they curious but just need a guide?
Are they indifferent and just like, hey, somebody come fix this for me?
Where are they sort of at with that?
Yeah, I've worked with teams that have... you know, to some extent, I get it. Like, I've also been on the other side and have hired... I was part of a team at a past company, where I was a full-time employee, and we hired a database person from outside to come in and take a look at our stuff and help us out and whatnot. And they had sort of a narrow, insulated perspective into the company. And I think, practically speaking, that is usually how I start with a company. Like, I do have somewhat of a narrow perspective, so I don't necessarily know a lot about the dev team beyond people I'm initially meeting one-on-one, or who might be part of a kickoff or that sort of thing. But I certainly have had, you know, it's been very fulfilling to have, sometimes, like,
I recently started helping out a company where someone told me right away, they're like,
Oh, Hey, I bought your book and I'm a big fan of some of your stuff.
And I'm, I'm really excited that you're, you're here.
They weren't the person that directly hired me,
but they were part of the team.
And I felt very good.
Like this is going to be a positive engagement for all of us, right?
Like, this person's kind of primed to want to learn more. They have, you know, observability in place.
They have some query issues.
They have some indexes on their tables, but they're still slow queries.
They're not sure why
they're wondering about different schema designs, like, should they add their tenant identifier column everywhere? You know, different things that can lend themselves to opening up improved performance. And so, I think, in those cases, those are great. I think I also, though, have been hired on some teams where they're pretty good, and I think they're sort of just looking for validation, in terms of their skills and their experience. And I try to call that out right away. Like, in some cases, like one client, after a bit of time, I didn't really think I was super needed by them. And I think that's fine, too. Like, there could
be... I know another consultant I follow kind of mentioned the project that he was on arrived at a natural conclusion. Because he was helping out with performance, and performance was generally improved. It's not that there won't be ongoing opportunities, but, you know, if you make a big improvement for a company, it might sort of just reach a conclusion. So, yeah, I'd say I have kind of a spectrum of experiences there. Yeah, yeah, absolutely. Do you have any, like, common mistakes that you see kind of over and over? Is it the same couple things, or is it a pretty wide variety? Um, yeah, I guess, like, a few things would just be, um, over-indexing:
you know, I know we have this field in a WHERE clause in a SQL query somewhere, so I added an index on it, but it's not used at all by any queries. Under-indexing: you know, like, these queries are slow, we've got two indexes on the table, but it might actually benefit from having around ten, based on the queries, how the table's accessed, how the row data is accessed. You know, never tuning any parameters, which I think isn't the very first thing you should do, in my experience, but I think having awareness of what the tunable parameters are... there's about a dozen or so that are in my kind of core list that I would recommend. Never archiving any data, or really talking about any kind of data retention
policy. I think it's hard, because you usually have to have, like, a product discussion, sometimes with a lot of stakeholders, about changing a system that has access to data forever, where the data is years and years old, and there's years and years of data that's never actually accessed, but it could be. And having that conversation. I'd say those are some of the patterns that come to mind. Yeah. Yeah. Okay. Let's dive in. I want to dive in. I have, like, a bunch of questions,
but these are all some good ones. So you mentioned over-indexing. And so, if I have too many indexes, is that mostly going to show up with write performance, just worse write performance, because it's hitting all these different indexes? Yeah, it's write performance, but I actually think it's more about... it's a little bit of a symptom of: you're not really able to tell that your indexes are not being used, you know? And, you know, sometimes I think folks think that they might be adding a benefit by having an index that could be used. But if a certain amount of time has passed and the application is pretty well stabilized, and I'm not talking about a situation where you're, like, in a rapid scaling period, but if you have a pretty stable application that's mature, likely a B2B kind of SaaS app, and you added an index years ago that's never been used, I think it's very unlikely it's ever going to be used. And I think sometimes there's, you know, a little bit of a mindset shift in this: this actually is worsening write performance a bit, but it is providing no value.
And the other thing that can happen, for heavily updated tables, is there can be very high bloat in indexes, and they can grow. And it's not uncommon, and it can be very surprising to people, that indexes can actually exceed the size of the table if they're never rebuilt. And so what can happen is, besides the write performance, which is an ongoing concern: as you start to create backups of your database and you want to restore those
for test environments, or if you start to become more concerned
about your ability to restore from a backup and how long that might take
those are things where, you know, you have this unused index that is possibly very bloated and very heavy in terms of how much space it's consuming. Like, let's get rid of it. And I'm talking about, like, for clients I've met with, it's not uncommon to have dozens or even hundreds of unused indexes. Oh, wow. Yeah, you know, if they've got a database with 500 indexes,
it's not uncommon to have 200 that are unused.
And usually it speaks to just more of like,
you know, we're just kind of spraying
and praying indexes here.
And we don't really have the observability
or the skills to know that they're not being used.
Yeah, yeah.
You sort of mentioned, like, hey, if you've had an index around and haven't used it in the last couple of years, it's probably time to drop it.
Do you ever like do you recommend having sort of currently unused but potentially used in the future for some feature index?
Or are you just like, hey, let's create that index once we specifically know we need it?
Or is that like too expensive an operation to do even concurrently that you'd rather like sort of build it early on and keep it updated up front?
Yeah.
I mean, you hit on all the points really.
I mean, I think it's tricky to answer, because I have, like, my really quick answer and then my, like, Cadillac, you know, deluxe answer. My really quick answer is: if you're really concerned that the index may be useful, and be used by queries, but you're not yet able to determine that concretely, and you don't currently have a use case for it, but maybe you have a feature in development that you expect it will be useful for, and it's a big table... I mean, you know, you can keep the index. What I would try to do is get the observability in place that will show that it's unused, and that on an ongoing basis it's unused, even after your feature launches. That's my attempt at a short answer. My Cadillac answer, though, is: it's more of a holistic problem. We want to get better observability in place that shows all of our indexes that are used and the ones that are unused, and then we want to develop kind of a hygiene approach. Usually removing the unused indexes is kind of a big one-time change, but then, if we have the monitoring in place, then as application code changes and evolves over time, we also can do a periodic maintenance kind of check and make sure we don't have a lot of unused indexes.
Yep. For monitoring unused indexes and things like that, is that, like, you know, a script I'm running every once in a while? Is that PgHero or something like that? Or, like, how are you monitoring those indexes? Yeah, you can DIY it,
and you can certainly check within the Postgres system catalogs
when indexes are scanned.
That information is tracked.
So you can check periodically and even capture that data you want
if you want to store your own scan data.
You could say, go through all my indexes now, check their scan counts. And then what you could do is you can build a metric or a timeline view as to a particular
index or set of indexes that haven't been scanned over a particular set of time. And you can do that
both on your primary instance and any replicas that you have. And then, like, PgHero is an open-source tool that does some of that for you, so it essentially comes with a bunch of those queries baked into it that are checking against your system catalogs. Another tool that I've used with a couple clients, that I'm a big fan of, it is a commercial tool though, is pganalyze. With pganalyze, you set up an agent, and it will consume your log files, it'll perform some of those types of queries, and it'll actually turn that into sort of, like, work units. It'll say, here's an unused index, and it's actually an alert, and then you can have a little bit of workflow around that and that kind of thing. And of course, that tool is going to cost money on a
monthly basis. So it makes sense for companies that are heavily using Postgres and looking to
squeeze as much as they can from it. But if you're just kind of getting started, I think
running those queries by hand is perfectly reasonable.
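For reference, a minimal sketch of the kind of DIY check being described here, against the Postgres system catalogs. Skipping unique indexes is an assumption added for this sketch, since they enforce constraints even when never scanned:

-- Indexes with zero scans since statistics were last reset:
SELECT s.schemaname,
       s.relname AS table_name,
       s.indexrelname AS index_name,
       pg_size_pretty(pg_relation_size(s.indexrelid)) AS index_size
FROM pg_stat_user_indexes s
JOIN pg_index i ON i.indexrelid = s.indexrelid
WHERE s.idx_scan = 0
  AND NOT i.indisunique
ORDER BY pg_relation_size(s.indexrelid) DESC;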
Yep. Yep. Okay. You mentioned rebuilding indexes. Is this similar to vacuum?
Is it contained within a vacuum?
How does sort of rebuild and vacuum, how do those work together?
Yeah, it's similar to vacuum in that the idea is to efficiently store your physical row data or to store it physically efficiently.
So, what are logical rows in a table, you want those to be stored physically as efficiently as possible, without gaps and that sort of thing, which vacuum doesn't do, but it makes space available for reuse. That's vacuum without any additional commands. There is a VACUUM FULL command that will actually rewrite your table, and that would be the closest corollary, but we'll talk about why it's not easy to do. VACUUM FULL would be the closest to a REINDEX operation on your index, in that it's going to rebuild your index, and any index entries that would have pointed to row versions that no longer exist, or aren't reachable, those are going to be gone, because it's just going to rebuild your index from the current point in time. And it's always worth mentioning to people: first, what I recommend is you get rid of all of your unused indexes. But then I do recommend that, you know,
if you have a database that's been around for a number of years, and you have tables that especially are heavily updated, or there's frequent batch deletions from them, to go and take a look at your indexes. And you can also check out, we've got queries in the book, and you can find these easily online, that help determine the kind of bloat percentage, or the estimated percentage of your indexes: how much of them are non-useful content, basically.
And it's pretty wild how sometimes they can actually be cut in half
in terms of size.
And then that helps both with, again, with space consumption
and access to the indexes.
You're just accessing what you need from the index more efficiently.
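As a reference point, a sketch of the rebuild being described; REINDEX ... CONCURRENTLY needs Postgres 12 or newer, and the names are placeholders:

-- Rebuild a bloated index without blocking reads and writes:
REINDEX INDEX CONCURRENTLY idx_orders_customer_id;
-- VACUUM FULL is the table-level analog mentioned above: it rewrites the
-- table and its indexes, but takes an exclusive lock, which is why it's
-- rarely practical on a live table:
-- VACUUM FULL orders;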
Yep. Yep. Cool.
Okay, we dove right into the deep stuff right away.
I'm going to go back to some warm-up questions I had. Um, so one tip you had in the book was around, hey, single-column indexes, and you said, like, hey, single-column indexes often are the right choice: just, like, hey, just index that one column for me. With my Dynamo brain, I always think of, like, partition key and then sort key. So I think of that, so it's weird not to have, like, hey, some sort parameter on that. I guess, does this advice, is that single-column advice mostly for if you are getting just a single row at a time, like maybe getting a user with this particular email address? Or does that also apply when I have some sorting in there? And I'm thinking, like, hey, an e-commerce store where they want to get the orders for a customer, right? And so, like, I'm indexing on that customer ID, but in that case, should I be using a
multi-column with, like, a sort on the created_at or something like a date? Yeah, I mean, you're thinking about it in the right way. And I think what I've found is I start to develop a nose for what a good candidate multi-column index would be, based on a query, versus a situation where it's not going to be so beneficial, like a single-column index will provide the same value, maybe because there's a lot more columns that need to be scanned or that sort of thing, or maybe there isn't an ordering in the query or that kind of thing. And so there's going to be that second heap fetch phase after the index scan that's going to go and get all the other column data that we need. But yeah, for single columns, if you have an ORDER BY operation and a filter, like if you're filtering on all the rows with a date greater than a certain value and your query is ordering by that, a single-column index on that column might work out great. And you can even control the column direction in the index, too. So if you're sorting by ORDER BY descending, you can set your index entries up that way as well in your index definition. But if you do have a WHERE clause that is filtering by an attribute on the table, and then you have an ORDER BY for a second attribute, that could be a good spot to have your leading column be your filter column and your second column be your ordering column. And what I try to help clients do, what I always do myself, is try to get as close to production in terms of row counts and instance resources as I can. If I'm not able to validate a design on production, which you can do, but, you know, most companies don't want to do that, then if we can get a pre-prod Postgres instance, and we can get the SQL query with representative params, we can actually just try both indexes, run the EXPLAIN plan, and see what Postgres likes better.
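A sketch of the filter-column-first pattern just described, applied to the e-commerce example from the question; the table and column names are hypothetical:

-- Leading column serves the WHERE clause, second column serves the ORDER BY,
-- with the index direction matching the query's sort:
CREATE INDEX CONCURRENTLY idx_orders_customer_created
  ON orders (customer_id, created_at DESC);

-- A query this index can satisfy without a separate sort step:
EXPLAIN
SELECT * FROM orders
WHERE customer_id = 123
ORDER BY created_at DESC
LIMIT 20;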
Yep, yep. On that same note, okay, I'm going to skip ahead a little bit. But how do you think about performance testing and things like that with a relational database, when there are just so many factors? Like you're saying, you can replicate the row counts and things like that, but replicating the concurrent queries that are going on against it, maybe even, like, the bloat stuff, depending on, yeah, how much is happening... Yeah, yeah. Like, how,
how do you do sort of accurate performance testing?
Or is it a lot of looking at the query plan and understanding that?
Or how do you think about some of those things?
Yeah, I mean, it is really, really hard.
I don't think there is a great solution to have a really super 100% high fidelity
pre-production performance testing environment.
So I think like with databases, it's helpful or with like Postgres anyways, it's helpful to know
the kinds of things that influence outcomes and then basically kind of zero in on those,
you know, like a lot of times what I look at is the, you know, the cardinality of a column,
like the unique set of values and then how selective a query might be
and how selective an index definition might be. And there's this kind of rule of thumb that I
always kind of kick around in my head, that I heard from somebody. This is not a very sophisticated rule of thumb, but generally, when we access the table, and we do generally want to have similar row counts to production, I tend to think of this 30% figure as a proportion of what we're trying to access versus the total row count for the table. So if we're trying to access something that's, like, 30% or less, it's likely the query planner is going to choose an index scan over scanning the whole table. And, in terms of performance problems, commonly the main problem is indexes that we're hoping are used, or we're not able to design a good index, are not being used.
And so the full table is being read, and that's where we might start from. Even if we don't have a perfect-fidelity pre-prod environment to test in, if we know that the problem in production is that the full table is being read and then it's filtered down to five rows or something, from, you know, millions, then as long as we can simulate that as a starting point, and we can get an index that matches kind of a narrow set, then typically what we can do is be pretty confident. If we can recreate the slow query in the pre-production environment, i.e. the full table scan, and then we can design, sometimes through a combination of query changes, a highly targeted index, using tactics like multi-column indexes, partial indexes, indexes on expressions, different tactics that could be very specialized for a particular query, then as long as we validate that that index is then used and lessening the cost in the pre-prod environment, it's almost always the case that we add the index in prod and the problem goes away. And we can then, of course, add the index in production and be super confident that the plan is going to be set.
And then, of course, we want to have observability, too, that tracks what the query plans were for a query over time, which we can talk about as well.
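For illustration, one example of each index tactic mentioned above; all names are invented:

CREATE INDEX idx_orders_customer_status ON orders (customer_id, status);  -- multi-column
CREATE INDEX idx_users_live ON users (id) WHERE deleted_at IS NULL;       -- partial
CREATE INDEX idx_users_lower_email ON users (lower(email));               -- expression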
Yeah, yeah.
If I'm adding an index in production and doing it concurrently, is that going to take up a lot of resources? Like, if it's a big table, is it going to take up a lot of resources, or is it just going to use a smaller amount of resources and just take longer? Like, how worried should I be about adding a new index to, you know, a decent-size table? Yeah, there are some caps in there. Like, you can only do one per table concurrently. I think if you start a second one from another session, it'll get canceled. I haven't tried it lately. But you can only do one. And then
you have maintenance work memory as one of the Postgres parameters that will
control memory
for maintenance operations. I believe that memory allocation is used for CREATE INDEX CONCURRENTLY, but I tend to actually see it elsewhere, like for vacuum operations and that sort of thing, so I'm not 100% on that, I might be wrong about that. But generally, no, I mean, that is what we need to do for big tables. Like, for example, I've got a client with a 600-million-row table. It can take, like, a couple hours to build an index. So we might set up a session with screen or tmux and manually kick off a CREATE INDEX CONCURRENTLY in the background, sort of monitor things. There's going to, of course, be IO and CPU that's going to be used during that period.
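A sketch of that kind of manual kickoff, with placeholder names; the memory setting reflects the maintenance_work_mem parameter discussed above, which does apply to index builds:

-- Inside a screen/tmux session against a big table:
SET maintenance_work_mem = '1GB';  -- more memory can speed up the build
CREATE INDEX CONCURRENTLY idx_events_account_id ON events (account_id);
-- On Postgres 12+, progress can be watched from another session via
-- the pg_stat_progress_create_index view.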
Yeah, absolutely.
All right, let's talk about schemas a little bit
and just basically how strict should I make my schema
is the overarching view here. But some specific questions, like: should I allow nulls on my columns, or should I try to avoid that? I mean, obviously sometimes you need it, but how do you think about nulls? Well, yeah, I mean, I generally try to default to being restrictive, I guess. I mean, I think that's the DBA mindset: as many restrictions as possible, because restrictions typically breed better performance, and actually sometimes more information about your data, and sometimes better quality data, and that sort of thing. But if nulls aren't important to the application's goals, then from a read perspective, that's a good spot to think about partial indexes,
where we could, if we don't need to,
like if maybe it's information we don't yet have,
but we want to capture a row,
but that particular field is not set yet.
If we're not querying on that,
then we could exclude those using a partial index
and say, give me all the rows,
but where this value is not null.
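A minimal sketch of that partial-index idea, with hypothetical names:

-- Index only the rows where the field is set, excluding the nulls:
CREATE INDEX idx_tasks_due_at ON tasks (due_at)
  WHERE due_at IS NOT NULL;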
Yep, absolutely. What about foreign key references? Should I always use those? Yeah, I think this is funny, I guess, because it seems like for some people it's controversial, that foreign keys are onerous, and there's certainly people recommending not using them. I mean, typically, what I see is... I'm an advocate of foreign keys.
I think they should be used. I think if you truly are experiencing... if you have the type of scale where you're moving beyond one instance and you can't use them, or if you have the type of scale where you're concerned about the impact of the foreign key constraint enforcement, and you've done the things like add indexes to foreign key columns and that sort of thing, or make sure that cascading deletes are supported by indexes:
You probably also are at the level of sophistication where you know how to monitor their impact
and what the benefit is in removing them.
But I think most people that I've worked with,
they're not there.
It's more like there's maybe some fear,
uncertainty, and doubt around using them.
And so what I try to do is encourage
the use of foreign key constraints,
check constraints,
with the emphasis being on
having the best quality data
that you can
get at the front end, helping describe your data for other use cases, like data analysts that want to use it later from your relational database. And they're working in, you know, BigQuery or Redshift or somewhere else, and they're like, this data is a mess. We're sort of setting ourselves up for success if we have good quality data on the front side. So, yeah, I generally recommend them.
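A sketch of the combination he recommends: the constraint itself, plus an index on the referencing column, since Postgres does not create that index automatically and cascading deletes benefit from it. Names are illustrative:

ALTER TABLE orders
  ADD CONSTRAINT fk_orders_customer
  FOREIGN KEY (customer_id) REFERENCES customers (id) ON DELETE CASCADE;
CREATE INDEX idx_orders_customer_id ON orders (customer_id);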
Yep, yep. You mentioned, you know, cascading deletes and things like that. Is that something I should be worried about? You know, the downstream impacts of deleting some top-level item, and then it just deletes 12,000 items elsewhere? Or is that overblown as an issue? Well, I guess, like, it is something that should be on your radar
if you do take advantage of cascading deletes.
I mean, deletes can catch people off guard
that deletes can be a somewhat expensive operation
in Postgres on big tables
if you're deleting large ranges of values.
Because of the multiversion concurrency control,
there'll be a new row version that's created.
So your old row version needs to be cleaned up, and that can create the fragmentation, or the bloat problem, and that sort of thing. And if you do, like you said, touch a bunch of tables, yeah, it's a situation where you want to know how to look at the queries that will be generated, and use EXPLAIN, look at an explain plan, and make sure that, when the rows are accessed via the cascading deletes, that's going to be an index-supported query, if it's a big table.
So, yeah. I think if you're dealing with tables in, like, the sub-1-million to 10-million-row range, and you've got a few that are doing cascading deletes, on a reasonably sized modern instance, you're probably fine. But I think if you're doing big delete jobs on 100-plus-million-row tables, it should be on your radar as something to be aware of. Yep, yep. What about check constraints? Um, how often do you use those?
Do you use them for, like, all kinds of application-level validation, or do you say, like, hey, Rails can handle a lot of that? Um, yeah, how often do you use check constraints? Yeah, I would say that, you know, as I've learned more about Postgres over the last few years, it's something that I'm trying to advocate for more usage of, because, in my real-world experience, I don't really see a lot of usage of them. And, you know, I first learned about them more as, like, an intermediary use case, where you want to introduce a constraint, but have it enforced only for new row changes at first. And you can do that with check constraints. So you can use check constraints kind of as a transitional element, which is cool. But you can also use check constraints as a data quality check mechanism, by writing a rule that kind of describes what you expect. And so, as a fan of them, I kind of think they should be everywhere, on all the tables. Like, anywhere you have an application rule, why not throw a database-level check constraint in? As long as you're capturing those, what will be Postgres errors, when they violate the constraint, as long as you're capturing that in the application and handling that gracefully, which, you know, does take some work, I think they're a great tool.
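A sketch of the transitional pattern described, on a hypothetical table: NOT VALID means only new and updated rows are checked at first, and existing rows can be validated later:

ALTER TABLE users
  ADD CONSTRAINT users_email_present
  CHECK (email IS NOT NULL) NOT VALID;
-- Later, after cleaning up existing rows:
ALTER TABLE users VALIDATE CONSTRAINT users_email_present;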
Yep, yep. Yeah, I've sort of... have you heard of TigerBeetle, the database? Uh, just very recently, but I don't really know anything about it. Okay. Yeah, I would say they're unique in a lot of ways. And one of their things, like, you know, they have this thing called TigerStyle about how they write code and things like that. And one of the things they do is they use asserts very heavily in their code. Like, they're asserting all these conditions, you know. And this is actually
in their application code. But it sort of makes me think about that with Postgres, too. It's just like, you know, why wouldn't you just sort of validate that? I think an example in the book is, like, hey, the ride's completed-at is after when the ride started, right? So you just don't have some wonky data that way. It's like, why wouldn't you just make sure your data doesn't get in a bad state in some of those different ways, even if you have the application-level checks, just making sure elsewhere. So I've come around on, like, hey, more asserts, more checks, more just ways to, like, validate that you're not putting garbage into your database and
causing a problem later. Yeah, and thanks for bringing that example up, because it's an interesting one. I think that, for newer folks that don't think about data quality as much, or maybe haven't really been burned by that as much in their career experience, it might be a little odd. It's like, well, of course, can't we expect that? But it's a little wild when you think about how it's like, no, you can't expect anything. Like, you don't know what kind of data you're going to get on the front end from the application, right? And sometimes it's not malicious, it's just errors, you know, and that sort of thing, with forms, or just human data entry, whether it's outside your company or inside your company. And so it's almost like sending a message in a bottle to the future, you know,
to me, it's like, here's what we expect. And I do think like, if you know, those tools are there,
it might not make sense to add them all right away, you know, because maybe that's a little heavy-handed. But if you start to see some issues, like, oh, we got our timestamps out of order,
like they're nonsensical. It's like, well, let's fix it once. And then let's put the check
constraint in. So it never happens again.
That's kind of my mindset.
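A minimal sketch of that timestamp rule; the table and column names are stand-ins, not necessarily the book's exact schema:

ALTER TABLE rides
  ADD CONSTRAINT completed_after_started
  CHECK (completed_at > started_at);
-- Violations now raise a Postgres error that the application
-- needs to catch and handle gracefully.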
Yep. Yep. Okay. Another feature that's kind of interesting, and I think you see mixed opinions on it, are, like, functions and triggers, right? Where, like, stored procedures have been pretty out of favor for a while. When should I use functions and triggers? When should I avoid them? Yeah, it's hard to provide generic evidence or advice on this, I guess, to be frank. Because, yeah, I mean, I think that
sometimes what I tend to experience is
if I'm bringing more database awareness
of different capabilities,
I'm probably not going to go with functions
and triggers first
It's more just, even, like, check constraints or something like that, that's a little bit softer. You know, if you're already a team that's using check constraints, and has a nice schema design, and has good performance, and can read explain plans and stuff, then I think that
might be the circumstances for saying, okay, well, do we have some background jobs that are client code? Like, maybe it's written in a programming language, called from cron, that's doing a lot of back and forth between the client and Postgres, maybe doing some processing on the client and then making another call to Postgres, doing some processing, writing files out. It might be interesting to think about, well, what if we did this as a procedure, with transactional control? Maybe it's relatively simple, like bulk updating some rows, or generating or refreshing a materialized view: something that's relatively simple, but being able to save kind of the client round trips. And then, in the book, we looked at how you can still manage that as source code, in a sense. There's a couple open source tools that allow you to, just like you might write a method in Ruby or another programming language, put your function into version control. You can version it, you can have PRs, you can treat it like first-class code. And I think if you can show a benefit to, like, the execution time for that approach over an application-code approach, it makes sense to me. Like, why not? I didn't really get into this in the book, but there's even some unit testing capabilities, pgTAP, for example, in Postgres, where if you want to write some kind of branching-logic unit tests, you can do that too.
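A hedged sketch of the kind of server-side routine being described: a batched bulk delete as a procedure with transaction control, which Postgres supports from version 11 on. Everything here is illustrative:

CREATE PROCEDURE archive_old_events()
LANGUAGE plpgsql AS $$
BEGIN
  LOOP
    DELETE FROM events
    WHERE id IN (
      SELECT id FROM events
      WHERE created_at < now() - interval '2 years'
      LIMIT 1000
    );
    EXIT WHEN NOT FOUND;  -- stop once a batch deletes nothing
    COMMIT;               -- transaction control is allowed in procedures
  END LOOP;
END $$;

-- Could be called from a cron-driven psql session:
CALL archive_old_events();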
Yeah. Yep. Very cool. You mentioned explain plans a few times. Like, what's the best way to tell what... I could never parse the darn things. So, like, what's the best way to learn how to make sense of what's going on there? Yeah, they're super dense and complex. You know, it's really cool: databases are a bit weird, where you have this declarative model, and it's a runtime decision what's going to happen with your query. Like, Postgres is going to choose that at runtime. And so the best we can do is basically say: what are you planning to do, Postgres, with this query? You know, using EXPLAIN. And then what I've recommended to people, or I've shared my experience, which is that, for me, it's taken a long time to learn bits of a plan. Not necessarily trying to learn everything right away, but, you know, the basics, like: is my index used? So making sure I can see where it's mentioned in the explain plan.
And then the plan is composed of plan nodes. And then, depending on how much time and interest you have, and how important the query is, you can just go deep on more plan nodes: understanding what a bitmap heap scan and a bitmap index scan are, and how that's different from an index scan or a sequential scan, if you have a need to understand that.
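For reference, the basic move being described, with a hypothetical query:

-- EXPLAIN shows the plan; ANALYZE actually runs the query and adds timings:
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders WHERE customer_id = 123;
-- Check which node appears ("Index Scan" or "Bitmap Heap Scan" versus
-- "Seq Scan") and compare estimated row counts to actual ones.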
There's some great resources out there. Like, the Postgres docs are a bit challenging there when you're getting started, but I'm a big fan of pgMustard, which is an explain plan visualization tool. It is a commercial tool, but they do have a free glossary that they maintain, with a lot of great terms. And their kind of help information is kind of a public service, too, I think, where they'll describe what a bitmap heap scan is, for example, and a little bit of information about, like, when you might see that and what you might want to do with that.
And then the other thing, too, is, I'm not beyond admitting that I use ChatGPT a ton. I mean, if I have a specific, targeted question, it's an amazing tool that we have available now, where I can say, like: I'm getting a bitmap heap scan, I'm filtering, I'm accessing this many rows, the column is this type, like, how can I access this? You know, the more you know about how to read explain plans, the better prompt you can supply. And typically it's kind of my research assistant, you know? Like, I'll get some ideas, jot them down, design a little experiment, play around a bit. It's not like I type it in and boom, it's the answer, and I copy and paste. It's more like it's a starting point, and then I can validate it, try out some other things, and kind of iteratively approach it.
Yeah.
Oh, man, that's genius.
I hadn't even thought about that.
That's a great one.
Cool.
OK. Common table expressions, CTEs, see these a lot. Are these purely for humans and query structure? Designs that might involve more left joins or outer joins as opposed to subquerying... And then, usually what I... sometimes I've done it more as a human, where I've read a complex query with a lot of left joins and a lot of subqueries, and I've taken out an inner part, and I've essentially written it as a simple query, and understood its performance characteristics, like whether it needs another index or whatever, or, you know, how we can structure it.
And then you can just kind of wrap it in a CTE, and you can say, you know, WITH my simple query AS, and the SQL, and then you can kind of take the results from that and feed it to the next query. And sometimes I might not actually end up shipping that, but it helps me understand what the query is doing. And then I could maybe apply the tactics from that simple version to the former query, and maybe just more easily identify what it needs to lessen its cost. But, yeah, I guess, as far as benefits to the planner, there is... I'm not so familiar with this, but you can materialize CTE results. I don't think we actually covered that in the book, but from a past presentation, someone mentioned that to me. They're like, hey, do you ever play around with MATERIALIZED, which is an option, I believe, when you create the CTE? And so I also don't want to say that there aren't concrete benefits to the planner as well. I think that there are. I don't know that I have that in my bag of tricks yet, but yeah.
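A sketch of the wrapping pattern he walks through, including the MATERIALIZED option mentioned, which exists as of Postgres 12; the names are hypothetical:

WITH my_simple_query AS MATERIALIZED (
  SELECT * FROM orders
  WHERE created_at > now() - interval '30 days'
)
SELECT customer_id, count(*)
FROM my_simple_query
GROUP BY customer_id;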
Yep. Do you feel like you're fighting the planner very often? Or is it... do you feel like you have to use hints very often, or just, like, write queries different ways to force the planner more down a different route? Or not too much of that? I mean, not really. Like, I know that there's a lot of narrative around that, and if you look at a lot of Stack Overflow posts, or you ask people, like, why is the planner doing this, or that sort of thing... I don't actually identify with that so much. To me, it's like: there has been an occasional, really challenging, kind of hard-to-explain result, but usually I think of it as, there's kind of a deterministic result we can get to if we can make a narrow enough selection, which might actually involve changing the queries, or pulling in some tricks, like we might actually add a redundant filter column just so that we could pick it up on an index and lessen the cost. Or recently, I just did a trick of breaking up a query
that's just scanning a huge range of data.
We could get it to a certain point with indexes.
That was better, but it wasn't dramatic.
But by actually rethinking it as, what would this look like as a view that would be materialized, that we've got indexes on? I could take that big range and break it into two queries, with materialized views that were indexed, and then union those results together. And it was actually dramatically faster to do that, because each individual query was fast. So it's kind of like doing two queries and combining them, basically. And, yeah, I guess that's generally how I go. It's more like, if something's not working, I just try to go, like, a different direction a little bit, you know.
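A hedged sketch of that trick: splitting one large range into two indexed materialized views and unioning the results. The tables, date ranges, and filter are invented for illustration:

CREATE MATERIALIZED VIEW events_2023 AS
  SELECT * FROM events
  WHERE created_at >= '2023-01-01' AND created_at < '2024-01-01';
CREATE INDEX ON events_2023 (account_id);

CREATE MATERIALIZED VIEW events_2024 AS
  SELECT * FROM events
  WHERE created_at >= '2024-01-01' AND created_at < '2025-01-01';
CREATE INDEX ON events_2024 (account_id);

-- Each branch is a fast indexed lookup; combined they cover the full range:
SELECT * FROM events_2023 WHERE account_id = 42
UNION ALL
SELECT * FROM events_2024 WHERE account_id = 42;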
Yep, absolutely. What about... okay, so now we've seen, in the last, I don't know, five, ten years, JSON, JSONB columns. I guess, like, how do you think about JSON data? Do you recommend using it? Do you not recommend using it? What do you think about there?
Well, I think if I recommend it in that, if like the alternative is, you know, we do 95% of our stuff in Postgres, but we're going to add like MongoDB because we want to do
the schemaless thing over
here. And now we're running Mongo or, you know, or like there's other examples too, but just kind
of using Mongo as an example of, we want to store like data that we don't really know the, um,
totality of what we want to store. We don't really want to maybe do schema changes all the time to
introduce new schema definition. Um, we expect this to be like maybe a high right table.
I guess in situations where you could stay in Postgres, I think it's worth considering.
Like, what would this look like as a JSON B column?
We can store JSON data and we can index it later.
We can access it pretty quickly still.
So, yeah, I tend to think about it more as like,
this is a capability we have in Postgres,
but there are some operational trade-offs
like storing larger amounts of text in Postgres.
There's this whole TOAST mechanism that can add latency.
It's more kind of like your data is kind of more split up
behind the scenes and those tables,
just like other tables need to be vacuumed
and that kind of thing.
So, depending on the scale and stuff, there can be some significant trade-offs. But it's also really amazing, like, the amount of JSON capabilities Postgres has, both for storing and generating JSON. And I also wrote about this in the book. I didn't actually know this when I started writing the book, but, like, JSONB doesn't mean that you're kind of schemaless, or that you have no structure that can be supported, forever. You can always add that in later. There's a couple tools, like JSON Schema validation, I believe, I forgot the name of it now. But what you can do is you can come in later and add a JSON Schema-compliant schema definition, and then, with those check constraints we talked about before, actually add some kind of constraint validation to your data later, which is cool.
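A sketch of those capabilities with hypothetical names: a JSONB column, an index added after the fact, and a containment query that the index can serve:

ALTER TABLE events ADD COLUMN payload jsonb;
CREATE INDEX idx_events_payload ON events USING gin (payload);
SELECT * FROM events WHERE payload @> '{"type": "signup"}';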
Interesting. Yep, that is cool. Okay. You know, one of my favorite posts is from this guy named Nelson Elhage. He used to work at Stripe. And he's talking about SQL and relational databases generally, and, you know, they're awesome for a lot of things. His biggest concern is, like, there are so many dang features, and some of them, it's hard to know, like, the operational implications of. And I would say, like, full
text search, aggregations, things like that, that work well at some scale. And I'm not going to say, you know, where that line is, but, just, the performance profile changes as you get larger, as compared to, like, you know, a single-item lookup on an index value, which is going to be pretty consistent as you go. Like, are there features you give, like, sort of caveats on, or avoid, or things like that? Or how do you think about that concern? Yeah, for sure. I mean, another one, too, would be partitioned tables. You know, I kind of like the terminology of, like, a sharp knife for some of those things, where it's like: if you know what you're doing, and you know what the trade-offs are, like, you can cut yourself, you know, like with a sharp knife. You don't want to cut yourself, but you also could cut your vegetables much faster, because your knife
is very sharp and you're a skilled operator. Now, that being said, I think a lot of people, they do learn from experience. I mean, I do as well. Something like, yeah, full text search, partitioned tables: there are capabilities that Postgres supports natively, and that does mean that it has better documentation than it might if it was not natively supported. So, like, I know in the Postgres documentation, for sure, there are caveats. So I think it's important to, you know, not just willy-nilly adopt advanced features, but to familiarize yourself with the caveats as well, and maybe design in some, like, observability or metrics that might help you avoid the painful parts, so that you can receive the benefits as well. And then, I think, yeah, like a lot of things with databases: try to start small if you can. Like, a real use, but maybe on a lower-criticality area. Maybe, like, when we're doing the huge index creation, we do that at an off-peak time. You know, start small and then, I think, expand. And that's how I've done a lot of, like, work. Not only does that limit your kind of blast radius if things do go wrong, but, the other way to look at it is, it builds your confidence that you can kind of become an operator using these advanced capabilities, and you have real, tangible experience to speak to, and then you're just sort of incrementally growing from that, as opposed to going from zero to 100. Yeah.
yeah absolutely um on that same vein of like kind of like in the postgres for everything camp right
we talked about json and full text search and things like that um what do you think about
foreign data wrappers yeah we i've used them in a workplace setting for helping us glue different Postgres databases.
I just checked the time and realized we've been talking for a long time.
I just love talking about Postgres.
Yeah, good.
And your questions are great.
They're obviously from a place of experience, so you're hitting on the main points.
So, foreign data wrappers. I haven't really used them in an ongoing context, although people do use them that way. But certainly for, you know, we have this multi-services architecture at the company, and we have Postgres A over here and Postgres B over there. It's a really cool facility to be able to say we can actually treat Postgres B as if it was a local table in our A database, by connecting to it. But I've done that more in a one-time sort of thing, like moving some data, or going the other direction, where we want to split out some data. And that's one way to look at it. The other way is that it's a general mechanism, where you can have foreign data wrappers for all kinds of things. It doesn't have to just be a Postgres database. I think the only other one we did in the book was a file, treating a file as a read-only table. I think that can be a really cool component of a solution for tiering your data, where you have old data that you may want to be accessible, but you might want it out of your main Postgres database and into a file. And I know there are other companies, like Tembo, which is a hosted Postgres provider, that have these different kinds of stacks they make available, and they've written about some of their foreign data wrappers for external services and technologies.
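Rough sketches of the two cases mentioned, postgres_fdw for a remote Postgres and file_fdw for a read-only file; all server, table, and file names here are hypothetical:

  -- Case 1: postgres_fdw, treating a table in "Postgres B" as local.
  CREATE EXTENSION IF NOT EXISTS postgres_fdw;

  CREATE SERVER postgres_b FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'postgres-b.internal', dbname 'app_b');

  CREATE USER MAPPING FOR CURRENT_USER SERVER postgres_b
    OPTIONS (user 'migrator', password 'secret');

  -- Pull in just the table needed for a one-time data move.
  IMPORT FOREIGN SCHEMA public LIMIT TO (orders)
    FROM SERVER postgres_b INTO public;

  -- Case 2: file_fdw, treating a file as a read-only table for
  -- tiered, rarely-accessed data.
  CREATE EXTENSION IF NOT EXISTS file_fdw;

  CREATE SERVER archive_files FOREIGN DATA WRAPPER file_fdw;

  CREATE FOREIGN TABLE archived_orders (
    id          bigint,
    created_at  timestamptz,
    total_cents integer
  ) SERVER archive_files
    OPTIONS (filename '/archive/orders_2019.csv', format 'csv');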
Yeah, that pattern always kind of surprises me, because I think one of the big benefits of "just use Postgres" is: hey, I have one piece of technology, it's operationally sound, I know how it works, and things like that. But if you have a foreign data wrapper, you're routing everything through Postgres, but you still have your Redis or your Elasticsearch or whatever. Sometimes people are piping it through, and you still have to manage that somewhere. So it's not like you've really saved a lot there.
Yeah. A file is obviously much less concerning operationally than a whole system, like you said. But if someone was going to be doing that, connecting to Redis or whatever, I would definitely say: well, you're still going to need observability on that side, you're going to need access to it, and you're going to need to connect to it and check things out from time to time.
Yeah, absolutely. What about, sort of on that same vein:
I feel like there's this giant ecosystem of extensions that are not quite forks, but they change Postgres pretty significantly. I'm thinking of Citus for distributed Postgres, or Timescale for time series, or now ParadeDB, who I talked to last week, for search and analytics. What do you think of that whole ecosystem we're seeing there?
Well, it's cool that Postgres is so extensible.
There seems to be energy picking up around really leveraging this extensibility, and there are companies trying to make extensions. I think extensions can be as simple as a function, like a string formatting function. That could be an extension. But like you said, some folks use the term "super extension," where it kind of changes the operational model, or it brings a whole set of capabilities to core Postgres, or community Postgres. I just see it as favorable, as positive. It's broadening the possible usages of Postgres, and that's going to be good for some people. Other people may not use it, but it's a great way to reach some folks that might not otherwise be able to use Postgres. Take the ParadeDB example: maybe they run Postgres for their main OLTP database, and they want to do some full text search. They're comfortable operating Postgres: they know how to add roles, they know how to size their instances, and they kind of know index design. And then it's like, well, what if we used it in this more narrow use case, and ran a separate dedicated instance just for full text search? To me, that's not any more complex than saying we're going to start with Elasticsearch. That's a whole operationally complex beast that has a lot of concepts to learn, and a lot of challenges that can happen once you're in production and you have big indexes and bigger search volume.
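For comparison, a minimal native full text search setup of the kind being weighed against Elasticsearch here; table and column names are hypothetical:

  -- Maintain a search vector automatically via a generated column
  -- (Postgres 12+), then index it with GIN.
  ALTER TABLE articles
    ADD COLUMN search_vector tsvector
    GENERATED ALWAYS AS (
      to_tsvector('english', coalesce(title, '') || ' ' || coalesce(body, ''))
    ) STORED;

  CREATE INDEX idx_articles_search ON articles USING GIN (search_vector);

  -- Query with web-search-style syntax.
  SELECT id, title
  FROM articles
  WHERE search_vector @@ websearch_to_tsquery('english', 'postgres performance');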
So, I mean, going into something like, let's take core Postgres and extend it: if I'm an engineering leader at a company, those are kind of in the same realm to me in terms of complexity, and that's the thing we're going to take on. And you could argue there's actually maybe less on the Postgres path if you're a comfortable operator there. So yeah, I think those things are great.
And then, for more on the extensions ecosystem: there are a couple of extension repositories, or websites like that, that list and categorize extensions. I can send you a list if you're not aware of these. They help give you a peek into how well maintained an extension is and what its usage is like, and that sort of thing is a good thing. And then, you may be aware, AWS at re:Invent, I think two years ago, launched this trusted language extensions capability. That's another way to get some types of extensions onto cloud providers, where you might not otherwise have as much free rein to add extensions as you would if you were self-hosting Postgres. So yeah, I think all that kind of stuff is good. Postgres has these extension points, and it could make sense to use them in a lot of different ways.
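A quick way to peek at what a given instance actually exposes, using the built-in catalog view; the pg_stat_statements install is just an example:

  -- What extensions does this instance offer, and which are installed?
  SELECT name, default_version, installed_version, comment
  FROM pg_available_extensions
  ORDER BY name;

  -- Installing one the provider allows (note: full data collection for
  -- pg_stat_statements also requires shared_preload_libraries).
  CREATE EXTENSION IF NOT EXISTS pg_stat_statements;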
Yep, absolutely. Okay, let's shift gears a little bit and talk about the book. What inspired you to take on this, you know, labor of love to write a 450 or 500 page book?
Yeah, well, I guess I had kind of felt like I'd reached this staff-level engineer level, with a lot of Ruby on Rails at a lot of different companies and a lot of different teams. Part of it is that I did want to do something maybe a little more meaningful or impactful, even outside of the individual companies I was working for. And I have this energy around learning these topics. I feel like I have the perspective of an application developer, and I'm trying to make these topics more appealing or accessible to a developer who's probably never going to pick up a databases-only book, like a Postgres DBA sort of book. I also have this background where I studied the Spanish language in university, had the chance to live in Spain for a while, and lived with a family. I guess part of my brain likes the idea of bringing two different communities together. I like meeting people from different cultures that speak different languages, and it does sort of feel like two different cultures: application developers, and database administrators, or kind of data enthusiasts that want to have high quality data. And then there's this middle ground of performance enthusiasts, people who want to make sure their application can scale well, and who want to understand why an API endpoint is slow, whether it's a database issue or a front end browser issue. So yeah, it was kind of this amalgamation of a lot of different factors: where I was in my career, having this opportunity, having put in a lot of work to improve our operations, and feeling like I had something to share.
Yep. How long did it take you to write the book?
Well, as of today it's in print, which is exciting. The official start, I'll have to look back, I think it was April of 2022, and today is in June of 2024, so for sure more than two years, two to two and a half years in total. I'd say the first draft was about nine months, I think, to a year. And then, surprisingly, I thought it would be much faster, but that period from when the first draft was done to when it actually went through all the technical editing, and then later the layout and the additional edits, that was almost a whole other year, which is really wild. And I was working full time for most of it. I think with a dedicated push it could have been much less time.
And also, as a first time author, I had a lot of inefficiencies in my process, though I feel like I got more efficient as I went. But it is a very time intensive task, and I couldn't really afford to work on it while not working. So I'm really fortunate to have had some supportive managers, and a supportive spouse, and yeah, I just feel really happy that I've been able to get it done.
Yep. If you could tell yourself back in April 2022 that it was two years you were going to invest into it, would you still say go do it? Did you enjoy the process and that whole thing?
I mean, I think I would because fundamentally, I do enjoy the process.
I write on my blog a lot.
And I always have.
I'm sure like you, I have a huge list of things I always want to investigate.
And for me, writing about them is how I do a proper investigation. I'll create a little sample and do something a little hands-on, and then I write. A lot of my blog was just sort of unfocused and all over the place, whatever tech stuff I was looking at, or even just going to conferences and that sort of thing. But lately it's actually felt kind of comforting to have more of a special area that I focus within. So I'm trying to really know exactly what's coming in the next version of Postgres, what's coming in Ruby on Rails. I'm putting weight on things that haven't been covered that well, maybe Postgres capabilities that app developers aren't using, or even under-covered features within Postgres. So the truth is, I always want to write and do more. It's just a matter of trying to balance that with income generating activities, since I haven't really cracked that nut as much, and just kind of keeping things in balance.
Yeah. Any recommendations for folks that are interested in writing a book?
Yeah, I think most publishers, if you go the publisher route... of course, there's self-publishing, which I don't really know a lot about. I looked really briefly into, I think it's called Leanpub, as one of the options that might help you there. But for the publishers, I worked with a couple, and they each have a proposal process. If you've ever worked on a conference proposal, it's going to be somewhat similar. They'd like you to know who you're going to market the book to, what kind of ballpark estimates you have for the size of the market, and then you're trying to make a pitch, basically, that says here's why I think this book should exist, and here are some books that are similar but don't quite overlap, you know?
And then, yeah, be prepared for a very time intensive journey. It's really fulfilling. It felt like a privilege, it did for me, to have the purpose to go deep on certain topics and really turn over all the stones. But it also needs to be balanced; it comes at a cost, in that it takes away time from other activities too, whether it's working or your family. I had to cut out some things temporarily, and just kind of had the mindset that this will come back eventually, I've got to just get through this. So yeah.
Yep. Absolutely.
I totally agree. That's great. Cool. Let's do some quick fire questions that we always end with here. First of all, if you could master one skill that you don't have right now, what would it be?
Can these be non-technology?
Yeah, absolutely, anything you want.
I know you ask these, I watched past episodes and even wrote out some answers, but now I'm blanking on all of them. I started to learn the guitar in COVID and learned a few chords, and I'm like, this is really fun, it's relaxing. I wish I could play the guitar, but I can't. I can just play a few chords.
Yeah, cool, that's a good one.
What wastes the most time in your day?
Probably myself, just using social media, and not necessarily being disciplined and having a good game plan. I like to have meetings and structure and goals and that sort of thing, and if I don't, I'm pretty good at wasting time.
Yeah. Has that been hard in the shift to independent consulting, dealing with some of that time management stuff?
Yeah, I think so, in that it's not actually that I'm mindlessly browsing. It's more that I do think I'm a creative person, in that I always have ideas I want to write about and explore, and I can easily work on a blog post for three hours that doesn't go anywhere. So I think having a purpose helps. I'm actually just starting to dabble in paid writing, so stay tuned on that. If it's writing, having a purpose, like I'm writing this for my blog, or I'm writing this for a third party; or if it's conference stuff, I'm working on a presentation or whatever. Having some structure. I think I'm getting a little better at trying to even just interrogate myself.
Like, why am I doing this?
Yeah, exactly.
Cool.
If you could invest in one company, not a public company, some sort of private company, who would it be?
My wife would not go for me putting a large amount of money in NVIDIA, not even a long time ago, even just a few months ago. I kept thinking the whole time, no way it's going to keep going up, and then it's just so painful to watch it go higher and higher and higher. I wanted to put money in. But, let's see, well, maybe this is an unsurprising answer, but also a little surprising.
But yeah, as far as private companies, I would consider myself friends with the founder, Lukas Fittl of pganalyze, and I have no financial stake in the company or anything, but we've caught up at a couple of conferences and we stay in touch pretty regularly through the Postgres community. And now, as a consultant, I've been using the product with clients, and I really see the value. I think it's a great product. I think it's really cool that they've decided to go deep on Postgres-specific observability, the kinds of things that, if you're really operating Postgres and you have the real need, give you great visibility into things like autovacuum runs, stuff that other tools might consider too much in the weeds because they care more about the instance. So I think they're great. And they're a bootstrapped company, Lukas has talked about that. I'd love to see them just continue to be more and more successful.
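The built-in statistics views give a rough, no-extension-required version of that autovacuum visibility; a minimal sketch:

  -- When did autovacuum last touch each table, how often does it run,
  -- and how much dead-row debt has accumulated?
  SELECT relname,
         last_autovacuum,
         autovacuum_count,
         n_dead_tup
  FROM pg_stat_user_tables
  ORDER BY last_autovacuum DESC NULLS LAST;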
Yeah, cool. That's great to hear. Which person influenced you the most in your career?
Oh, man, I know this is a standard question too, and I had a really hard time thinking about a specific person. I guess there have just been engineering managers or senior engineers at companies in my career that have been influential. One person I've talked about
is this guy named Paul Berry. I'm not sure exactly what he's up to now; I believe he's still a CTO at a startup that he's part of. But earlier in my career, as far as the database stuff, he was someone who had really good mastery of writing advanced SQL, being an operator, being able to go and terminate queries that were out of control, and things like that. I'd say people that have those kinds of hands-on skills, somebody that's going to say, let's just hop on a computer and figure this out together. They want to be hands-on but share their knowledge. Some of the engineering folks in my career, those have been the people that have stuck with me.
Okay, five years from now, will people be writing
more code or less?
Well, it's an interesting question, because the idea is that with a lot of code generation tools and AI, you might write less, but it also could create new opportunities with new types of businesses and products and companies that didn't exist before. So I guess I've got to say it's a wash, because I can see a little A and a little B.
It's true.
Yep. All right, I have a special quick fire question for you, as a fellow Timberwolves fan.
Oh, yes.
Yes.
Should the Timberwolves trade KAT or keep him?
Okay. This is amazing that you asked me that, by the way, because it's controversial even amongst my fellow Timberwolves fans. I'm a very long-time fan; I grew up in Minnesota. I had a Pooh Richardson poster, probably no one knows him, above my bed as a 10 year old.
Yeah, that's good.
And, you know, I was there in the 2004 Western Conference Finals era, the KG sort of team. So I've been a fan a long time. I did take a long break because we moved away from Minnesota, but I've been back since.
That's a good time to take a break, because, yeah, those were the worst years.
We moved back to Minnesota in 2015, which was KAT's rookie year,
and hopefully he never watches this podcast, because I do sort of feel like it might be good to trade KAT. I think with the structure of the team now, he's an extremely valuable offensive player who will also be extremely valuable to another team. I just think that, possibly for the composition of the team, because the NBA is all about matchups, and he does generate a lot of offense, so we would need a lot of offense to replace his production, it might be a good time to make a change. He's been with the team for, I believe, eight years.
Yeah, I know, I go back and forth. Even in the playoffs, he'll have those crazy frustrating games. I'm like, oh my gosh, he'll have a 60 point game, and then he'll have, you know, a 10 point game, and six fouls, the dumbest fouls you've ever seen.
Yeah. But he seems like an amazing person, and he's had a lot of tough breaks in his personal life and stuff. And he's been a huge Minnesota advocate, for a relatively smaller city within the NBA and within the country, and that's great. But I'm slightly in the "let's explore the right trade" camp, which is controversial even amongst local Timberwolves fans. So it's really funny you asked.
Yeah, it is tough.
Andrew, it's been great to have you on.
Tell us a little bit more about the book, where we can find it, where people can find
you, things like that.
Yeah, thank you.
Yeah, so High Performance PostgreSQL for Rails. It's available from all the major places, Amazon and whatnot. The publisher is Pragmatic Bookshelf, and their website is pragprog.com, where you can buy the book as well. It's also on Bookshop, which is more of an independent bookseller. And I blog at andyatkinson.com. These days I mostly use Twitter/X, but I have social media accounts in most places. I mostly tend to hang out there and post about Postgres and Rails things, and conferences, and stuff like that.
Yep. Cool. Well, as I said, I totally recommend the book, even if you've never written any Rails. I'd say if you're in that application DBA or accidental DBA camp, where, hey, you want to ship some features but you also want to understand what's happening under the hood, I think it's really, really good for that. So, yeah, we'll have a link in the show notes and all that stuff. And Andrew, thanks for coming on.
Thank you. My pleasure. Thanks a lot for the opportunity.