Postgres FM - Blue-green deployments
Episode Date: November 10, 2023

Nikolay and Michael discuss blue-green deployments — specifically an RDS blog post, how similar this is (or not) to what they understand to be blue-green deployments, and how applicable the... methodology might be in the database world more generally.

Here are some links to things they mentioned:

Fully managed Blue/Green Deployment in Amazon Aurora PostgreSQL and Amazon RDS for PostgreSQL: https://aws.amazon.com/blogs/database/new-fully-managed-blue-green-deployment-in-amazon-aurora-postgresql-and-amazon-rds-for-postgresql/
Blue-green deployment (blog post by Martin Fowler): https://martinfowler.com/bliki/BlueGreenDeployment.html
Our episode on logical replication: https://postgres.fm/episodes/logical-replication
pgroll: https://github.com/xataio/pgroll

What did you like or not like? What should we discuss next time? Let us know via a YouTube comment, on social media, or by commenting on our Google doc!

Postgres FM is brought to you by:
Nikolay Samokhvalov, founder of Postgres.ai
Michael Christofides, founder of pgMustard

With special thanks to:
Jessie Draws for the amazing artwork
Transcript
Hello and welcome to Postgres FM, a weekly show about all things PostgreSQL.
I am Michael, founder of pgMustard. This is my co-host Nikolay, founder of Postgres.ai.
Hello Nikolay, what are we talking about today?
Hello Michael, the RDS team just released a blog post about blue-green deployments
and I thought it's a good opportunity to discuss this topic in general
and maybe RDS implementation in particular, although I haven't used it myself.
I've just read the blog post, but I know some issues and problems this topic has.
So I thought it's a good moment to discuss those problems.
Yeah, awesome.
Even if we look at the basics, I think it's interesting to most people.
Everyone has to make changes to their database.
Everyone needs to deploy those changes.
Most people want to do that in as safe a way as possible
with as little downtime as possible.
So I think it's a good topic in general to revisit,
and it looks interesting.
Right, right.
So I think, in general, it's a great direction of development for methodologies, technologies, and the ecosystem, like various tools and so on.
Because bigger projects, not only the biggest projects,
some smaller projects also need it,
especially those who change things very often.
But before we continue, I would like to split this topic into two subtopics.
The first is infrequent changes we make when, for example, we perform a major upgrade of Postgres, or we switch to a new operating system, if we have self-managed Postgres, with a glibc version switch, right?
Or, for example, we switch hardware, I don't know, something like big, big changes.
Major version upgrade.
Right, or maybe we try to enable data checksums.
Maybe also this is one of the...
Interesting, yeah.
It's generally possible with a rolling upgrade approach, when you just change it on one replica and then another, like a rolling upgrade.
But maybe this idea of blue-green, which came from the stateless part of systems... Originally, this idea was avoiding the topic of databases, but we will discuss it.
So this is a big class of changes, which is usually performed by infrastructure teams.
And it's not very often, a few times per year usually, right?
Versus a very different category of problem, which is changing our application code, maybe several times per day, trying to react to market needs and our competitors' changes, trying to move forward, like a go-to-market strategy and so on.
So continuous deployment, schema changes, various stuff. Obviously, it's interesting that the original idea described by Martin Fowler is about the second thing: schema changes and so on, like application changes, which are done probably not by the infrastructure team, but by the engineering or development team, which is usually bigger in size, and they need more frequent changes, but each one of those changes is lighter; it's not as heavy as a major Postgres upgrade, right?
But it needs to be done very often, and probably in a fully automated fashion, like through CI/CD pipelines, a continuous integration approach, right?
So we just change it, with a lot of automated testing, and we just approve, merge, and it's already in production, right?
So the original idea by Martin Fowler, and I think we need to start discussing it already, right? It's about the second problem, for developers.
While what the RDS team developed is for the infrastructure team and major upgrades; it's a very different class of tasks to solve, right?
Do you agree?
Yeah, I do.
And I'll probably jump the gun a little bit here,
but I feel like they might be slightly misusing the phrase blue-green deployments for the
description of this feature. And I really like this feature. If I was on RDS, I think I would
use it, especially for major version upgrades. I think it makes that process really simple and
lower downtime than most other options smaller database users have. But yeah, I completely agree
that this is not at all appropriate for application teams wanting to roll out new features, add a
column to a table, add an index. It just doesn't make sense. Because logical doesn't support DDL
replication yet, right? That's why
this is, like, a full stop.
And even if
it did, I think the
way that this
is done wouldn't be appropriate.
Here I would argue with you, but let's do
it later. Just let me make a note that I have multiple opinions here, no final opinion.
So I have different thoughts.
Let's discuss it slightly later.
So, okay, let's talk about the original idea, blue-green.
First of all, why such weird naming? It reminds me of red-black trees from algorithms and data structures in computer science, basically. Binary trees, then further ideas, red-black trees, and so on.
So why this name?
You've read about it, right?
Yeah, I saw in an old Martin Fowler blog post, which I'll link up, that they had, I suspect (I didn't actually look at the timelines, but I suspect it was back from when they were consulting, I think probably at ThoughtWorks; that seems to be where a lot of these things have come from), some difficult-to-convince clients. They wanted to increase the deployment frequency, but people were scared of risk, as always. And they had this idea that, well, I mean, it's kind of standard now, but I guess back in the day it wasn't as standard, that staging needed to be as close to production as possible, so that you could do some testing in it and deploy the changes to production in as risk-free a manner as possible.
And then they took that a bit further and said, well, what if staging was production,
but with only the change we wanted to make different?
And instead of making that change on production, we instead switched traffic to what we would
previously have called the staging environment.
And they talked about naming for this.
I don't even know what you'd call it, but methodology, I guess.
And they thought about calling it AB deployments,
which makes a lot of sense, but they didn't want to do it.
AB means we split our traffic,
maybe only read-only traffic in the case of databases.
And we compare two paths for this traffic.
Well, the main objection that Martin had with that naming
is that they were scared the client would feel
that there's a hierarchy there.
And if we talked about there being a problem
and we were on the B instance instead of the A instance,
the question is, why were you on the B one
when the A was available?
And I think that's, I'm not sure.
I think you're quite right that A-B testing might have already been a loaded term at the
time, but it also is a good counter example where most people understand that in an A-B
test, we're not assuming a hierarchy between A and B.
Right. But also this approach says the second cluster, like a secondary cluster, which follows...
Okay, I'm thinking about databases only, right?
Let's switch.
Since we're discussing Martin Fowler's ideas, we should talk only about the stateless parts of our system, and touch on the database only a little, right?
So, okay, stateless. For example, we have a lot of application nodes and some of them are our production. Some others are not production. And what I'm trying to say is, it's not only about hierarchy and which is, like, higher, of course.
So, yeah, by the way, I remember a similar name.
Okay, I'm a database guy.
I remember, if you give the hostname "primary" to your primary, but after failover you don't switch names, it's a stupid idea, because this replica now has the hostname "primary".
It's similar here, right? So we need some way to distinguish them, but not to permanently say this one is the main one, because we want them interchangeable, symmetric, right?
So we switch there, then we switch back, back and forth, and always one of the sets of nodes is our real production and the other is considered as, like, a kind of powerful staging, right? But the key question is not only about hierarchy, but how exactly testing is done. In one case we can consider that this is our staging, and we send only a test workload there,
which is done, for example, from our QA test sets,
from pipelines,
or we consider this secondary cluster, this secondary node set, as part of production
and put, for example, 1% of the whole traffic there.
These are very different testing strategies, right? So, two different strategies.
I think in the original idea, it was like staging:
all production traffic goes to the main node set, blue or green,
depending on the current state,
and that's it, right?
So we cannot say it's A/B,
because in A/B,
we need to split 50-50 or 20-80
and then compare.
Yeah, sometimes in marketing,
I've heard people talk about A-B testing,
which is concurrently testing
two things at the same time.
And then sometimes they call what this might be cohort testing: they say, we're going to test this month; the timelines will be different. But if you wanted to switch from blue to green in one go and send all traffic to the new one, that would be considered... it's not A/B, because it's not concurrent, but you might say this cohort is going to this new one.
I would say that they're both A/B, in my opinion, because they both use production traffic to test.
So this is exactly, by the way, the idea: we can switch there for one hour, then switch back, and then during the next week, study the results, for example, right?
It makes sense to me. Or the next hour, I don't know.
I don't really care if it's like concurrent or sequential,
but the idea is that we use real production traffic.
It's a very powerful idea.
Not only the data; the application nodes are configured exactly like on production, because they are production sometimes, right? We switch them.
But also, we use real traffic to test. I think the original idea was that we don't do that: this secondary node set is used as a lower environment. It's still production data, right, or it talks to the production database, but we generate traffic ourselves, like special traffic, special workloads, under control.
This is the idea, the original idea, like we do with staging.
But we know this is our final testing.
It's very powerful.
It uses the same database, first of all.
So we should be careful not to send emails, not to call external APIs
and also to convince
various auditors that it's fine
because they always say if you do production
testing, maybe it's not a good idea.
Who knows? But it's
very powerful testing, right? But it's
not done with production
workloads.
Yeah, interesting. And have you heard the phrase "testing in production"? This feels like a...
I do it all the time, yeah.
But I like it, yeah, yeah. Well, it kind of feels like that, partly when we're switching over as well, because as much testing as we've possibly done, most of us with a bit of experience know that you can do all the testing in the world and production is just different. Users are just different; they will use it or break it in ways you didn't imagine, or have access patterns you just didn't imagine. So we kind of are testing. And I think that's one of the big promises of blue-green deployments in the theoretical, or at least in the stateless, world. Let's stay stateless for now. It's that you can switch back if there's a problem.
That feels to me like a real core premise and why it's so valuable is if something goes wrong,
if you notice an issue really quickly,
you can go back to the previous one.
It's still alive and there are no ill effects of moving backwards.
And I think that's a tricky concept in the database world,
but we can get to that later.
Yes, and this is exactly it.
Let's continue with this.
I think we already covered the major parts of the original idea,
of stateless idea.
We can switch to stateful ideas.
And this is the first part where the RDS blue-green deployment implementation
radically differs from the original stateless ideas.
I noticed that from the very beginning of reading the article,
they say, this is our blue, this is our green.
And they distinguish that.
Yeah.
It's a different approach.
It's not what Martin Fowler talked about.
Very different.
So, obviously, like, reading this article, the reverse replication is not supported, but it could be supported.
It's possible.
It's possible.
And actually we already implemented it a couple of times.
And I hope soon we will have good materials to share. But in general, when you perform a switchover, why not create reverse replication and consider the old cluster as, like, a kind of staging now? Not losing anything.
And without this idea, it's a one-way ticket, and this is not an enterprise solution, sorry.
It's plain and simple.
It's not an enterprise solution.
It's definitely not.
Well, it's not blue-green either, I don't think.
Right.
But it's an interesting point about scale.
So if I'm just a small business with a small database, and I'm doing a major version upgrade and I want to be able to go backwards, it would be tricky to do with this, I think.
Yeah, so it's tricky. If you don't want data loss... you can go back, right, but you lose a lot of writes.
So, but I could, let's say, if it's not a major version upgrade, if it's something like maybe changing a configuration parameter, I could do what Amazon are calling blue to green: change the parameter in the green one, switch to it, and then I can do the same process again, switching it back.
But you lose data. New writes will not be replicated backwards.
So let's say I go blue to green,
change the parameter on green,
switch over.
Now I've realized there's a problem
and I set up a new one,
a new green,
as they call it,
and switch again.
Well, okay, okay. In this case, we're dealing with a very basic example, which probably doesn't require such a heavy solution, because, depending on which parameter you want to test, it would be easier to just do it directly,
especially because the second consideration, reading this article, is that downtime is not zero.
So a restart is not zero, and here it's also not zero.
I don't remember if they mentioned issuing a checkpoint to minimize downtime. I think not, right?
In fact, that alone
is probably enough in your books
to say it's not enterprise-ready.
And to their credit,
they do say
low downtime switchover.
They're not trying to claim that it is.
Right. If we say this is our characteristic, that it's not zero downtime, it means that this solution competes with regular restarts.
That's it.
That's it.
So why should I need this to try to switch to different parameters?
I can do it just with restarts and not losing data,
not paying for extra machines and so on.
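As a side note on the checkpoint point above: on self-managed Postgres, the usual way to shrink restart or switchover downtime is simply an explicit checkpoint issued right before the restart, so the shutdown checkpoint has almost nothing left to flush. A minimal sketch:

```sql
-- Issue an explicit checkpoint while still serving traffic, so the
-- subsequent restart or promotion has little dirty data left to flush.
CHECKPOINT;
-- ...then immediately trigger the restart or switchover.
```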
But for major upgrades, it's a different story.
You cannot downgrade,
unfortunately. There is no pg_downgrade tool.
Exactly.
So you just need to use reverse logical replication and orchestrate it properly, and it's possible
100%. And this would mean...
But not through Amazon right now.
Yeah, it's not implemented, but it's solvable.
And I think everyone can implement it.
It's not easy.
I know a lot of details.
It's not easy, but it's definitely doable.
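As an illustration only, with hypothetical names and connection details, the reverse direction after switching traffic to the new cluster could look roughly like this; the real orchestration (slots, sequence sync, cutover timing) is the hard part:

```sql
-- On the new primary (the cluster now taking writes): publish everything.
CREATE PUBLICATION back_to_old FOR ALL TABLES;

-- On the old primary, which now acts as the fallback target:
CREATE SUBSCRIPTION back_to_old
  CONNECTION 'host=new-cluster.example.com dbname=app user=replicator'
  PUBLICATION back_to_old
  WITH (copy_data = false);  -- it already has all data up to the switchover point
```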
What were the tricky parts?
Tricky parts are like if you need to deal with...
We had a whole episode about logical, right?
So the main tricky parts are not only, like, sequences or DDL replication; those are very well-known limitations of current logical replication.
Hopefully, they will be solved.
There is good work in progress for both problems.
There are a few additional problems which
are observed not in every
cluster, but these two usually
are observed in any cluster because everyone
uses sequences, even
if they use this new
syntax, generated identity.
I don't remember. I still use bigserial.
I always use identity, I think.
Yes, but behind the scenes,
it's also sequences, actually.
And everyone is usually doing schema changes.
So these two problems are big limitations
of current logical replication.
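As an aside, since sequence values (including the ones behind identity columns) are not carried over by logical replication, switchover tooling typically has to bump them on the target before it accepts writes. A sketch with a hypothetical table and safety margin:

```sql
-- Advance the target's sequence past anything the source could have used,
-- leaving a gap as a safety margin.
SELECT setval('orders_id_seq',
              (SELECT COALESCE(max(id), 0) + 100000 FROM orders));
```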
But the trickiest parts are performance and lags.
So, capacity limitations on both sides.
On the publisher, it's the WAL sender, like we discussed, right?
Yeah, we can link up that episode.
Yes, yes.
So, the WAL sender limitation and the logical replication worker limitation.
And you can have multiple logical replication workers. And interestingly, this actually shows that the article needs some polishing, because they say max_logical_replication_workers... I'm reading "max logical replication worker" and I don't see the S, even though the setting is plural. So I'm seeing an inaccuracy here. And then the whole sentence says that when you have a lot of tables in the database, this needs to be higher.
And I'm thinking, oh, do you actually use multiple publications and multiple slots automatically if I have a lot of tables? This is super interesting, because if you do, as we discussed in our logical replication episode, you have big issues with foreign key violations on the logical replica side, on the subscriber side, because by default foreign keys are not followed when replicating tables using multiple pub/sub streams, right?
And this is a huge problem if you want to use such a replica for some testing, even if it's not production traffic.
You will see, like, okay, this row exists, but the parent row is not created yet. Foreign key violated.
And it's normal for a logical replica which is replicated via multiple slots and publication-subscription pairs.
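A sketch of that parallelization, with hypothetical table names and connection string: several publication/subscription pairs mean several apply workers (bounded by max_logical_replication_workers), but, as discussed, the streams apply independently:

```sql
-- Split busy tables across publications so each subscription gets its own
-- apply worker on the subscriber side.
CREATE PUBLICATION pub_hot  FOR TABLE orders, order_items;
CREATE PUBLICATION pub_rest FOR TABLE users, products, invoices;

-- On the subscriber:
CREATE SUBSCRIPTION sub_hot
  CONNECTION 'host=source.example.com dbname=app user=replicator'
  PUBLICATION pub_hot;
CREATE SUBSCRIPTION sub_rest
  CONNECTION 'host=source.example.com dbname=app user=replicator'
  PUBLICATION pub_rest;
-- Caveat: the two streams are applied independently, so at any given moment
-- a child row may be visible before its parent row from the other stream.
```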
Not discussing this problem means that probably there are limitations
also at large scale.
If you have a lot of tables, it's not a problem, actually.
The biggest problem is how many tuple writes you have per second.
This is the biggest problem.
Roughly, at thousands or a couple of thousands of tuple writes per second, on modern hardware with a lot of vCPUs, like 64, 96, or 128 (I'm talking usual Intel numbers), you will see a single logical replication worker hit 100% CPU.
And that's a nasty problem.
That's a huge problem.
Because you switch to multiple workers, but now your foreign keys are broken.
It's a hard problem to solve for testing.
So, I mean, if you use multiple, you need to pause sometimes
to wait until consistency is reached and then test in frozen mode.
This is okay.
But it adds
complexity. But if your traffic is below, like, 1,000 tuples per second roughly, depending also... is it Intel? By the way, it doesn't matter how many cores, because I'm talking here about the limitation of a single core. It only matters whether it's a modern core or quite an outdated one. It depends on the family; if you talk about AWS, on the family of EC2 instances or RDS instances you're trying to use.
So,
this single core
limitation
on the subscriber side
is quite interesting.
But if you have below 1,000 tuple writes per second on the source (inserts, updates, deletes), probably you're fine.
Yeah, so this is interesting to check. And this lagging is, I think, the biggest problem, because when you switch over, you need to catch up; when you install the reverse logical replication, you also need to make sure you catch up. This defines the downtime, actually.
Yeah, because we can't switch back
until... We can't switch or switch
back until the other one is caught up.
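Catch-up is typically watched on the publisher with something like this (a sketch; real orchestration also checks slot state and subscriber apply progress):

```sql
-- How far behind each (logical) walsender is, in bytes:
SELECT application_name,
       pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS replay_lag_bytes
FROM pg_stat_replication;

-- WAL retained per logical slot:
SELECT slot_name,
       pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn) AS lag_bytes
FROM pg_replication_slots
WHERE slot_type = 'logical';
```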
Right, because we prioritize avoidance of data loss over HA here, over high availability.
And I'm sure, since the RDS blog post says it's not zero downtime, they have additional overhead. But if you have, for example, PgBouncer, and you're going to use pause/resume to achieve real zero downtime, then you need the lag to be close to zero.
And the limitation of the logical replication worker will be the number one problem. Another problem is long-running transactions, which, until Postgres 16, I think, right, or 15, cannot be parallelized.
So if you have a long transaction, you have a big logical replication lag. So you need to wait until you have a good opportunity to switch over
with lower downtime. That's one thing I think I do want to give them some credit for. This does
catch some of those. So for example, if you do have long running transactions, they'll prevent
you from switching over. Equally, there are a few other cases where they'll stop you from causing yourself issues, which is quite nice. And I wanted to give a shout-out to the Postgres core team and everybody working on features to improve logical replication, which has enabled cloud providers to start providing features like this, and that's really cool. It feels like the good features going into Postgres core are enabling cloud providers to work at the level they should be working at, to add additional functionality.
So it's quite cool. Not necessarily that we're there yet, and as logical replication improves, so can this improve, but they are checking for things like long-running transactions, which is cool.
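The kind of pre-switchover check being described can be approximated with a query like this (a sketch; RDS presumably does something equivalent internally):

```sql
-- Transactions open longer than one minute; switching over while these
-- are running would inflate logical replication lag.
SELECT pid, usename, state,
       now() - xact_start AS xact_age,
       left(query, 80)    AS query_sample
FROM pg_stat_activity
WHERE xact_start IS NOT NULL
  AND now() - xact_start > interval '1 minute'
ORDER BY xact_age DESC;
```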
Yeah, and definitely Amit Kapila and others who work on logical replication, kudos to them, 100%.
And also the RDS team; I'm criticizing them a lot, but it's hard to criticize someone who is doing nothing.
You cannot criticize such guys who don't do anything.
So the fact that they move in this direction is super cool.
A lot of problems, right?
But these problems are solvable, right?
And eventually we might have a real blue-green. So the question is, is the blue-green deployment terminology going to stay in the area of databases, and the Postgres ecosystem in particular?
What do you think?
Because this is a sign that probably yes.
It should be reworked a lot, I think.
But in general,
maybe yes. What do you think?
Yeah, I don't know.
Obviously, predicting the future
is difficult, but I do
think that badly naming things
in the early days makes it less likely.
Calling this blue-green when it's not, actually, I think reduces people's trust in you using blue-green later in the future, when it is more like that. But you've got more experience with this than me, for example in the category of database branching, like taking these developer terms
that people have a lot of prior assumptions about
and then using them in a database context
that they don't 100% apply to
or they're much more difficult in,
I think is dangerous.
But equally, what choice do we have?
How else would you describe this kind of thing?
Maybe it's a marketing thing.
I'm not sure.
That's a cool direction of thinking.
So let me give you an analogy.
Until some point, not many years ago, I thought, as many others do, that to change something, we need to perform full-fledged benchmarking.
we need to perform full-fledged benchmarking.
Like, for example, if we drop some index,
we need to check that all our queries are okay.
In this case, okay, we can do it with pgbench, sysbench, JMeter, or anything like that...
Or simulate workload with our own application
using multiple application nodes.
A lot of sessions, like, running, like, 50% CPU,
and this is just to test an attempt to drop indexes.
It sounds like overkill.
I mean, nobody is actually doing it, because it's too expensive.
But people think in this direction.
It would be good to test holistically.
But actually there is another approach: lean benchmarking. A single session, EXPLAIN (ANALYZE, BUFFERS), focus on buffers, I/O, and so on.
Similar here.
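For a concrete picture of that lean approach, a single-session check (hypothetical table and predicate) looks like:

```sql
-- Compare buffer numbers for the affected query before and after the
-- planned change (for example, with and without the index you want to drop).
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM orders
WHERE customer_id = 123;
```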
And the first class of testing is needed for infra tasks, mostly upgrades and so on, to compare the whole system: lock manager, buffer pool behavior, everything. File system, disk, everything.
But it's needed only, as I said, like once per quarter, for example.
Of course, for every cluster; if you have thousands of clusters, you need to run such benchmarks almost daily, I think, right?
And similarly, these upgrades, major upgrades and so on, these tasks usually go together. You need upgrades, so you need to do benchmarks.
But for small schema changes,
you do it every day,
multiple times maybe, okay, once per week maybe,
depending on the project.
You release often.
You develop your application quickly.
You don't need full-fledged benchmarks,
and you also probably don't need full-fledged blue-green deployments, right?
But maybe you do, I don't know, maybe you still need it.
This is where I said I have open-ended questions, like, what should we use for better testing?
Because I could... like, if we are okay to pay two times more, we could have two clusters with one-way replication, but when we perform a switchover, a zero-downtime switchover, we immediately set up reverse replication.
So, a real blue-green approach.
In this case, probably we could use them for DDL as well.
Of course, DDL should be solved. But we can solve it by applying DDL manually on both sides, actually. This unblocks logical replication.
So we just need to control DDL additionally; not just run the ALTER, we need to run the ALTER there and the ALTER here.
In this case, probably it would be a great tool to test everything.
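A sketch of the "alter there and alter here" idea for an additive change, with hypothetical names; applying it on the subscriber first, then on the publisher, keeps incoming rows from referencing a column the subscriber lacks:

```sql
-- On the subscriber (the other cluster):
ALTER TABLE orders ADD COLUMN note text;

-- Then on the publisher (the cluster currently taking writes):
ALTER TABLE orders ADD COLUMN note text;
```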
And then, if we slightly diverge from the blue-green deployments idea and use the A/B testing idea, we point, like, 1% of traffic to this cluster, read-only traffic only.
I'm not going to work with, like, an active-active schema, like multi-master, no, no, no.
So then we can test at least read-only traffic for the changed schema.
But again, there will be a problem with schema replication, because logical replication is going to be blocked. We need to deploy the schema change on both.
It's not only about the lack of logical replication of DDL. It's also that even if DDL were replicated, if you deploy it only on one side and don't deploy it on the other side, logical replication is not working. Or does it still replicate, right? So I'm not quite sure.
Actually, we can drop an index on the subscriber. Or we can add a column on the subscriber. And the logical replication will still be working.
But certain cases of DDL will be hard to test in this approach.
But still, imagine such an approach. It would be a full-fledged blue-green deployment with a simple, like, symmetric scheme, a simple switch back and forth, reliable.
I don't know, maybe it's a good way to handle all changes in general. We just pay two times more, but for some people it's fine, if the costs of errors and risks, the costs of problems, are higher than this.
What do you think?
Yeah, this is a tricky one. The first database-related company I worked for did a lot of work in the schema change management tooling area, not for Postgres, but for other databases.
And it gets really complicated fast, just trying to manage deployments between versions while maintaining data.
And the concept of rolling back is a really strange one.
Like, going backwards, let's say you've deployed a simple change.
You've added a column for a new feature.
You've gathered some data. Does rolling back, like, maybe temporarily involve dropping that column?
I don't think so, because then you destroy that data.
But then it's now in the old version as well.
And there's this weird third version that I often talked about in the past: rolling forwards rather than rolling back. And I think that's gained quite a lot of steam in the past few years.
The idea that you can't... like, with data, can you actually roll back? Because do you really want to drop that data?
Because that's why it's fast.
But it's not a story.
Well, this approach with Reshape, and whatever this new tool is called, to handle DDL in the Reshape model... It's similar to what PlanetScale does with MySQL: the whole table is additionally recreated, so you need two times more storage. And we have a view which masks this machinery, right? And then, in chunks, we just update something. There you can have the ability to roll back, right?
Because it's maintaining it in both places?
Because for some period of time, you have both versions working
and synchronized inside one cluster.
But the price is quite high
and views have limitations, right?
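Roughly, the pattern being described (a sketch with hypothetical names, not pgroll's or PlanetScale's exact machinery) looks like this:

```sql
-- Expand: keep old and new shapes side by side behind views,
-- and backfill in small batches so both "versions" work at once.
ALTER TABLE orders ADD COLUMN status_new text;

CREATE VIEW orders_old_api AS          -- the old application keeps its shape
  SELECT id, customer_id, status FROM orders;
CREATE VIEW orders_new_api AS          -- the new application sees the new column
  SELECT id, customer_id, status_new AS status FROM orders;

-- Backfill in chunks:
UPDATE orders SET status_new = status WHERE id BETWEEN 1 AND 10000;
-- ...repeat for further ranges; only later drop the old column (contract).
```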
But here, if we're talking about replicating the whole cluster, the price is even higher.
Yeah, and the complexity is even higher, I think.
Of course, yes.
Managing it within one database feels complex.
It's called pgroll.
Oh, nice.
A new tool which is a further development of the idea of the Reshape tool, which is not being developed anymore, as far as I know, because the creator of that tool, Reshape, went to work at some bigger company, not a Postgres user, unfortunately.
So I don't know.
The problem exists.
People want to simplify schema changes
and be able to revert them easily.
And right now we do it the hard way; I mean, hard in terms of physical implementation.
I mean, if we say revert, we definitely revert. But dropping a column is usually considered a non-revertible step. And usually it's quite well known; in larger projects, people usually design it so that first the application stops using the column, and a month later you drop it, and by then you already know.
So I'm actually talking about adding a column, which is way more common. I'm talking about adding a column because, if you
Data loss if you do roll back or what? Yeah. Oh, you want to move forth and back then forth
again without data loss? Possibly. You want too much. Yeah. but so i think that's like we talked about
blue green deployments right let's say part of what you're doing is rolling out a new feature
and so you roll it out for a few hours and some of your customers start using that feature but
then there's a major it's causing a major issue in the rest of your system, so you want to roll back,
does the use of that feature,
are we willing to scrap those users' work in order to fix the rest of the system?
I think people would want to retain that data.
Yeah, well, let's discuss it in detail.
First of all, on the subscriber, we can add a column.
If it has a default, the logical replication won't be broken, because it will just be inserting, updating.
Okay, we have extra columns, so what? Not a problem, right?
But when we switch forward, the setup of reverse replication will be hard, because we now have extra columns and our old cluster doesn't have them.
So we cannot replicate this table.
Unless we replicate DDL, which... if we start replicating DDL backwards, then we're kind of reverting to our existing state, which is strange. This is one option, yes. And another option is... I know there is an ability to filter rows and columns, I guess, right?
So you can replicate only specific columns, right?
I never did it myself, but I think this is…
Yeah, yeah, yeah.
So if you replicate only a limited set of columns, you're fine.
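That refers to column lists in publications, available since Postgres 15 if I recall correctly; a hypothetical sketch of a reverse publication that sends only the columns the old cluster still has:

```sql
-- The new cluster has an extra column, but the reverse publication
-- replicates only the columns that exist on the old cluster.
CREATE PUBLICATION back_to_old
  FOR TABLE orders (id, customer_id, amount, created_at);
```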
But in this case, moving forward, you lose this data, and it's similar to what you do with Flyway, Sqitch, Liquibase, or Rails migrations: usually you define up and down, or, like, upgrade and downgrade steps. In this case you create a column with ALTER TABLE ADD COLUMN, then ALTER TABLE DROP COLUMN, and if you went back, of course you lost the data which was already inserted, and it's considered normal, actually, usually.
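In those tools the pair typically looks something like this (a hypothetical migration), which is exactly why going back loses whatever was written in between:

```sql
-- up / upgrade step:
ALTER TABLE orders ADD COLUMN note text;

-- down / downgrade step (drops whatever was written into the column):
ALTER TABLE orders DROP COLUMN note;
```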
Well, yeah,
but that's my background as well,
is that people often wouldn't
end up actually using
the rollback scripts. What they would do is roll
forwards. They would end up with
an old version of the application, but the
column and the data are still there
with the database.
You talk about people who closely
like you
talk about companies who
are both developers and users of
this system. But if you imagine
some system which is developed
and like, for example, installed in
many places, some
software,
they definitely need a downgrade procedure to be automated,
even with data loss, because it's more important usually to fix the system and make it alive again.
And users in this case don't necessarily understand details
because they are not developers.
And it's okay to lose this data and downgrade, but make the system healthy again.
In this case, we're okay with this data loss.
Well, yeah, but I guess going back to the original topic, you asked whether I think blue-green deployments will take off in the database world.
And I think it's the switching back that's tricky.
But I don't want to diminish this work that's been done here,
regardless of what we call it,
because I think it will make more people able to do
major version upgrades specifically with less downtime
than previously they would have been able to,
even though there will still be a little bit.
Yeah, I don't know.
Maybe we need to develop the idea further
and consider this blue-green concept as some intermediate step.
It reminds me of red-black trees, right? Like binary tree, red-black tree, then AVL tree and so on, and then finally B-tree.
And this is, like, the development of indexing approaches, algorithms and data structures.
So maybe, like, closer to self-balancing, then a lot of children for each node.
Maybe here also, it's a very distant analogy, of course, because we're talking about architectures here, but maybe these blue-green deployments, or green-blue deployments... I think we should start mixing this up to emphasize that they are balanced, right, and symmetric.
And also, like, to tell the RDS guys that it's not fair to consider one of them as always the source and the other as always the target; we need to balance them.
So I think there should also be some new concepts developed. So it's interesting to me. I don't know what the future will look like.
Also, let me tell you a story about naming. In the systems we developed, we chose... well, we know, like, master and slave in the past, and primary, secondary, or primary and standby; the official Postgres documentation follows this terminology right now. Writer and reader in Aurora terminology. Also leader and follower in Patroni terminology. Then logical replication terminology: publisher and subscriber. Here we have blue-green, right?
In our development, we chose source and target clusters. And it was definitely fine in every way, monitoring and all testing; everyone understands, this is our source cluster, this is our target cluster.
But then we implemented reverse logical replication to support moving back. And the source-target cluster naming immediately showed it was the wrong idea, right?
So I started to think: in our particular case, we set up these clusters temporarily. Temporarily might mean multiple days, but not persistent, not forever.
In the original blue-green deployment, as I understand Fowler, if I understand correctly, it's forever, right? This is production, this is staging, then we switch.
So I chose new naming: old cluster and new cluster, right? But if it's persistent, that's also bad naming. Maybe blue-green is okay, green-blue, blue-green, but definitely, yeah.
Why don't you use the Excel naming convention, with "final" and "final v2" at the end? This is the final server. This is the final final server.
So naming is hard, you know, right?
Wonderful. I enjoyed this, thank you so much.
Thank you. Thank you, everybody, and catch you next week.
Yeah, don't forget to like, share, subscribe. Share is the most important, I think, or like is the most important... what is it? I think comments are the most important to me, anyway. Let us know what you think, in the YouTube comments maybe, or on Twitter, or on Mastodon.
You know, I wanted to take a moment and emphasize that we continue working on subtitles. They are great. They are high quality.
Yesterday, I asked in a Russian-speaking Telegram channel, where 11,000 people talk about Postgres, I asked them to check YouTube, because we have good-quality English subtitles. They understand the terms. We have 240 terms in our glossary that we feed to our AI-based pipeline to generate subtitles.
And I wanted to say thank you to my son, who is helping with this, actually, who is still, like, a teenager in school, but is also learning Python and so on.
So YouTube provides automatic translation into any language. So to me, the most important thing is sharing, because this delivers our content to more people.
And if those people cannot understand English very well,
especially with two very weird accents,
British and Russian.
Yeah, sorry about that.
Yes.
So it's good, on YouTube, to just switch to the automatically generated subtitles in any language.
And people say it's quite good and understandable.
So share and tell them that even if they don't understand English,
they can consume this content.
And then maybe if they have ideas,
they can write us.
Perfect.
This is the way, right?
Thank you so much.
Bye bye.