Postgres FM - Sharding
Episode Date: August 11, 2023

Nikolay and Michael discuss sharding Postgres — what it means, why and when it's needed, and the available options right now.

Here are some links to some things they mentioned:

- PGSQL Phriday monthly blogging event: https://www.pgsqlphriday.com/
- Did "sharding" come from Ultima Online? https://news.ycombinator.com/item?id=23438399
- Our episode on partitioning: https://postgres.fm/episodes/partitioning
- Vitess: https://vitess.io/
- Citus: https://www.citusdata.com/
- Lessons learned from sharding Postgres (Notion, 2021): https://www.notion.so/blog/sharding-postgres-at-notion
- The Great Re-shard (Notion, 2023): https://www.notion.so/blog/the-great-re-shard
- The growing pains of database architecture (Figma, 2023): https://www.figma.com/blog/how-figma-scaled-to-multiple-databases/
- Timescale multi-node: https://docs.timescale.com/self-hosted/latest/multinode-timescaledb/about-multinode/
- PgCat: https://github.com/postgresml/pgcat
- SPQR: https://github.com/pg-sharding/spqr
- PL/Proxy: https://plproxy.github.io/
- Sharding GitLab by top-level namespace: https://about.gitlab.com/handbook/engineering/development/enablement/data_stores/database/doc/root-namespace-sharding.html
- Loose foreign keys (GitLab): https://docs.gitlab.com/ee/development/database/loose_foreign_keys.html

What did you like or not like? What should we discuss next time? Let us know via a YouTube comment, on social media, or by commenting on our Google doc!

Postgres FM is brought to you by:
- Nikolay Samokhvalov, founder of Postgres.ai
- Michael Christofides, founder of pgMustard

With special thanks to Jessie Draws for the amazing artwork.
Transcript
Hello, hello, this is Postgres FM episode number 58. My name is Nikolay, and together with me is Michael. Hi, Michael.
Hello, Nikolay.
And we are going to talk about sharding today. Sharding, sharding. Two big experts on sharding are here and going to discuss sharding.
Yeah, definitely not an expert here. Actually, this came up last week as a topic: somebody suggested partitioning and sharding for a monthly blogging event that I'll link up in the show notes. But yeah, we've done an episode on partitioning. I thought it was a really good one; I really enjoyed it and learned stuff, and I think we got some good feedback on it. This, though, is sharding, and I think maybe a good place to start is what the difference is. I tried defining it, you gave me some feedback. How do you define them?
Well, partitioning is when we split tables but remain on a single node. I mean, we might have read-only standby nodes, but if we count only primary nodes, which have read-write access, it's only a single node. But once we do a similar split involving multiple primary nodes, we should call it sharding instead of partitioning.
And the name, I guess, came from some game in the past.
Oh, really?
Yeah, it's from some online game. I should not lie; I was not prepared to explain the etymology here, but part of my brain says it's from gaming: they called parts of the game world 'shards'. If we move this concept to databases, we have database sharding.
I really like your definition. It's the one I tried to go with in my blog post. So partitioning is at the table level: splitting what's logically one table into multiple tables. And sharding is the same, but at the database level: splitting what can logically be thought of as one database, but is actually, behind the scenes, multiple databases. So that makes sense to me, except, and I think it's worth warning listeners, lots of blog posts out there, and definitions even on Wikipedia and other places that are normally quite accurate, often use the word partitioning in places that would more accurately be described as sharding. So it is confusing out there.
In the CAP theorem, the official terminology is network partitioning, right?
Network partitioning, yeah, with a negative connotation. In our case we would have to say 'network sharding', which also doesn't make sense.
There's also some confusion around vertical versus horizontal. Well, I don't understand why there's confusion, but I see people try to say 'vertical sharding', which is actually just a vertical split, maybe a functional split, of a database into multiple databases. For example, you see a weak connection between two groups of tables. We could say clustering here as well, and actually we can apply some machine learning approaches, like basic k-means, to try to automatically detect good options for a vertical split of a database into sets of tables with weak connections. Weak means almost no foreign keys between them, and also few dynamic relations: when two tables are involved in the same query, that's also a connection, though it's dynamic rather than static like a foreign key. So I used machine learning to try to help people understand how to split databases vertically.
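As one starting point for that kind of analysis, here's a minimal sketch that pulls the static part of the connection graph, the foreign-key edges, out of the Postgres catalog; grouping tables into weakly connected sets would happen in external tooling.

```sql
-- List foreign-key edges between tables; table pairs that share few or
-- no edges are candidates for landing in different databases.
SELECT conrelid::regclass  AS referencing_table,
       confrelid::regclass AS referenced_table
FROM pg_constraint
WHERE contype = 'f'        -- 'f' = foreign-key constraints
ORDER BY 1, 2;
```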
Vertically means, okay, users are here, products are there, right? A connection between them exists, but we don't use it often. Well, maybe it's not a good example, but I think it's clear when we have different data in different parts, on different primary nodes. But horizontal means we basically take our table and split it using a horizontal line, right?
I think the partitioning case is actually simpler to explain. Horizontal partitioning is at the row level: we're taking some rows from the table and putting them in a different table. And vertical would be column-based: we're splitting the table based on the columns into two different tables.
And it's the same for sharding, right? It's kind of still columns and rows. In the vertical case, we're taking columns, as a whole table, and putting them somewhere else. And in the horizontal case, we're taking row-level data, so we have the same tables across different nodes, but different rows in each one.
Vertical is column-level, okay. If you studied database theory, it should be called a projection: you keep only a limited set of columns, the other columns go to a different table, and you probably have a one-to-one relationship between them.
I agree.
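To make the distinction concrete, here's a minimal sketch in plain Postgres DDL, with hypothetical tables: horizontal partitioning splits rows by a key, while a vertical split is a projection into a second table with a one-to-one relationship.

```sql
-- Horizontal partitioning: same columns everywhere, rows split by a key
-- (declarative partitioning, Postgres 10+; table names are hypothetical).
CREATE TABLE events (
    id         bigint      NOT NULL,
    created_at timestamptz NOT NULL,
    payload    jsonb
) PARTITION BY RANGE (created_at);

CREATE TABLE events_2023 PARTITION OF events
    FOR VALUES FROM ('2023-01-01') TO ('2024-01-01');

-- Vertical split (a projection): some columns move to a second table
-- with a one-to-one relationship back to the first.
CREATE TABLE users (
    id    bigint PRIMARY KEY,
    email text NOT NULL
);

CREATE TABLE user_profiles (
    user_id bigint PRIMARY KEY REFERENCES users (id),
    bio     text,
    avatar  bytea
);
```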
And if you do it at the server level, it's a functional split, which is what it's called. And this is a normal approach to scaling your system if you prefer microservices, or a mid-size or big-size services approach: you split some functionality into one database and other functionality into another database. They have some connection, of course; maybe they have some common part, and you need to take care of replication. But the majority of the data is very different in nature. Versus row-level, which means we have the same sets of tables but different data, because one part of the data goes to one node and a different part goes to another node. So how do we do that? Let's talk.
Well, do you want to do 'how', or, I was thinking, maybe we could start with 'why'?
Why is simple.
Why is simple? Well, I think there are a few reasons, to be honest. I think there's one big one: performance.
Well, scale, right?
Scale, right. When you say performance, they are related; we're talking about that. I like it. I think it's the main reason you would ever want to shard: hitting a max. Let's say you're super keen to stay on Amazon RDS Postgres, and there's a max size that you can provision, and on some basis you're getting close to maxing that out. Maybe your CPU is going up, maybe your ingest rate. There's some level at which you're scared: we can't continue to grow this vertically, we can't scale up anymore.
Is that what you're talking about? You just used the word 'vertically' in a different sense.
I mean vertical scaling, when we just add resources: more CPU, more memory, better disks. So yeah, that's what I meant. What I meant is, if we get to a point where we can't do that anymore, we need to think about sharding.
No, not necessarily. Again, I will repeat: you used 'vertical scaling', where the word 'vertical' is in a different sense compared to what we just discussed when we talked about columns. Even if we cannot add more CPU and RAM, vertical scaling, we probably still can split our system vertically in terms of tables and columns, and have different parts of our schema on different primary nodes.
Yeah, I agree. But I thought we were counting that as a type of sharding.
I don't like to call it sharding at all, but I see people do it: 'vertical sharding'. For me it's just a functional split, or going with a microservices architecture or something, and some of the problems this approach has are similar to normal sharding. For me, normal sharding is horizontal sharding. When we don't specify vertical or horizontal, we usually mean horizontal sharding: the same schema everywhere, but different data.
I completely agree. Same with partitioning: if you don't hear somebody say horizontal or vertical, they mean horizontal.
Right, right. And vertical partitioning is also not common, though some people use it.
Yeah.
It's just an attempt to have some unified terminology for everything. But okay: so we cannot live with one primary node. We are saturated; we tried everything, like offloading read-only queries to standby nodes, reducing frequency with some caching, with some optimization, and so on. We still see that we are growing, and soon one node won't be enough. It's a very painful situation, very scary, especially for a CTO and so on, to suddenly hit the ceiling. And of course, in this case, people usually choose one of two directions as the main one: again, a vertical split, or sharding right away. But sometimes they need to mix: if it's a really large project, you bet on one approach but still need to apply the other one as well.
Yeah.
Right.
So if you like microservices... and microservices is a bigger-than-technological topic. It's not just technological; microservices is an organizational topic. You need to change management and how teams are organized, how they choose technology; some teams might choose not Postgres but something else, and so on. So it's a bigger than just technical discussion. But if you choose microservices, probably you don't need sharding, right? The microservices approach either solves this problem of saturation and inability to scale, or it postpones it so much that you have, like, five years or so, right?
Maybe. The thing I don't understand about microservices is: what if you've got one thing that's very difficult to split logically, and it has the heaviest ingest rate of everything? You could easily have quite a small team looking after one huge node in the microservice infrastructure. So I suspect you could still hit this. If you've got a load of IoT sensors or something, you could get a lot of data very quickly. So yeah, I think it's possible even with microservices.
I cannot agree here. Take, for example, a huge e-commerce company, like a continental leader; I have a few examples, Postgres-based, which I worked with directly or just learned about from discussions with the people involved. Most e-commerce systems somehow tend to choose the microservices approach. They love it; I mean, engineers, backend engineers, and managers usually choose microservices. In this case, it's very hard to imagine that one of the services (usually you have something related to registration, inventory, orders, and so on; many, many services in a typical e-commerce system) will require sharding right away. You need to grow a lot to see the need for sharding in one of those services. It should be really big.
I agree for e-commerce.
I think there are some things that get called sharding that are horizontal but have a different primary driver, and I think that's analytics query performance.
Ah, okay. I silently reduced the topic to OLTP, as usual for me. But I agree: OLAP, analytical systems, they can have a lot of data, and sharding there is usually a good thing to have. So I agree with you here, definitely.
Cool.
So, a couple of reasons, at least, for wanting this. But yeah, I think 'how' is probably a good thing to move on to. What are our options? How?
So, unlike the MySQL world, where Vitess exists, we don't have Vitess for Postgres; attempts to migrate Vitess to Postgres failed, I know a few, and the developers of Vitess announced that they don't pursue this goal anymore. But we have a few options. Citus, first of all. And again, I made a joke about us being big experts, because I consider myself not an expert in sharding at all. I reviewed Citus and played with it a couple of times in the past, but it was before Microsoft decided to open source everything. I always considered only the open source part, because I didn't want to have vendor lock-in to Azure. But right now we have an interesting situation: they open sourced everything. So it's worth reviewing once again, especially the very important feature for large and growing projects: rebalancing without downtime. It's super important for sharding, because you never know which node will grow; it's hard to predict. So you need some tools to change the distribution of data in the future, but without huge downtime. And this feature was originally only in the paid version of Citus, but now it's open source as well.
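For a sense of what that looks like in practice, here's a hedged sketch using Citus functions, assuming the extension is installed on a coordinator node and the table and worker names are hypothetical (citus_rebalance_start() appeared in Citus 11.1; older versions exposed rebalance_table_shards() instead):

```sql
-- Distribute a table across worker nodes by a shard key.
SELECT create_distributed_table('events', 'workspace_id');

-- Later, register a new worker node...
SELECT citus_add_node('worker-3', 5432);

-- ...and move shards onto it in the background, without blocking writes.
SELECT citus_rebalance_start();
SELECT * FROM citus_rebalance_status();
```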
up to date knowledge. For me, if we talk about LTP, I usually
asked, I talked to them
to Cytos team couple of times and
asked please provide cases of pure
OLTP good
like heavily loaded systems
with OLTP workloads
but everything they provided
that time it was like 2, 3, 4 years ago
it was looking as
HTAB to me not
OLTP you understand what so hybrid hybrid transactional
analytical for example some search an analytical engine for videos or something where it's okay
like only limited number of users and they are motivated to wait a couple of seconds in this case
it's okay i mean to have some latencies. But in OLTP,
we have only usually dozens of milliseconds, or just hundreds of milliseconds, but not a second.
A second kills our traffic and people go away. So we cannot wait, we cannot allow waiting so long.
And when I benchmarked myself, Cytos was not behaving very well for oil tp but
then i've got some response on twitter from developers that i do some things wrong so
very likely i did some things wrong i mean i i was trying to measure latency overhead this is my
favorite test for such systems because the biggest problem for performance when like splitting is not that
difficult but when you have something that decides which shard to go router right it adds overhead
because it needs to parse the query and parsing query and so on requires time so this adds overhead
and for me in all tpks it's unacceptable to have overhead more than
a millisecond. Millisecond is quite big already. What were you seeing? I don't remember details.
Okay, it was a few years ago.
Imagine if it was, and I'm not saying it is, imagine if it was tens of milliseconds. You would deem that completely unacceptable?
Yeah, definitely, because it's already very close to the human perception threshold, which we know is 200 milliseconds. And we probably need multiple queries to serve one HTTP request. So I cannot accept that overhead at all, because I know my backend engineers will add milliseconds of their own; they know how to add more milliseconds. So I cannot allow this proxy, this middleware, to have significant overhead. But again, my benchmarks very likely were not ideal, judging by that feedback, and I never tried again. Probably it's time to benchmark again and see the overhead of the Citus proxy.
And you're a few years out of date, right? They open sourced it a little while ago, so you're at least that much out of date. And I think the latest version included schema-level sharding, which seems quite interesting for some OLTP-type splits.
Yeah, exactly. So that's quite vertical, right?
If someone's considering sharding and they're on Postgres, they're going to look at Citus; they should look at Citus. But there are other options as well, right?
I would revisit this decision right now for a heavily loaded OLTP system. But again, benchmarks are needed once more, and probably some tuning and so on. And I still don't know very good examples of OLTP systems that use Citus, but I'm not watching closely.
I looked at some of their customers, and some of them are very analytics-heavy, like Algolia for search, and Heap. I believe Microsoft used them internally for quite a few things. But Freshworks stood out to me as potentially different: that's a help desk ticketing system, right? Sure, they need search as well, but a large part of it is OLTP. So that seemed interesting to me.
Yeah, that's interesting.
But in the analytical area, let's look at Hydra, which recently released 1.0. Congrats to the Hydra team. Interesting, column-based. And they inherited the column store code from Citus. So it's interesting.
I looked at the Hydra website in preparation, expecting them to have something around sharding, but I think they've forked Citus to do the other parts of what Citus does well, like the analytical processing. There's no mention of sharding anywhere on the Hydra website.
Maybe at some point, at some level, you don't really need it if you have a column store and vectorized processing. But I'm not an analytics guy, and I'm not a sharding guy; I honestly don't understand what we're talking about here. Let's move on, because we do have some really interesting cases, write-ups, and blog posts from recent OLTP companies.
Notion and Figma both blogged relatively recently, in the last couple of years, about sharding Postgres.
In a way which I call application-server sharding, A-S-S, which I've also implemented myself a few times. And this is what you usually choose because you don't have proper tooling. And Citus is probably the proper tooling for OLTP; again, maybe someone has good experience with it or knows about some, and I would like to know. Please comment anywhere: on Twitter, on YouTube, anywhere. But application-server sharding, or application-level sharding... I like 'application-side sharding', because it's on that side. Application-side sharding, sorry. This is challenging and requires an effort, definitely.
Usually it's quite easy to split and so on, but you need to think about several things, as usual. First is rebalancing: you will definitely need it in the future, and you'll need a way to do it without downtime. Second is how to avoid distributed transactions. For example, it's an absolutely bad idea to have multiple connections to different primary nodes and work with them in some messy way: you start a transaction, you work with a different connection which also has a transaction. If you do that, if you absolutely need it, you need two-phase commit, 2PC. But it's slow: you cannot have thousands of TPS on 2PC at all, it's impossible, because it has its own huge overhead. So usually, in an OLTP context, we try to avoid 2PC unless it's absolutely needed.
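For illustration, this is what raw two-phase commit looks like in plain Postgres; a minimal sketch with hypothetical tables, requiring max_prepared_transactions > 0 on both servers, and every extra round trip here is part of the overhead being described:

```sql
-- On shard A:
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
PREPARE TRANSACTION 'transfer_42';  -- persists the transaction, frees the session

-- On shard B:
BEGIN;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
PREPARE TRANSACTION 'transfer_42';

-- Only once both PREPAREs succeeded, finish on each node:
COMMIT PREPARED 'transfer_42';  -- run on shard A
COMMIT PREPARED 'transfer_42';  -- run on shard B
-- (on failure: ROLLBACK PREPARED 'transfer_42' on whichever nodes prepared)
```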
Right. And finally, you also need to take care of the router, and it should have small overhead; one millisecond is good, probably. That's it, at a high level.
Yeah. I think the Notion posts in particular... there are two blog posts from Notion, actually: one from 2021, where they initially did this, and they shared a huge amount of detail on the preparation, how they chose things, how they were preparing for resharding later (rebalancing, sorry). And then there's a follow-up blog post from this year, where they talk about resharding without downtime, with, I think, less than a second of noticeable impact on users, which is incredibly impressive. They might have even used the words 'partition key' or something, but they chose to shard based on workspace, because people aren't ever looking for information from two workspaces at the same time. So you don't have that same problem.
Yeah, and that's actually the same as with partitioning. It's very difficult, and I saw cases where people spent a lot of time trying to find the ideal partitioning key, or sharding key in this case, or a set of keys. Unlike partitioning, where we can, for example, choose one key and it's enough, here we also need to think about how the write workload will be distributed among multiple nodes. For example, we know Timescale Cloud (unfortunately, it's not open source) basically has sharding as well, and there you need two keys. One is time-based, a timestamp, but it's not enough. Why? Because if you use only the timestamp, you will have a hotspot: one shard will be receiving most of the writes all the time. So for balancing you need a second level, a second key, basically part of a composite key. For example, a workspace ID could work here as well.
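A minimal sketch of that idea: hash the second key to pick a shard, so a single time range no longer lands on one node. hashtext() is a built-in (though undocumented) Postgres hash function, and the shard count of 8 is an arbitrary assumption.

```sql
-- Route rows by a hash of the workspace ID rather than by time alone,
-- so concurrent writes spread across shards instead of hotspotting.
CREATE FUNCTION shard_for(workspace_id text, num_shards int DEFAULT 8)
RETURNS int
LANGUAGE sql IMMUTABLE
AS $$ SELECT abs(hashtext(workspace_id)) % num_shards $$;

SELECT shard_for('workspace-a1b2');  -- some number in 0..7, stable per workspace
```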
Timescale multi-node is a really good point, actually. I was looking it up, and it looks like you can self-host it, even if it's not open source.
That's interesting news to me. Okay.
Yeah, I was reading about it; I also thought that wasn't true until I checked the docs.
What is the license?
I didn't check, sorry.
They have two licenses, Apache and the Timescale license, which is not open source.
I doubt it's under the Apache one; I don't think they're doing anything on the Apache one anymore, I think that's their old stuff. But anyway, let's not guess. I'll link up the docs, and people who are interested can look into it themselves.
This episode is full of guessing, but let's return to the main topic. You need to take care of these things, and, as usual when you're architecting something, you need to think about how much data you will have in five years and how you will approach rebalancing with minimal cutover time. So you probably need to involve logical decoding, logical replication; it's improving in the latest Postgres versions. But it was a surprise for me in one case, which was not about sharding but about a vertical split of a quite big system. I was considering logical replication right away to perform the split, but the split was into two halves, like 50/50. In that case, it's easier to use just physical replication and then drop the tables which are not needed on each side, because it's easier to set up and it has fewer bottlenecks. So in some cases: just physical replication, and then you drop the tables you don't need.
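When logical replication is the right tool for moving a subset of tables out, the core moving parts are a publication and a subscription; a minimal sketch with hypothetical table and host names:

```sql
-- On the source cluster: publish only the tables that are moving.
CREATE PUBLICATION move_ci FOR TABLE ci_pipelines, ci_builds;

-- On the destination cluster (the schema must already exist there):
CREATE SUBSCRIPTION move_ci
    CONNECTION 'host=old-primary dbname=app user=replicator'
    PUBLICATION move_ci;

-- After the subscription catches up and traffic is cut over,
-- drop the moved tables from the source:
DROP TABLE ci_pipelines, ci_builds;
```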
And also, load balancing and lag detection are interesting topics there.
So let me add one more item, which is quite a huge one. If you go with either, it doesn't matter, microservices or sharding, you're going to have many more nodes. And in this case, you need to be better on the operational side: auto-failover, backups, provisioning, balancing. Everything should work much better, more reliably. And it means that you need to simplify. If you rely on managed Postgres, that's probably also okay, but you need to trust it 100%. But if you manage Postgres yourself, before increasing the number of primaries you need to unify things, for example the naming convention. It takes a lot of time if you have different host-naming schemes for different parts of your system; you design some very good tool which, for example, performs a minor or major upgrade or something, and then you bump into the issue of deviations. So you need to simplify and unify everything, because you are going to have many more nodes.
Yeah, it's a great point. It's not cheap; sharding adds a lot of complexity, so that's a nice point, to simplify.
It's not only about sharding; it's about growing the fleet in terms of clusters. You have more Postgres clusters, so you need unification and simplification and so on.
So, I actually just thought of the other options out there. I do remember hearing about and reading up on EDB's product in this kind of area, called Postgres Distributed. Now, that kind of raises the point of a different use case for sharding: one of the big advertised features of that is being able to keep data local to a region, for example, so, like, if you want to keep your data...
That has bi-directional replication involved; it's the new BDR. Let's detach this topic, because it's very different and specific. It's not sharding.
Well, but by our definition of one logical database split into multiple physical databases, it kind of is.
But then you need to replicate in both directions, and this is based on the claim that logical replay is easier than the initial apply of changes. So far, I haven't tested it myself; I saw it only in the BDR documentation, but I don't think it's so. I think it's more of a marketing claim; I didn't see benchmarks. So let's set bidirectional replication completely aside and return to it in a separate episode.
Also because we don't have time for it right now. I also wanted to mention two different tools. First is PgCat. If you want to shard yourself, PgCat already offers a simplified approach, because it provides sharding, originally in explicit form: the application needs to say, in a comment, okay, this needs to be executed, and I know on which shard. So it's basically just a helper tool, and you need to take care of a lot of things yourself. But I saw they improved it, and some kind of automatic routing is already there. Also, the overhead is quite low. I tested that long ago, like a year ago maybe, and the overhead was not bad at all for OLTP. It's written in Rust, so it's quite performant.
Interesting. I would look at it.
And it's been developed at quite a good pace.
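For flavor, the explicit routing looks roughly like this from the client's side. Treat the exact commands as an assumption based on the PgCat README, and verify against the current docs:

```sql
-- Pin subsequent queries to a specific shard by index
-- (syntax is an assumption; check the PgCat documentation).
SET SHARD TO '1';
SELECT * FROM users WHERE id = 123;

-- Or hand the pooler a sharding key and let it hash to a shard.
SET SHARDING KEY TO '123';
SELECT * FROM users WHERE id = 123;
```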
Another one is SPQR, also under development. I watch changes in both repositories, and I see a lot of development happening in both projects. And this project was developed with the idea of having more automated sharding tooling, similar to VTGate in Vitess.
Yeah, would you call that... Do we need another name for it? Is it almost like pooler-level sharding? If we've got application-level sharding, is this pooler-level?
Well, yes. In this case, we can distinguish. Application-level sharding, application-side sharding, is when backend engineers are responsible for everything, basically, or almost everything. Additional software can be put in place in a transparent fashion, though yes, we still need to take care of rewriting some queries, because we don't want to deal with multiple shards at the same time.
Yeah, often.
But we can distinguish at least two types of this middleware which helps us with sharding. First is the VTGate style, or this SPQR or PgCat style, where something lightweight is placed in between, and it doesn't include Postgres code, or at least the majority of Postgres code. In this case, this tool needs to parse queries, Postgres queries. Postgres syntax is very, very advanced, so it's a challenging task; the grammar is huge.
And the other approach is placing a whole Postgres node in between. In this case, I think the latency overhead is quite significant. And this is what PL/Proxy, developed in Skype more than 15 years ago, was doing. It's quite an interesting approach. I'm not sure about the overhead, but it has the limitation that you need to rewrite all your queries in the form of server functions, because PL/Proxy is kind of a language, similar to the MapReduce approach. The overhead question is interesting, but at the same time, Skype was definitely OLTP, and the overhead requirements were quite strict. Again, it's similar to PgQ: a lost art, or lost knowledge.
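A minimal sketch of what those server functions look like in PL/Proxy, with hypothetical cluster and function names; each shard must define a regular function with the same signature that does the actual work.

```sql
-- On the proxy node: forward the call to one shard, chosen by hashing
-- the argument (CLUSTER and RUN ON are PL/Proxy statements).
CREATE FUNCTION get_user_email(i_username text)
RETURNS text
LANGUAGE plproxy
AS $$
    CLUSTER 'userdb';
    RUN ON hashtext(i_username);
$$;
```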
Well, if you like, I was thinking we could end on where we think the future is going, a little bit. And I think Vitess is a really interesting case. I do think we're maybe at a point where the Postgres-based startups that started in the last 10 years or so are reaching the scale YouTube was at when they started needing this and began building Vitess. And I wonder if maybe that's what we're starting to see with the likes of PgCat: some of these companies that have been built on Postgres really needing some better tooling here for their own use cases. Or, as you said before, the bazaar of different options: lots of people will build their own tooling, and maybe one of them will emerge like Vitess did for MySQL, and in 10 years' time we'll be in the same place. Maybe.
I don't consider PgCat a sharding solution. It's more like a pooler, but with a very lean approach to development, where they decided, why not have this as well. And unlike PgPool, they don't aim to solve every problem completely. For example, this explicit approach where you, as a backend engineer, put a comment saying which shard to execute on, so you're responsible for routing. It's manual routing, right? It works quite well; it's a simple feature, why not. So I think with Postgres, somehow, a very good sharding solution never existed.
And Vitess has many features which are usually overlooked, for example, asynchronous flow of data between different parts of the system. If we consider a huge project, we shard different parts differently. So basically, we're already splitting into two services, right? We need to shard users in one way and products or orders in a different way. In this case, we have basically two vertically split parts, and only then do we split them horizontally. But we want to avoid dealing with multiple nodes. So why not have some kind of materialized views which would bring data over asynchronously, with some delay, but quite reliably, so we could work with local data. I mean, if you imagine some sharding system with the ability to have different sharding keys for different parts of the schema, plus materialized views with the ability to be incrementally updated (not fully refreshed) asynchronously between nodes, plus also a global dictionary, because sometimes you need some global dictionary, and the ability to synchronize everything: this is something Vitess already has in the MySQL world. But Postgres somehow doesn't have it. And I think it's possible to build it from bricks; it will take time, all these things, but somehow a full-fledged solution doesn't exist.
Again, I'm lagging in my understanding of Citus, because for me there is a requirement on the overhead, and if it's not met, I'm already looking in a different direction. Right now, I would revisit Citus, revisit PgCat and SPQR, test them for my case, and then go with application-side sharding once again, unfortunately.
Last question from me, I think.
So, to have the complete picture: there's also a third type of sharding, maybe a fourth type, when we don't shard ourselves, when we don't have middleware, but we decide to split our system into multiple systems. When you talk about sharding, you usually talk about a database split, but what if we split the whole system? For example, we had only one website, but now we have 10 websites, and they have, for example, a unified authentication subsystem. So you have one login, and it works on each one of those 10 sites. And this can probably solve the problem you mentioned, to have data closer to your customers. For example, you could have one website for one country and another website for a different country. Maybe you have hundreds of websites; they have a single login system, and they have different databases. In this case, you also split horizontally, right? But you split not at the database level, but at the whole-system level, so they have everything separate. What about this approach?
Yeah. And by the way, it's not just about the latency of having data close to users. It's also about residency, privacy, legal matters.
Yeah, exactly. But this approach is interesting for SaaS systems, maybe like Notion.
Yeah, well, but interestingly, they didn't. There are a lot of benefits to keeping it as a single application, right? Simplicity of management, or, let's say, if I'm part of one company in the US and part of a different company in Europe, I can log into both workspaces within the same UI very easily.
I suspect you can do that even if it's two different applications; you just need a single authentication system. It's possible to do. You use one login for Google services, you use one login for many, many different services, right? It works similarly here: we can have different systems, and they are all the same if we forget about the legal details, but you log into all of them using a single login. It works everywhere, seamlessly.
Yeah, I don't know if it would be quite as easy.
It's not easy. For example... okay, it's not easy, but you can grow a lot using this approach. Scalability is infinite.
I think I saw a talk by GitLab about splitting systems out. I think there might be a recorded version of it, but I suspect you know more about it than me. Which route did they go?
Well, disclaimer: they are still my customers.
Yeah.
So I would recommend checking their documentation. It's very transparent; most of the things are public. They split vertically first. And it's obvious for a system like GitLab, because they have a lot of functionality, but some of that functionality is quite autonomous, and it's related to CI/CD pipelines. And this is exactly what was moved to a different database: CI/CD data. It's coupled very tightly inside the CI part, but it's not that connected to the other parts. Oh, by the way, check out what they needed to do with foreign keys when they performed this split, because they needed to create a new concept. I forgot the name, but basically how it works is like an asynchronous foreign key check between two databases. It's quite an interesting concept, which helps you preserve at least something when you move data to different nodes. And this experience is probably useful for other systems as well.
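The show notes link this under the name 'loose foreign keys'. As a hedged sketch of the general idea, not GitLab's actual implementation: record parent deletions in a queue table via a trigger, and have a background worker clean up referencing rows in the other database asynchronously.

```sql
-- Queue of deleted parent records, populated instead of cascading.
CREATE TABLE deleted_records (
    id         bigserial PRIMARY KEY,
    table_name text   NOT NULL,
    record_id  bigint NOT NULL
);

CREATE FUNCTION track_deletion() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
    INSERT INTO deleted_records (table_name, record_id)
    VALUES (TG_TABLE_NAME, OLD.id);
    RETURN OLD;
END;
$$;

-- Hypothetical parent table on database A.
CREATE TRIGGER projects_loose_fk
AFTER DELETE ON projects
FOR EACH ROW EXECUTE FUNCTION track_deletion();

-- A background worker on database B then periodically reads
-- deleted_records and deletes (or nullifies) orphaned child rows.
```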
And right now I think they're going in a direction I probably shouldn't discuss, because it's still under development. But you can check their website, documentation, and open issues; a lot of the information is open.
Yeah. I mean, foreign keys are notoriously difficult. I think that's something Vitess is still working on. It might even be coming soon.
I forgot the name, unfortunately, but there is some name GitLab invented; they introduced some new concept.
We can find it and link it up, right?
Yep.
Awesome. Thanks so much, Nikolay. Thanks, everybody, and catch you next week. Bye-bye.