Coding Blocks - Designing Data-Intensive Applications – Single Leader Replication
Episode Date: June 7, 2021. We dive back into Designing Data-Intensive Applications to learn more about replication while Michael thinks cluster is a three-syllable word, Allen doesn't understand how we roll, and Joe isn't even... paying attention.
Transcript
Discussion (0)
You're listening to Coding Blocks, episode 160.
We did it, guys.
160.
There it is.
1-6-0.
We finally got to a number evenly divisible by two.
We can pack it up.
We're done.
160 episodes in.
All right.
Bye.
These and other Mathemachicken shortcuts for your life are available to you if you subscribe
to us on iTunes, Spotify, Stitcher,
or wherever you like to find your podcasts.
We're probably there. Leave us a review
if you can. We super appreciate it.
All right.
And visit us at codingblocks.net where you can find show notes,
examples, discussion, and other things like links.
Oh, and your feedback, questions,
and rants can be sent to comments at codingblocks.net or Twitter.
We'll get to getting this right one day.
You can follow us on Twitter at codingblocks or head to www.codingblocks.net and find all our social links there at the top of the page.
With that, I'm Alan Underwood.
I'm Joe Zag.
And I'm Michael Outlaw, www.
Dub dub.
Dub dub. This episode is sponsored by Datadog,
a cloud-scale monitoring and analytics platform for end-to-end visibility into
the performance of your entire containerized environment.
And Linode: simplify your infrastructure and cut your cloud bills in half with
Linode's Linux virtual machines.
And Educative.io, learn in-demand tech skills without scrubbing through videos, whether you're just beginning your developer career, preparing for an interview, or just looking
to grow your skill set.
All right.
And so today we're going to be talking about data replication, which...
Data or data?
Well, it depends.
We'll get to that coming up.
What?
What?
I think it's data replication.
That's different.
Yeah.
Do you say database or database?
I think I'll switch it up.
No.
All right.
Sorry.
Whatever rhymes the best.
All right.
So before we get into how to pronounce D-A-T-A, as we like to do,
we like to thank those who have left us reviews.
So with that, I think, Outlaw,
you've got to do this, because there's only a couple here.
This is you.
Yeah.
Yeah. So this butchering of your name is brought to you by Alan.
So thank him.
You know, hit him up for any rants on Slack, at Alan.
So from Audible,
we have Ash Fish and Anonymous User,
AKA Andreas.
Yeah.
Thank you very much.
Thank you for leaving those.
Just excellent.
Excellent reads.
All right.
Well,
on with the show then.
And,
uh,
I got to start off with a quote.
This was a quote that was in the book.
That was awesome.
So I just had to kind of throw it in there.
And it was specifically about this chapter.
And this is from Douglas Adams, author of The Hitchhiker's Guide to the Galaxy and a bunch of other things: "The major difference between a thing that might go wrong and a thing that cannot possibly go wrong is that when a thing that cannot possibly go wrong goes wrong, it usually turns out to be impossible to get at or repair."
Ain't that the truth.
Dude, I loved that quote. I think I read it like 10 times, because there was a whole lot of back and forth on the wording.
Yeah. But it's so true. It can't go wrong.
I was just going to say, British authors, I mean, they're just the funniest people in the world, right? So yeah, I approve. I like it.
And yeah, because when something that's never supposed to break breaks, it's always terrible. And that's why we have techniques for dealing with some of these problems, like replication.
Oh, first I guess I should say that we're continuing on with the chapter from Designing Data-Intensive Applications, which is a book that we've talked about for several episodes now. We're getting to the second chapter, or second section rather, which is talking all about data.
Hey,
wait,
wait.
Do you realize this is a very important milestone for our show?
This is the only book that I think we've ever gone back to after we were
like,
okay,
I'm tired of doing this book for a while.
It's the only one.
Maybe for as long a period of time as it has been.
There have been ones where we've had a couple episode gap.
Right, like our shopping spree or something.
The last time we talked about designing data-intensive applications,
it was several episodes back.
Yeah.
Maybe even a year ago.
Yeah, I guess this is important.
Yeah. That's so good.
Yeah, I picked up the book for some reason and had to look something up, and I was like, wait, we should get back to this. Actually, it is just over a year. April 12th was the last episode.
Yeah. Still one of my favorite books, by the way. Episode 130.
That's so good.
I had highly highlighted this book too, so going back and reading it is kind of... I love highlighting books. I know some people think that's awful, but I love it.
Wait, a physical book, with a highlighter?
Yep.
Interesting.
Okay. That's why you buy the physical book, is so that you can mark it up.
Yep. I mean, just to refresh some memories here on this book: I think collectively we all had super positive opinions about this book. We really liked it. And if you haven't read this book, you should definitely get it, because right away, one of the awesome things about this book, just to refresh everybody's memory, was it starts talking about, hey, you know, how could a database possibly work? And what might it possibly look like? And the author starts off with, well, let's just write a script that logs some input into a file, and we'll just keep concatenating on. And, oh, now we have read problems. How could we read that data from that same file?
And, oh, now we have concurrency issues.
Like, how can we deal with these concurrently?
It just keeps on building from that simple idea of writing a key-value pair into a plain text file.
And then you get into awesome things like SSTables and LSM-trees and B-trees and all kinds of really cool concepts. And so now we're going to dig into replication as it relates to database data systems. Alternative titles for this book could be How to Write a Database,
or How Every Distributed Application You Ever Heard Of Has Been Written, basically.
And I mean, just even the terminology that they kind of talk about with different things.
Like, you can look at a Mongo and say, oh, that's probably why they call it that.
Or you look at a Postgres and go, that's probably why they call it that. And now, even when technologies use words that are kind of similar, like shards versus partitions,
you can kind of like think about the chapter on sharding and partitioning and kind of
think about the differences in those kind of
techniques and like the subtleties
between those. It's just interesting.
Hey,
we didn't talk about this before.
Do we want to do a book giveaway on this?
Oh, yeah.
Oh, snap.
It's been
a minute for sure. So if you leave a comment on episode 160, so that'll be codingblocks.net slash episode 160.
You'll be entered for a chance to win a copy of this fantastic book.
All right.
We're now rolling on.
All right.
Now we're going to start the episode.
We're talking today about replication in database systems.
And just to be very clear about kind of the definition of what we mean by replication,
we're talking about keeping copies of the same data on multiple machines connected by a network. And very specifically for this episode,
we're going to be talking about data that's small enough that it can fit on a single machine.
So we're not talking about spreading out data over multiple machines because the data is too big.
We're talking very specifically about what it means to be able to fit that,
basically the whole data set on several different machines.
So each one can have the full data set available to it. And why would you even want to do that?
And one of the things I love about the book is it really boils down questions like
that and gives you really concise answers. Basically, there's three main reasons why
people replicate data. One is to keep it close to where it's used. Caching is a good example here.
Maybe geographic data.
Yeah, I was going to say geography would be a better...
I don't know if I would call it caching
as much as just keeping it geographically close.
So I think about one of your favorite websites
that's around the globe, right?
Like a Stack Overflow or even a Google, right?
Google.com, right?
They would have instances that are closer to you
so that you can have quicker response times.
Actually, I don't know if Stack Overflow does.
Do they?
I think I remember they had a CDN as part of their...
Yep.
And you can kind of think of that as being, you know,
that's a content delivery network, which is basically a distributed cache,
which is a copy of data located all around the world
in order to make loading it faster for the people in those areas.
So if you ever want to know how to build the next Fastly
or content delivery network, hey, it's a great book to read.
Second reason is to increase availability, by which we mean if one goes down,
the others are still available,
so your application as a whole is still generally available.
So think about that one for the purposes of, like,
a primary and a secondary copy of your database,
right? That's what we mean by availability.
If one goes down, you can still read.
Great.
So the third reason, the last reason, is you can increase throughput by allowing more access to the data.
If you've got a database, you can only allow so many connections at once.
If you need to allow more and more readers of that data, you need to add more nodes or more replicas.
And that can vary by technology that you're using, though.
Because some technologies, even though you have replicas,
you still only go through the primary for reads and writes.
That would especially be true of older database technologies.
And this is something I don't normally think of.
If you imagine having a database
that's so small that you can fit it easily on multiple computers, but having so many readers
of the database that you still need replicas, just because you can't handle
the number of connections. Like a mobile game or something that's really popular. I mean, Stack
Overflow doesn't do it that way, but they actually have a very similar use case.
Because if you read some of their docs, they talk about less than 5% of their traffic are writes.
It's a very low percentage of it, and they have a ton of reads.
So if they had opted to go that route, they could have had a bunch of read-only replicas out there for pulling out questions and stuff.
They didn't do it that way, but that's a good example of where that might work out well.
And for data that doesn't change,
it's generally very easy.
It's essentially trivial to copy data that never changes.
So if you imagine the 50 states in the U.S.
or, I don't know, I can't think of anything that doesn't change
because everything changes.
But you can imagine, you know,
cases where you've got some sort of data set
or lookup values that just never change.
And it's trivial.
It's not even worth talking about.
You literally just copy it over
and then you've got a copy
and you never have to worry about it.
Everyone's happy.
The problem is when you do have changes:
how do you keep the data in sync between all these different locations? And there's only three popular
algorithms, and what's funny about these algorithms is that they're old as the hills.
Hey, well, I wanted to just add one thing to that data that doesn't change.
It's easier if the data doesn't change, because one of the things that they described in there was, you could just do a file system copy for it.
If it's not changing, then sure.
And that's the way the CDNs can work, right? Because they can just distribute a copy of an image around the globe across the CDN network,
and it's the same image everywhere,
as long as it still hashes to the same checksum, for example.
But in the case of a database,
that's not necessarily going to work as well,
because the entire database might not be changing, and that entire database might be really large. So that's where you need to
have a more efficient way to do that. And this is where we get into the three popular algorithms.
Well, you know, it's funny. I was going to say, I'm so old that when I went to school,
we had books. This is... anyway, it's just paper. It's wood with ink on it, from squids.
And in the math books particularly, you would be able to open up the first or last kind of cover or a couple pages.
And you would see these big charts that had things like square root values or sine and cosine values for certain numbers, things like that.
Those are examples of things where –
Logarithms, yeah.
You have these big tables that you can look at values
because calculators weren't as common back then.
Whoa, whoa, whoa.
You're making it sound really bad.
Hold on.
I know.
Calculators weren't available then.
Bucket party.
So, yeah, it's kind of an example of data that doesn't change, and so it was really easy to
copy that stuff around, I think. I don't know, I'm getting off on tangents. I'm just thinking, like,
books: when you publish them and ship them out, that's an example of data that doesn't change.
Yeah, I think you get the point. But the three popular algorithms for dealing with data that
does change: what's funny about these is the book mentions that the algorithms, the discussion papers, and a lot of the thought that happened around replication are really old, like 1970s, and haven't really changed much since then.
Now, the practical application of these is still really tough.
So there's a lot of things to consider and mature and polish up, and a lot of things that still go wrong even then. So it's not clear-cut, easy,
done: you just link to this dynamic library
and now you're replicated. There are trade-offs to make, and we're
still focusing on getting it right even though we've known about the techniques for
many, many years. So can we just
talk about the elephant in the room here for a moment?
Like what in the world was going on in the seventies, man?
Cause that's when C was created as well.
Like why is it like so many good things happened in the seventies and we're
only just now like, you know, in the last couple of decades, like, Oh,
that's what those guys were getting at. Yeah, we should probably do more distributed data applications and come up with ways to deal with that.
That's a pretty cool idea that they had back in the 70s, you know, five decades ago, right?
Well, it's because they weren't arguing about, like, CSS and HTML, or, you know, tabs or spaces or anything.
They were just getting it done. So good.
Yeah.
So the three popular algorithms that we mentioned are single leader,
multi-leader and leaderless replication.
And today we're going to be focusing on single leader.
And then we'll be able to see how that kind of the principles and stuff that
we talk about today can kind of lead into those,
those other two.
We're sorry.
You have reached a number that has been disconnected or is no longer in service.
Okay, so speaking of distributed systems,
it seems like maybe Zoom is having some distributed problems tonight
because we just lost the call like midway through
and I've never seen Zoom
actually completely crash in the
middle of calls. That was a first.
Kudos to Zoom for having
gone this long without
that being a regular occurrence.
I don't know, Jay-Z,
where were we at? We were talking about
the elephant in the room with all the
cool things that were invented in the 70s.
Yeah, I ended up down a weird rabbit hole, so it's kind of good that we cut all that stuff.
Oh, I'll be sure to edit it back in. The 70s are pretty great though, apparently, and also pretty bad.
So anyway, I've got some vocabulary for you. We call the group of computers that make up a
data system a cluster. A cluster of computers.
And each computer in that cluster, just generally speaking, we're going to call a node.
And each node that has a copy of the database is a replica.
That's weird because you said it was called a group of computers called a cluster,
but I always heard it as like a three-syllable word, but you only said two.
What do you mean? Cluster?
Say cluster
is a three-syllable word?
It'll hit you.
It'll come to you later.
Really?
Really, guys?
Why do I have all these puzzling looks
on me? Come on.
I can't say it on air.
Oh, I got it.
I got it.
I got it.
If you understand what Al is saying, you should leave a comment and maybe win a book.
No, no, no.
Don't.
Don't.
I don't want.
Don't say that.
Don't leave that comment on there.
Hey, but real quick.
So you said single leader.
Did we say multi-leader and leaderless also? I didn't hear that part.
Okay, that's probably when it crashed. So the three popular algorithms we're talking about are single-leader, multi-leader, and leaderless. Single-leader is the one we're focused on today, and then we're going to kind of roll the things that we've
learned and talked about today into the other two.
Okay, cool. All right. So now that we've backpedaled, we can jump back forward into the future, now that we have defined the replica and the cluster... blank.
So we're good.
Yep.
And so that's the main vocabulary.
And different systems will sometimes use these words interchangeably or kind of mean different things.
But I just want to kind of lay those out as generic terms.
So if someone's talking about different systems and they talk about nodes in the system, then generally speaking, they're referring to a computer in that cluster.
And Elastic in particular has like special definitions of what they call nodes.
And they've kind of like tacked on some additional meaning to them.
But for the most part, those still operate in practice as a node.
And yeah, so the important one to get here is that each node in this cluster that has a copy of the
database is a replica. We're going to say that word a lot today.
Yes. Yep.
So since every replica has a full copy of our database, our data set, then any changes need to be fully copied over to every single replica.
And the most common approach to doing that is leader-based replication. Two of the
algorithms that we mentioned both involve leaders, so single leader or multiple.
And I think this is probably pretty common, right? Like most people are probably already kind of familiar with this kind of
scenario where you have a primary database server and then a bunch of
secondary, AKA replica, database servers.
One or more typically, right?
I mean, depending on your environment, I mean, in some cases, you might just have two, right?
One for your main, one for a failover in case the other one blows up.
But if you're worldwide distributed and all that kind of stuff, then you might have more so that you're sharing those reads and stuff out across the planet.
But yeah, typically, you'll at least have, you know, the primary and failover.
And so just knowing those three popular algorithms, there's only one database I've ever heard of as
being leaderless, and that's Cassandra, or our friends over at DataStax. That's the only
context I've ever heard the term leaderless in. For single-leader
and multi-leader, I instantly think of relational databases I've worked with.
I've been in situations where I've had a primary and replicas,
like you've mentioned before.
And the multi-leader one kind of took me a minute.
I know Kafka has some
interesting things where different
topics and partitions actually
have different primaries.
And so I guess that would be a case where you have kind of multiple leaders per partition.
I wasn't really sure about that, but we'll have to read up on it. I haven't gotten to that chapter yet.
Yeah, we'll get through there.
Because I was struggling, which is embarrassing to say,
considering our love for Kafka. Because I was struggling
to think of a multi-leader example. But yeah, now you mentioned it, you're like totally right.
Because with Kafka, you can have, you know, within a topic, you can split that topic up
across multiple partitions and each there can be a leader for each one of those partitions
across all of the brokers.
So if you have, like, three brokers and your topic is spread across 10, or let's make it something evenly divisible
by three, so you have nine partitions for that topic,
then each broker is going to be a leader for three of the partitions for that
topic.
So, okay.
Yep.
That's a good example.
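The arithmetic there can be sketched as a toy round-robin assignment. This is purely illustrative; it is not Kafka's actual partition-assignment logic, and the broker names are made up:

```python
from collections import Counter

# Toy sketch: spread partition leadership round-robin across brokers.
# With 9 partitions and 3 brokers, each broker leads exactly 3 partitions.
def assign_leaders(num_partitions: int, brokers: list) -> dict:
    return {p: brokers[p % len(brokers)] for p in range(num_partitions)}

leaders = assign_leaders(9, ["broker-0", "broker-1", "broker-2"])
counts = Counter(leaders.values())
print(counts)  # each of the three brokers leads 3 of the 9 partitions
```

The point of spreading leadership like this is that writes (which must go through a partition's leader) end up balanced across the brokers instead of funneling through one node.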
I think I'm not sure about this.
I need to finish reading the chapter.
Elastic, I know you can have multiple master nodes,
which can act as leader,
and you can even have nodes that have the master role
and basically end up becoming leader.
So I would guess it's a multi-leader situation,
but I haven't looked into that any further.
Yeah, I know Mongo
has primaries and replicas.
I guess,
I mean, definitely getting ahead of ourselves
here, but a question,
is there
such a thing as multi-leader
for ACID-compliant
type systems?
Because I don't know that Kafka counts as that. Does it?
No, I don't think Kafka is.
Yeah.
There's no transaction, right?
So, yep.
Oh,
you know what though?
So we are going to be talking about transactions and that's actually,
that's coming up a little,
a little bit ahead,
but all the questions and all the things that we're talking about and wondering about,
they're all going to be addressed by this amazing book.
There we go.
A little bit of foreshadowing right there.
That's right.
But for now, just to kind of get the ball rolling, we're talking about single leader replication,
which is the one that I'm by far much more familiar with and have seen in the wild.
In this case, one of the nodes is designated as the leader.
All writes must go to that leader.
You cannot write to any of the other places unless it's the leader.
And the leader writes the data locally,
and then somehow it's got to get that data to its followers,
either by publishing some sort of replication log or change stream.
And we'll get into the kind of the details between those two.
But somehow it's got to get those changes over to those other replicas in
order to keep them in sync.
And as you can imagine,
there's going to be some latency there.
It's going to be a period of time when those replicas don't have that data
yet.
So this is an example of what we've talked about.
I want to say I even referenced it in the last episode,
but Postgres has this capability, for example,
to where in a multi-replica setup for Postgres,
you have the one primary where all your writes are going to go through,
but you can distribute and parallelize your reads
across all of the replicas in that Postgres cluster.
Right.
Yep.
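The single-leader flow described above can be sketched in a few lines. This is a minimal toy model, assuming an in-memory key-value store with a hypothetical Leader/Follower API, not any real database's implementation:

```python
# Toy model of single-leader replication: all writes go to the leader,
# which appends them to a replication log; followers apply the log
# entries strictly in order, possibly lagging behind.
class Leader:
    def __init__(self):
        self.data = {}
        self.log = []  # list of (seq, key, value) entries

    def write(self, key, value):
        self.data[key] = value
        self.log.append((len(self.log), key, value))

class Follower:
    def __init__(self):
        self.data = {}
        self.applied = 0  # how many log entries we've applied so far

    def pull(self, leader):
        # Apply any entries we haven't seen yet, in log order.
        for seq, key, value in leader.log[self.applied:]:
            self.data[key] = value
            self.applied = seq + 1

leader, follower = Leader(), Follower()
leader.write("a", 1)
leader.write("a", 2)
print(follower.data)  # {} -- replication lag: the follower hasn't pulled yet
follower.pull(leader)
print(follower.data)  # {'a': 2}
```

The gap between the two prints is the replication lag being talked about: an asynchronous follower just pulls on its own schedule, so there's always a window where it doesn't have the latest writes.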
And you know what comes to mind too is,
so when you write to Kafka,
you can configure your consumers to say that
your consumer can not count that write as written until...
Your producer.
Oh, yeah, yeah. I'm sorry. Yeah, you're right. Producer.
And there's three choices. One is
as soon as I send it and the leader
that I'm writing to
says acknowledge it,
then I consider myself done. Or you can say,
no, I want that leader to confirm
to me that at least one replica
has the change.
Or you can say, I want
that leader to confirm to me that every single replica
has gotten that change. And that's the safest, and also the slowest, and it also leads you to the most error-prone.
Yeah, because if there's any issue with one of those replicas,
then your write fails completely.
But I thought there was also a fourth option, though, where
you could just fire and forget. I don't even
care if you acknowledge it.
Both of you guys are jumping ahead to the next
section.
That's what we do.
Why are you all up in my
game? This is how I roll.
We're going to get there. We're going to be like,
oh yeah, we already covered that.
That's how we roll.
Yep.
That's why we spend so much time on the beginning.
Don't shame me.
That's the way.
I'll do the show however I want to do the show.
That's right.
That's right.
Yeah, both of you.
I was like, yeah, they're both jumping into the next section.
So we should hurry up and get there because I think what you guys were saying was pretty good.
We already covered it.
Right.
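The acknowledgement levels just described can be sketched as a toy model. This mimics the semantics of the Kafka producer's acks setting, but it is not the real client API:

```python
# Toy model of the producer acknowledgement choices described above:
#   acks=0      fire and forget, never wait for confirmation
#   acks=1      the leader has written the record locally
#   acks="all"  every in-sync replica has the record (safest, slowest)
def write_succeeds(acks, leader_ok: bool, replicas_ok: list) -> bool:
    if acks == 0:
        return True                        # we never hear about failures
    if acks == 1:
        return leader_ok                   # leader's ack is enough
    if acks == "all":
        return leader_ok and all(replicas_ok)
    raise ValueError(f"unknown acks setting: {acks!r}")

# One lagging replica fails the write at acks="all" but not at acks=1.
print(write_succeeds("all", True, [True, False]))  # False
print(write_succeeds(1, True, [True, False]))      # True
```

This is the trade-off in miniature: the stricter the acknowledgement requirement, the more durable the write, but the more ways a single slow or dead replica can fail it.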
So what he said is when these changes go out, right, they get streamed, the followers have to basically take those log entries and apply them in the same order.
There's multiple ways of doing this.
And then, like you said, in some situations, you're going to have the reads
set up to be able to go to multiple replicas. The writes still have to go to the primary.
And yeah, there's a ton of databases that support this, right? Postgres, Mongo;
SQL Server has log shipping. There's all kinds of things like that, right?
So now this is where we get into exactly what they were just talking about, which is the sync versus async.
And it actually can get a little bit more complex than even what you guys were just saying a second ago, right? Like, there's the one where it's fire and forget: you know, I sent a message,
just assume it worked, right? So that's your quickest, lowest latency. There's the
synchronous across everything, which, like Outlaw was saying, is slower.
It's more thorough, but it could also be more failure prone or error prone.
There's also a mixture to where you can do something similar, I think, to what Jay-Z
might have said, or maybe it was you, Outlaw, to where you can have some of them set up to
be synchronous so that you write to the primary.
It has to write to at least two more secondaries and then the rest of them can
be async.
Right.
So that was different, though.
Yeah. So what Jay-Z was referring to is, from Kafka specifically: as a producer,
your Kafka producer application can write the data to the broker and, as part of that
write, it can say, hey, I want to verify that you wrote it to at least
the leader and a replica. What you're
talking about, though, is that the book talks about how some systems can be configured in a hybrid kind
of mode, like what you're describing, where the write has to be written to not only the leader,
but also at least one replica as part of it. And they called that
semi-synchronous, I think, for that type of scenario.
But then there could be other replicas that are part of that same cluster, and they would get that write done asynchronously later.
Right.
So you're not having to wait for everything to get it, but you're guaranteed you're at least in a high availability scenario, right?
And if you do use Kafka, since we're beating up on Kafka for a moment, even though it does
count as the multi-leader scenario: if you describe your
topics using the command line tools, the little shell scripts they have for it, you can actually see there'll be a
column called ISR, and that is the in-sync replicas. And so for each individual partition of a
given topic, you can see which replicas are in sync with the leader. There'll also be another
column that'll show you who the leader is, what broker is the leader for that partition. And then you can also see
all the other replicas for it as well, and you can see who's in sync and who's not.
And if you really want to have some fun with Kafka and you really start to put it through the paces
on some heavy, heavy load, like stress test loads, which Kafka has some built-in scripts for,
then during that heavy load, if you go back and you describe those topics,
you'll see that in-sync replica column get all out of whack, where only the leader will
show up in that column for a given partition and the replicas don't. Or, you know,
depending on how heavy the workload is at that time,
maybe you'll see some partitions
that might have, like in my previous example,
I said three brokers, I think.
So maybe you see two of the three brokers on some,
and another partition you see all three,
and on one partition you only see one.
But you can actually see this synchronization in action, in a way, through the
Kafka shell scripts that they provide to describe the topics as you're
doing that heavy load.
So here's one interesting thing, though, while piggybacking on Kafka
here: Kafka is actually a much simpler use case than something like a database, right?
So when you think about what Outlaw just said, where you can see the replicas on Kafka get out of sync, just imagine when you've got something like a database that's writing, you know, 20 records in a transaction.
And now that thing needs to replicate out against 20 different replicas, right?
Like, it's one thing to write records from a queue across multiple things.
But now when you start talking about synchronizing records with IDs and all kinds of other stuff, things get hairy, right?
And it's super interesting, too, from the database perspective.
One of the things that I had never considered
as part of that replication strategy:
we've already talked in the past about write-ahead logs,
which this particular chapter said was the majority approach,
especially for Postgres and Oracle;
write-ahead logs were the big way they did it.
But one of the strategies that was described was,
you know, I forget what they called it,
but I'm going to call it a replay, for lack of a better example.
So you call an insert statement to insert some piece of data,
and that exact same statement is re-executed on each of the replicas.
Well, that sounds harmless enough, unless you're really into data.
And then you might be like, wait a minute.
What if I have an auto-incrementing ID?
Those need to be in sync.
But also worse, what if you have a date kind of column that has a default constraint
on it to automatically insert now? If the replica replays that five minutes
later, then it's going to have a different timestamp
on it.
So it is a possible strategy that might be used in some places.
I think they described that as how the original MySQL used to work.
But, you know, for serious database systems,
that's not going to cut it.
It's not going to be good enough.
Yeah.
And,
and also, to further that, when he's talking about the replay thing, he's talking about replaying the actual SQL statement that ran, right?
So if you did an INSERT INTO users on your primary, and like he said, that insert statement had a GETDATE function in it, which, if you listen to our Dating is Hard episode, you know you shouldn't use GETDATE in SQL, ever. But if you did, on that replica, do you run the same insert with that GETDATE function? Or do you try and get the date that was inserted on the primary and carry that same date over to your replicas, right? Like, things get way more complicated
when it seems like an easy problem on the surface.
It's, you know, synchronizing clocks,
synchronizing the IDs and everything.
Like it can get insane when you start thinking about that.
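What they're describing, which the book calls statement-based replication, can be sketched with a toy example. This is purely illustrative, using SQLite in place of a real primary/replica pair and explicit timestamps standing in for a non-deterministic now():

```python
import sqlite3

def fresh_db():
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, created_at TEXT)")
    return db

leader, replica = fresh_db(), fresh_db()

# Statement-based replication: the replica re-executes the same SQL text.
# The parameter stands in for now(), evaluated at two different moments
# because the replica replays the statement five minutes later.
sql = "INSERT INTO users (name, created_at) VALUES ('alice', ?)"
leader.execute(sql, ("2021-06-07 12:00:00",))   # now() evaluated on the leader
replica.execute(sql, ("2021-06-07 12:05:00",))  # now() evaluated on replay

a = leader.execute("SELECT created_at FROM users").fetchone()[0]
b = replica.execute("SELECT created_at FROM users").fetchone()[0]
assert a != b  # same statement, different data: the copies silently diverged

# Row-based (logical) replication ships the concrete row values instead,
# so the replica stores exactly what the leader stored.
replica2 = fresh_db()
replica2.execute("INSERT INTO users VALUES (?, ?, ?)",
                 leader.execute("SELECT * FROM users").fetchone())
assert replica2.execute("SELECT created_at FROM users").fetchone()[0] == a
```

Shipping row values instead of statements sidesteps both the now() problem and the auto-increment problem, which is why logical and write-ahead-log approaches won out.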
And I didn't see that stuff in the show notes here.
So we jumped to the end of the episode.
Dang it, sorry. We're pretty much done, so thanks for listening. We'll have some resources we like, some tips of the week.
We probably already did those, too, right? I just swore that was earlier in the chat.
Enjoy the survey.
And yeah, so we've all done it. We've all jumped way too far ahead. That's fine.
To bring it back a little bit,
everything we're talking about now
is harkening back to the CAP theorem,
which we've talked about several times on the show.
Basically, it's just the idea that you can have at most
two out of the three choices for consistency,
availability, and partition tolerance.
So the things that we're talking about with sync and async
and problems with replication lag deal with the problems around trying to keep a system as consistent and as available and partition tolerant as it can be.
The CAP theorem is kind of like that joke. What's the joke, like... pick two. It could be cheap, it could be fast, or it could be right.
Yeah, right. So the CAP theorem is kind of like pick two: you get consistency, availability, or partition tolerance. Which ones do you want?
So I actually just read an article that kind of convinced me to not use that metaphor anymore.
Oh. And the article made the case that you can never really give up partition tolerance.
it made the case that basically you get at most two, meaning you get either consistency and partition tolerance or availability and partition tolerance.
But you can't really have consistency and availability together.
You always have to choose one or the other there.
You can have at most two.
Yeah, that makes sense.
Because as we get further into the show, we'll touch on why.
Okay, so it's really pick one.
Do you want consistency or availability?
Yes.
And also, how well do you do with partition tolerance?
So that's why I wrote it this way, because I literally just read the article before I typed this, and it said you get at most two.
So if you do a good job, you could have consistency and partition tolerance somewhat
or availability
and partition tolerance.
But the article made the case, and now I'm convinced, that you cannot have a database that is both consistent and available.
Highly available.
Yeah, no, that's right.
That's right.
Yeah, so I'll find that article.
I'm sure,
I'm sure I've mentioned this in the past
when we talked about the CAP theorem, but
I want to make sure. It can't just be me,
right? Because anytime I hear
CAP theorem, I immediately assume we're
discussing Captain America, right?
That's how you become Captain
America, is you take some of the CAP theorem
and then you become Captain
America, like eventually, right?
It might just be you.
I don't know.
Yeah.
That's a little awkward.
Fine.
Fine.
I'll have to find the article.
There you go.
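The consistency-versus-availability fork they're describing can be made concrete with a toy sketch. The names here are purely illustrative, not any real database's API: once a node is partitioned from its peers, it either refuses writes (consistent but unavailable) or accepts them (available but risking divergence):

```python
class Node:
    def __init__(self, mode):
        self.mode = mode            # "CP" or "AP"
        self.data = {}
        self.partitioned = False    # True when we can't reach a quorum of peers

    def write(self, key, value):
        if self.partitioned and self.mode == "CP":
            # Consistent + partition-tolerant: refuse the write (give up A).
            raise RuntimeError("unavailable: cannot reach quorum")
        # Available + partition-tolerant: accept it, risk divergence (give up C).
        self.data[key] = value

cp, ap = Node("CP"), Node("AP")
cp.partitioned = ap.partitioned = True   # the network partition happens

ap.write("k", 1)                         # succeeds; may conflict after healing
try:
    cp.write("k", 1)
    reached = True
except RuntimeError:
    reached = False                      # the CP node chose consistency

assert not reached and ap.data == {"k": 1} and cp.data == {}
```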
So, Joe, you have in here the book mentions chain replication.
I saw that same link in the book.
Did you actually follow it?
I was too much into reading the chapter that I didn't go off on the tangent.
So I did a little bit specifically because I think the reason I picked up the book again,
because I was like, I've got a case where I want something like chain replications
because I've got a situation where I need to be tailing an op log.
I'm basically doing some data syncing where I've got an application that acts like a replica. It follows changes to a database, and then I do some stuff with that change log. And, you know, I'm doing this very high level, I'm using tools like Debezium. And so I'm doing it, but I don't have access to the primary because of networking, whatever. And so what I need to do is sync changes from a replica. But the replicas traditionally don't publish the change log or the operation log or the write-ahead log.
They're consumers, not publishers of it.
So I was like, well, what the heck is that?
Is there a situation?
Is that a flag I can turn on or something?
Interesting.
So you're saying you're trying to get the changes from a database, didn't have access to the primary, have access to the replica, but the replica doesn't publish those changes.
That's right.
Because it doesn't write any itself.
Yeah.
And so you mentioned like the owner of this database is like, I don't trust you.
You can read from me, but you, your user, your account, your authorization key does not allow reading from the primary.
Wow. Okay. That's fine.
All I need to do is read and then smack
brick wall.
Looking at you, Mongo.
Well, guess what?
Mongo does support chain replication in some
cases, but I haven't gone down
deep enough to know if I'll be able to do that.
But yeah, that is exactly
the situation I'm looking at.
And it looks like maybe it's going to be possible
thanks to this crazy thing called chain replication,
which is exactly what it sounds like.
With replicas following, well, with followers following followers.
Right, yeah.
So instead of replicas reading from the primary,
a replica can get its copy of data from another replica.
Yep.
And it's one of those things that sounds, it seems like you should just be able to just do it.
Like, what's the problem?
Right?
And just publish, you know, like forward it on along.
But the deal is, you know, we talked about sync and async
and how maybe the leader knows whether its replicas have gotten the data
and it can use that information to know, okay, I can accept more, you know,
depending on how you've got that synchronicity tuned.
But if it's got this chain replication thing going on,
then it doesn't have that communication channel
between the leader and follower and the follower's follower.
And so there's things that might be missing from the API
that maybe those followers need to communicate back with the leader
in order to kind of move stuff along.
So it just gets a little tricky.
That's interesting, because usually the primary is what's keeping track of who's done what. And if you've got a follower following another follower, then that follower has to know that the other follower has actually finished what it needs to do.
So essentially now you've got a bunch of primary-acting secondaries, or replicas, performing the same job.
Yeah, that's interesting. Yeah, it's not a one-way flow.
There's backtalk from the followers to the leader sometimes.
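That backtalk can be sketched like this. This is a hypothetical toy, not any real database's chain-replication protocol: each follower forwards entries to its own downstream follower, and acknowledgements flow back up the chain so the leader still learns how far every replica has caught up:

```python
class Replica:
    def __init__(self, name, downstream=None):
        self.name = name
        self.log = []
        self.downstream = downstream   # the follower's follower, if any

    def replicate(self, entry):
        """Append locally, forward down the chain, return acks from everyone."""
        self.log.append(entry)
        acks = [(self.name, len(self.log))]   # "I have N entries now"
        if self.downstream is not None:
            acks += self.downstream.replicate(entry)
        return acks                            # acks propagate back upstream

# leader -> follower1 -> follower2: follower2 never talks to the leader directly,
# yet the leader still hears how far it has gotten via the relayed acks.
leader = Replica("leader", Replica("follower1", Replica("follower2")))

acked = {}
for entry in ["set x=1", "set y=2"]:
    for name, count in leader.replicate(entry):
        acked[name] = count

assert acked == {"leader": 2, "follower1": 2, "follower2": 2}
```

The tricky parts the episode mentions (what happens when a middle link dies, or gets promoted) are exactly the cases this toy ignores.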
Yep.
This is why I love this book, because it'll make you think about things that
you might have just never thought about or you just took for granted.
And definitely, like, this book has forced me to think about it. To be able to read from a replica following a replica just seems like, well, surely that has to be a thing, right? Because if I think about this in terms of, like, a geographically spread system,
right, it might be really expensive. It might be a really expensive operation for
a replica on the other side of the world
to have to come back to the opposite side in order to get that update.
And there might be replicas in between that would be much faster in terms of latency.
And so just trust that it would get there.
And so I never considered that it wasn't possible.
Right. Or that it wasn't possible. Right.
Or that it wasn't just how it worked, right?
Yeah.
Yeah, but now that we're talking about it, it's forcing me to think like, oh, yeah, I could definitely see some problems where like, okay, what happens if the replica in between the end replica and you know, one of those middle replicas goes out,
then what do you do? Right. And so now I'm like, okay, yeah, I could totally see why you might not ever want that. And you want all the replicas to come back to the primary, but then it also begs
the question, too, of, like, well, gee, now it seems like you're just putting that much more workload on the primary, because now not only is it serving the world for all of the writes, but it has to serve all of the replicas for all the reads. So now you definitely want your reads distributed across the replicas, because the primary is already doing so much work. Depending on, like, if we're talking about, like, really global, you know, huge-scale type applications, right? Like, if you only have, like, you know, three replicas, then who cares? It's probably not that big a deal. But yeah.
I mean, that's the thing that I love about this book, though, is it makes you put on your thinking hat for the minutiae of things like databases that you would otherwise just be like, well, I'll take it for granted. Cause, like, especially as, you know, an application developer, where it's like a database is just one of the things in the toolkit of things that we're using, but it's not like we devoted our entire career to the mastery of that thing, right? And so it's been easy to take for granted some of these, you know, concepts, right?
Absolutely. Yeah, I definitely feel like reading this book has really helped me understand Kafka, which... I'm still... that could be another subtitle for it. But, uh, yeah, there's just so many things that, like, when I first got started with Kafka, I just thought were, like, almost insultingly hard. You know, you ever hear the joke about, like, open source companies that make the software so hard to use that you have to buy their services, whatever?
Like, I kind of felt that way about Kafka a little. I was like, there's no way. Why does it take, like, eight services doing things? What's this bootstrap server crap? Like, what's this in-sync replicas? Like, what's the partition thing? Like, who cares? Just, like, let me do the thing.
And now, read the book.
Oh, actually, it almost seems like they designed
Kafka around
the concepts in this book.
They went through chapter by chapter.
Okay, let's pick this over that and this over that.
They designed a system by kind of picking
all the best things they wanted for this one
particular use case and they just did it.
Wait a minute.
This is written by Martin
Kleppman. Wasn't he
involved in Kafka?
Am I wrong? I thought I remembered that
somewhere in the book.
Didn't he work at LinkedIn?
He's a researcher, distributed, blah, blah, blah.
Previously worked at LinkedIn.
Worked on large-scale infrastructure.
Yeah, and Kafka
started from LinkedIn.
Did it not?
Yeah, it did.
The back of the book has a quote from the creator of Kafka and the CEO of Confluent. Oh, yeah, there it is.
Yeah, that's funny.
Yeah, I think there was.
I thought I remembered maybe in the preface or something,
there was some kind of a connection to him and Kafka. So I did just find out a benefit to having a paperback book is, uh, you can actually
flip it over and look at the back of it. So I can't do that to my digital copy. Yep. Yep. That's
the, that's the reason why I've always loved the paper ones is that like, it's super easy to scan
and flip through. Like I totally get you on the convenience of reading on the digital ones,
but the convenience of searching, to me, like, you know, aside from, like, Ctrl-F type searching, the scannability of it is, like, so much easier when it's paper.
Yeah.
But you know,
that's just me talking.
I found the article I was talking about earlier, about how you can't sacrifice partition tolerance. So that was a really interesting article. I'm still kind of chewing on it a little bit, but yeah, it was good.
Oh, and so, as I mentioned, just kind of wanted to hammer home on the point: we said the algorithms are basically solved, but the implementation details are so tough and so nuanced, and there's so many trade-offs, that the real-world application is tough. And this is an example, like the chain replication.
So I just wanted to mention another case where
imagine if the replica
or the follower that you're following
gets promoted to leader.
And now, what do you
do? Do you try to find another replica?
So there's so many little details, so many little decisions you have to make when it comes time to actually
building these systems.
And those details mean a lot about the way people perceive and interact with
your systems.
It stinks.
Well, I mean, they also,
the book doesn't go into like huge detail on it,
but there is a whole portion in
this chapter related to leader election.
Yep. You know, as I was reading it, I was thinking about, like, how would I want to code that? Like, how would you decide, you know, leader election, right?
I've always thought it sounded so easy, and so I must be so wrong, because I've always heard it's notoriously difficult. I always thought it's just like, well, one of them picks it, and if no one else has already grabbed it, then they get it. And if two try to grab it at the same time and both think they have it, then they just pick a random number and the bigger one wins. Done. But wait, did we not put this in the show notes? I don't know if we're jumping ahead here or not.
No, if we didn't, yeah.
So like one of the things, what you just said, sure, that could work.
Another one is who has the most up-to-date data, right?
Pull every one of the replicas and find out, hey, do you have the latest data?
Do you?
Whoever has the latest, most complete set of data from the primary could then be the follower, right?
Like these are decisions you have to make, but doing that takes time, right?
That takes time.
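That "most up-to-date data wins" rule can be sketched in a few lines. This is a naive illustration with made-up log positions; real election protocols like Raft add terms, quorums, and randomized timeouts on top of it:

```python
# Poll each replica for the last log position it has applied, then promote
# whichever is most caught up. Ties break deterministically by name so two
# nodes running the same rule agree on the same winner.
replica_positions = {
    "replica-a": 1041,   # hypothetical last-applied log positions
    "replica-b": 1057,
    "replica-c": 1057,
}

new_leader = max(replica_positions, key=lambda n: (replica_positions[n], n))
assert new_leader == "replica-c"   # tied with replica-b, wins the name tiebreak
```

The cost the episode points out is the polling itself: every candidate has to answer before anyone can be promoted, and that takes time.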
But even then, even with the time, I'm sorry, I should get off.
Go ahead.
No, you're good.
I don't even know what I was going to say.
Sorry.
I was going to say, so even then, like, let's say you've got one of those partitions, a network partition. So now you're on two totally separate networks, and you've got four nodes over here, four nodes over there.
And both of you elect leaders of their four nodes.
So now you've got two leaders.
But in terms of time, the network heals itself.
And now you've got these two things that have been kind of running off thinking that they were the captains of their own ship.
And now we've got to merge these things back together and they've
diverged.
Yeah.
Which, by the way, remember what that means: when you're a leader, you take writes. So that's what he's saying. When they diverged, all of a sudden you have two different databases that are now taking writes. And now you've got to consolidate those somehow. And if you're talking about auto-incrementing keys and garbage like that,
oh man, good luck. I mean, it's amazing that any of this works at all. Because as you were
describing this, I was just thinking about, imagine you had a database that was geographically
spread across just the United States, right? And let's say that your replica, your leader was in, say, someplace central like a Texas, for example.
But you got replicas that are spread out across the East Coast, across the West Coast, across the
North, what do they call it? Northwest, whatever, in an effort to make those,
to reduce latency on those reads, right? But now, let's say that there's a big major outage
in the internet for the United States
that divides the country, right?
And so you got half your replicas on the east of the country,
half of them on the west,
and they're each going to pick their own replica
to be the new leader.
And now what happens when the network comes back together?
Right.
Like, yeah. So many problems like that. It's like, how does this even... It's amazing it even works.
And does it, all the time? Maybe it does, maybe it doesn't.
That's one of the things that they said in the book, too,
is there are times where people will opt to manually fail things over as
opposed to allowing for automatic failovers because they want to be able to
control how the things happen.
Right.
Because, you know, maybe... one of the examples they had was, let's say that you have your threshold set at 30 seconds for response, right? Like, basically, these things know if they're alive because they're constantly sending pings back and forth to each other, right? And let's say that you send one, and it was going to take 35 seconds to respond. Because you cross that 30-second threshold, it starts going into, you know, time-to-fail-over mode. And that server was just busy at the time.
So it couldn't respond properly.
So now you've got this situation
where the network's going to be doing all this thing.
Your servers are going into this mode
where it's causing some problems.
Oh, only to find out that it really shouldn't have failed over
in the first place.
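That 30-second example can be sketched like this. This is a hypothetical detector, not any real system's implementation, and it includes the usual hedge of requiring several consecutive misses before declaring the leader dead, since one slow response doesn't mean the server is gone:

```python
class FailureDetector:
    """Declare the leader dead only after several consecutive slow pings."""

    def __init__(self, threshold_s=30.0, misses_required=3):
        self.threshold_s = threshold_s          # the 30 s from the example
        self.misses_required = misses_required  # hedge against slow-but-alive
        self.misses = 0

    def observe(self, response_time_s):
        if response_time_s > self.threshold_s:
            self.misses += 1     # too slow: count it as a miss
        else:
            self.misses = 0      # a healthy ping resets the counter
        return self.misses >= self.misses_required  # True => start failover

fd = FailureDetector()
assert fd.observe(35.0) is False   # one 35 s response: busy, not dead
assert fd.observe(12.0) is False   # it recovered; counter resets
assert fd.observe(35.0) is False
assert fd.observe(35.0) is False
assert fd.observe(35.0) is True    # three misses in a row: now fail over
```

Even with the extra misses, this is still a guess; that's exactly why some operators prefer manual failover.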
So yeah, there's all kinds of difficult decisions
that could happen both automatically
or if you
want to do it manually so that you know that you need to do something. Well, if you think about it,
remember back to the old days of SQL Server with high availability, there was a witness
server. And the purpose of the witness server is you would have a primary and a secondary.
And the witness was basically like an unbiased third party that would say like, uh, yep, I can see both of you right now. You're both up and
running fine. But if the primary went down, then the secondary could be like, well, Hey witness,
can you see him? Can you see the primary? Or is it just me? Because then if the secondary can't
get a message back from the witness or
maybe the witness says,
no,
I can then the,
then the witness,
then the secondary could say like,
Oh,
well I don't need to resume.
I don't need to take over leadership capabilities because it's me.
It's not him.
Right.
Right.
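That witness logic can be sketched as a truth table. This is a simplification of the SQL Server witness idea Outlaw describes, with hypothetical names, not the actual database mirroring protocol:

```python
def secondary_should_promote(secondary_sees_primary, witness_sees_primary):
    if secondary_sees_primary:
        return False   # primary is fine; nothing to do
    if witness_sees_primary:
        return False   # "it's me, not him": my own link is down, don't promote
    return True        # secondary and witness both agree the primary is gone

assert secondary_should_promote(False, False)      # real failover
assert not secondary_should_promote(False, True)   # avoids split brain
assert not secondary_should_promote(True, True)
```

The middle case is the whole point of the witness: without a third vote, a secondary that merely lost its own network link would wrongly promote itself.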
But I don't know that that's, like, you know, the most common way of doing it. Is there a whole chapter on it?
If only there was a chapter on it, right?
Yep. Uh, yeah, so, um, we kind of touched on this already, but just bringing new followers online. Like, I remember, um, working with Elastic kind of, uh, like, early days, and me adding a new node, like, I thought I would just give it the same cluster name
and whatever. It would just work. But it turned
out I had to do a little bit of extra legwork
to let the other nodes know that I was adding
another node because they all
communicated. Just like you mentioned, these things
need to know about each other. They don't just work in
isolation. And so
bringing on a new node was not as easy
as I thought it would be. And things have gotten much better.
Systems are, for the most part, especially with Kubernetes and, like, autoscaling and cloudy things in general, have gotten to the point where you can add, like, nodes to your clusters, add replicas, much easier without having to do a lot of, like, manual work.
And just imagine, like, even updating configs on servers that are already running.
A lot of times, you know, you have to, how do you send a message to them?
Do you need to restart them?
Like, how do you tell them, like, you're done?
Well, I think that depends, though, right?
Because, like, by mentioning Kubernetes as an example, I mean, like, that's kind of blurring the lines of, like, the definition of the cluster that we're talking about. Because even with Kafka, if you were to just bring in another broker,
it doesn't mean that it's automatically going to distribute the topics,
the partitions of those topics across that brand new broker.
You have to manually go in.
So it's a big deal.
So even in the Kafka world, they recommend that you size your environment ahead of time, you know, like, plan for two years' worth of size. And that's how you should plan to size your Kafka environment, because it's kind of a hassle if you decide later that you were like, oh, hey, let's add a new broker, and the downtime that you're going to take for moving those partitions around.
But you know what?
Yeah, that's totally true.
You know what, though?
I think that what Kubernetes has brought to the game, though, is because that is an orchestration
system, its whole job is to handle adding things or keeping
things alive in a certain way. And so like one of the examples are Kubernetes operators, right?
Like, so the whole notion of adding a broker, I'm not as familiar with, like, the operator for, um, Strimzi or something for Kafka,
but typically those operators are set up to be able to do things like that, right?
And it's so like, which, yeah, so in this case, maybe not Kafka,
but like Crunchy Data, for example, on Postgres, right?
Like they had features in their operator for adding things
or backing up and doing all that kind of stuff.
And I think what Kubernetes brought to the game was making companies think about ways that they
could automate some of those things, right? So while it doesn't inherently just have it just
because you're in Kubernetes, a lot of times, a lot of these systems are now coming online with
the ways to automate these things. Yeah, I guess the point I was trying to make there
is I wanted to be careful about how we were using the term cluster here
and mixing it because as it relates to data-related things
like an Elasticsearch or a Postgres or a Kafka,
that's a big deal.
And so like Joe mentioned,
if you were to bring in a new Elasticsearch node, that doesn't mean it's automatically going to start getting used. Like, you have to go through some manual effort, and the same exists with Kafka. And even with the Strimzi operator, um, which Strimzi is, uh, you know, a custom Kubernetes operator that tries to wrap a lot of the Kafka bits into resource definitions that work, like, kind of, quote, like a native Kafka thing, but it's really not. It's, like, you know, through this custom operator.
But even in the Strimzi documentation, though, there are certain operations, especially related to, like, manipulating your topics like this, where they straight up say, no, you've got to manually go and do this, because it is one of those, like, hold on to your butts.
Cause we're about to do some stuff in production.
You know, here we go.
So, uh, yeah, I totally agree with that. And yeah, you can change the number of replicas, and the operator is in charge of doing all the little operations for that, but it doesn't know. There are human decisions that you need to make when you add those. Like, do you need to rebalance stuff? Yeah, there's all sorts of things that you just have to make decisions about, unfortunately.
It is complicated, you know. Like, if you know a DBA, you should buy them a beer.
Like, I think that's the takeaway from this book.
Right.
So one of the things that we have here is we talk about bringing these new followers online and it is
a common practice, right? Like you're going to have new replicas that you need to add or whatever
to a cluster. One of the ways that they do this a lot of times in databases is they'll take a
snapshot of the primary database, the leader database at some point, right? And a lot of
times they'll set these up as, like, daily jobs, or maybe multiple times a day, to where you get a snapshot.
And to bring a new replica online, it's going to get the latest snapshot and then also find all the latest transactions that happened after that and try and replay them on top of it.
And this is where you brought up the SQL Server log shipping before.
I'm not familiar with it. Like, conceptually, I have, like, a high-level idea of, like, hey, it's just, you know, I'm getting the logs from there to there.
Yeah. That's basically what it was. Yeah. It was that.
So it was essentially just what we said here. You take a snapshot, you ship the transaction logs. I think, more or less, I mean, I may not be saying this perfectly well, and a DBA can chime in in the comments if they want, but it would ship, you know,
they'll win a chance to get a book. That's right.
Or they'll get a chance to win a book. I'll say that eventually. Right.
Yeah. So it would ship the logs that would get replayed on the replicas, right?
And that's the whole point of it. And I think that if I remember right, this is...
In the book, did they say that this was the common way or this was the older way? I think this was the
older way where it would ship the logs that were
the actual statements and stuff. Oh, that's the old way. Yeah, that's the old way.
Write-ahead log shipping was the majority way.
Okay, yeah.
From what I remember.
Yeah, we'll get there.
Yep.
So, yeah, once it replays all those on the replica,
then the replica's caught up and everything's good to go.
So, yeah.
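That snapshot-plus-log-tail catch-up process can be sketched like this. The log format and snapshot position here are made up for illustration; the point is just that the new follower only replays entries after the snapshot's position:

```python
# The leader's replication log, tagged with log sequence numbers (LSNs).
leader_log = [
    (1, ("set", "a", "1")),
    (2, ("set", "b", "2")),
    (3, ("set", "c", "3")),
    (4, ("set", "b", "9")),
    (5, ("del", "a", None)),
]

# A nightly snapshot captured the state as of LSN 3.
snapshot = {"state": {"a": "1", "b": "2", "c": "3"}, "last_lsn": 3}

def apply(state, op):
    kind, key, value = op
    if kind == "set":
        state[key] = value
    elif kind == "del":
        del state[key]

# New follower: restore the snapshot, then replay only the log tail.
follower = dict(snapshot["state"])
for lsn, op in leader_log:
    if lsn > snapshot["last_lsn"]:
        apply(follower, op)

# Sanity check: the follower matches a full replay of the leader's log.
full = {}
for _, op in leader_log:
    apply(full, op)
assert follower == full == {"b": "9", "c": "3"}
```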
All right.
Well, I'm going to be smarter by the time we're done with this episode.
We're going to bounce around it a few times.
And if I'm not, then I guess I'll blame Joe.
Right.
Somebody's wrong.
This episode is sponsored by Datadog,
the unified monitoring platform for real-time observability
and detailed insights into Docker performance.
Enhance visibility into container orchestration with a live container view
with easily detectable clusters that are consuming
excess resources using an auto-generated container map.
Now, here's the thing. We're talking about, in this episode,
replication and
what you might want to do, like how you would deal
with situations of a replica going down or the leader going down. And, you know, the beauty of
it is when you have somebody like Datadog, you know, at your side to help you out with this, they make your life so much easier, because from a monitoring point of view, they have you covered. Infrastructure monitoring, that is literally their game. They have you covered, whether you're in the cloud, whether you're Docker or Kubernetes
or whatever your containerized environment might be, they're going to be able to help you monitor
that platform, that tech stack in its entirety so that you don't have to worry about split brain
situations and bad databases.
Oh no, what happened? Why did it go down? Because you're going to know ahead of time,
you're going to know like, oh, there's, you know, uh, we should, uh, we should be aware
because we're getting these alerts that something's going down and we should go ahead and take care of
it before it gets really bad. And we mentioned that container map. Um, I don't remember if we talked about it or not, but if you just Google Datadog container map,
you can see what that looks like.
They have a really cool video that literally walks through stepping through a
gigantic environment.
And what's really cool about it is how well it scales, because Datadog knows a thing or two about making visualizations that scale.
So they look great when they're small,
they look great when they're huge.
And it's just awesome to see them flip around from things that are very tiny.
And so you can see the individual details on one little thing.
And then they zoom out.
And there are hundreds.
And they zoom in somewhere else.
And you're back to seeing these kind of small details.
So it gives you these really convenient ways to interact with your data,
the way you think about your data.
And, I mean, Datadog, they're the kings of this arena.
They've been doing this for years.
And so, I mean, this is how you do it.
So go check that out.
You might even refer to them as the lead dog.
I don't know.
Yeah, the Datadog.
Yeah, I mean, they're amazing.
And the capabilities are just amazing. And out of the box, Datadog collects critical metrics from each of your Docker containers, so that you can get immediate visibility into aggregated and disaggregated service-level traffic.
And you know, we didn't even mention this part too, but they have over 450 plus vendor
backed integrations.
So whatever your tech stack is,
just trust me when I say Datadog has you covered.
And vendor-backed is really good. It's not just copied code from Stack Overflow like I do.
So try Datadog today by starting a free 14 day trial and receiving a free Datadog t-shirt after just creating one single dashboard.
See, I think I don't know about this Datadog.
I think it's Datadog.
I think it's Datadog, too.
So you should visit DatadogHQ.com slash coding blocks to see how you can enhance the visibility into your stack.
Again, that's Datadoghq.com slash codingblocks.
Okay, well, as I said at the start of this episode,
we greatly appreciate all the reviews that we get.
It really does mean a lot to us.
It's kind of our motivation,
a big motivating factor for sure in doing the show.
So if you haven't left us a review,
we would appreciate hearing your feedback.
You can find some helpful links at www.codingblocks.net slash review.
And with that, we head into my favorite portion of the show.
Survey says.
All right.
I see you over there, Joe.
Mocking me. I see you over there, Joe. Mocking me.
I'm dancing.
I mean, you can't just scream it directly into the mic.
You got to, like, turn your head when you do it.
That's right.
I don't know.
Maybe one day we'll, like, capture that as, like, a meme or something.
Okay.
So we asked a few episodes back for your next car you plan to buy, and your choices were: an electric car, I can tell people it's because it's green, but we all know it's about the acceleration; or a hybrid, just in case these EVs don't work out; or a gasoline car, there's other kinds; or a diesel, turbo diesel at that; or a fuel cell car, like the Hindenburg, but on a smaller scale; or Uber, the Uber mode of transportation; or lastly, anything with more than two wheels is too many.
This is an even number, so therefore,
Jay-Z, you are first.
Well, I'm going to say extremely important, 26%.
Wait, what? That's not one of the choices?
He just meant something else.
It's not even one of the choices, Jay-Z.
You're not even paying attention.
30%.
Yeah, it's...
I mean, I guess I'm going to go with the gasoline car.
26%. Gasoline car 26%.
Yeah.
Okay.
That's so funny.
Okay.
Oh, you know, man, we're a bunch of nerdy folks that do this.
I'm going to say an electric car, and I'll go with
26%. 26%
electric, okay. That's a lot of
percents. Not hyper-confident
in this one. Okay, so
Jay-Z,
gasoline car, 26%.
Allen, electric
car, 26%.
Alright, so
what do you call an alligator in a vest?
I got nothing.
I got nothing.
Same.
Nothing.
Normal?
You should know this, Joe, living in Florida.
Florida man should know this.
Just another day?
I don't know.
Just a Tuesday?
An investigator.
Oh, my God.
Okay.
All right.
So thank you, Dave, for sharing that tweet.
And your answer is, the winner is Alan.
Look at that.
Whoa.
All right.
An electric car, 34%.
Holy cow.
Look at that.
I did not expect that.
Hey, speaking of which, and I hate to go off on too much of a tangent.
Did you guys see the announcement on the Ford Lightning?
Yes.
Dude.
That thing looks amazing.
Yes.
It will power your house.
For three days.
Wow.
For three days.
And that's, it'll pull a freight train.
Yeah.
Look.
Like a fully loaded freight train that's hauling every F-150 model ever made.
So, so I'm going to be honest with you.
The Cybertruck's ugly as sin, right?
Like there's, there's no question.
Who designed that thing?
It destroys the F-150, even the Lightning, in terms of specs and all that.
But, but the
Rivian, some other
companies that are coming out with this stuff, I'm going to
be honest with you. If it came down to it,
I'd probably get the Ford just because
they've been around a long time, right?
It would be really hard to drop
60, 70k on a Rivian
knowing that these are the brand new kids
on the block. Where do you even
take that thing for service?
Yeah.
Are they going to be around for that thing?
You guys remember the Fiskars?
Oh my God.
Fisker Karma.
Such a beautiful car.
One of the most beautiful cars ever made.
That company was gone like a couple of years after they started selling.
So it's like, do I, would I really want to drop that kind of coin on something that's
going to disappear?
And you know, another company bought it, bought the, bought the rights to that car.
So you can still buy that car.
But I don't think,
it's not called a Fisker Karma now.
I think they just call it the Karma
or either that
or they just call it the Fisker.
I don't remember which.
But yeah,
that car was ridiculously gorgeous.
Just so awesome.
Yeah.
Now it is kind of funny though.
Oh man,
I got to find this YouTube channel for you, Alan.
Jay-Z probably could care less about any of the conversation we're having right now.
But there is a channel that I watch a lot, and his name is – oh, gosh.
Why would he have to have his name as his channel name?
Because I can't pronounce that.
So it's something like Doug DeMuro.
Oh yeah.
Yeah.
Yeah.
I watch him all the time.
Yeah.
His channel is fantastic.
And he has one,
he has a video on the Fisker Karma,
like the original version of it too.
And like how,
how silly the car was in reality.
But yeah,
just totally beautiful car.
If you've never seen one,
uh,
go Google it,
because it is, uh, worth a peek, especially when you consider the time when it was created, too.
Oh man, it's over a decade ago. Yeah.
But in all honesty, the F-150 Lightning does look pretty amazing. It's going to have, on the extended battery, over a 300-mile range.
And the cool part,
Jay-Z,
like you said,
is if you have them hook it up to your house,
if you're without power,
as you are down in Florida during storms and whatnot on occasion,
um,
it can actually flip over and power your house with nine kilowatts of power
or something a day for three days or something.
Like it's pretty incredible.
It is.
So yeah, I don't know, it's gorgeous too, and like the display in the thing is gigantic.
Yeah, it's got this huge center console that I don't even know what it is, like maybe a 19-inch monitor flipped on end or something. It's ridiculously sized, and then there's lighting all around it.
So, you know,
you'll always want to see,
including lighting inside the bed so that you can see the contents of your bed.
And it's got a frunk because there's no motor.
It's got a front.
It has 11 power outlets spread around the exterior of the vehicle.
So everybody who voted that they're going to get an electric car,
you need to go check out the ford
Lightning and know that this should be on your wish list.
I just Googled this. You guys are talking about a truck? Dude. Dude, excited about a truck?
All right, man. Like, to take to McDonald's, dude? You're getting kicked out of the South.
I'll say,
I'll say it might be important,
but not enough to go out of your way.
26%.
Dude.
No,
look,
I'm telling you when this truck comes out,
like I'm with Alan on this,
like there is going to be a run on this truck.
I,
the F-150 is already the number one,
you know,
bestselling vehicle of all time since its creation.
I mean, it's been number one for – it says – oh, I'm looking at it right now.
For the past 44 years.
44 years.
44 years.
Best-selling vehicle in America.
Best-selling truck of all time.
So when this thing comes out, there's no doubt in my mind that this thing is going to be wildly popular.
You're going to see them all over the place. It will be the truck to beat in terms of an electric truck.
Totally. And not without a good reason for it, too. I mean, it looks better than the other one.
Now, the Rivian is a pretty good-looking... It's beautiful.
Oh yeah, the Rivian's beautiful. And there's another one too, I can't remember its name. It almost looks like
a Lincoln emblem on it, but it's not.
Do you remember
the one I'm talking about?
It's like the Titan or something. No, that's Nissan.
No, the Rivian.
I don't know. Anyway, alright.
So today's survey.
Joe's falling asleep over there.
The important stuff.
Joe's like, I don't care.
B, 10%.
All right, so if we were to start a car podcast, Joe would be like, I'm out.
Yeah, right.
How long? I could talk for another 20 hours, dude.
I so badly... like, if I could have the job that those guys from, uh, Top Gear and, uh, The Grand Tour have, you know, Jeremy and Captain Slow and Richard, like, that is the dream job ever for me. I would love to have that job.
Oh, you're going to, like, let me race your Lamborghini around the track and, like, you know, hope that I do well? Okay, well, you know, here we go.
Yeah, that's right.
I've got the Aston Martin Vulcan, right?
I'll never get out of it, but I don't care.
Yeah.
Yeah.
Why would you?
Yeah.
Okay.
Well, for this episode's boring survey then, how important is it to learn advanced programming
techniques?
And your choices are now,
if you've been paying attention,
Jay-Z has already given you all your choices,
so I don't need to read them.
Okay.
No,
I'm just kidding.
Uh,
how important is it to learn advanced programming techniques and your choices
are extremely important.
You got to keep,
you got to keep sharpening that saw. Or, man, it might be important, but not enough to go out of your way.
You'll learn it as you go.
Or, wait, there's advanced programming techniques?
Like what, switch statements?
Or, it's not important at all.
There's already a Stack Overflow answer for it.
This episode is sponsored by Linode.
Simplify your infrastructure and cut your cloud bills in half with Linode's Linux virtual
machines.
Develop, deploy, and scale your modern applications faster and easier.
Whether you're developing a personal project or managing larger workloads, you deserve
simple, affordable, and accessible cloud computing
solutions. Get started on Linode today with a hundred dollars in free credit for listeners
of CodingBlocks. That's a hundred dollars. You can find all the details by going to linode.com
slash CodingBlocks. Linode has data centers around the world with the same simple and
consistent pricing regardless of your location. So wherever it's convenient for you
to have the data center,
don't worry about the cost.
That's not even going to factor into your decision.
You can do a lot with $100.
This episode, we've been talking about replication.
You can very easily spin up,
say, a three-node Kubernetes cluster
with those nodes in very different parts of the world.
And you can experiment with throwing up different databases and then killing those boxes just
to see what happens and see how things react.
And that's the kind of experiments that you can run.
And you can run those as experiments because it's so fast and easy and cheap that it's
just crazy.
And with that $100 free credit, you can do all that stuff in just a matter of...
It's crazy how fast you can spin up a cluster. And you can do that really quickly and give that
a shot because why not? Hey, so I'll take it even a step further. So Linode offers all that. You can
do your Kubernetes cluster and all that, but sometimes you just want things to run and you
don't want to have to think about it too terribly much. They also have a marketplace. So if you go to linode.com slash marketplace, you'll actually see these one-click installs to where,
I don't know, maybe you want to set up a monitoring service. You want something like
Grafana up there and running. You can do that. They have that. Maybe you want a Redis or a
PostgreSQL or a MySQL or MariaDB type thing, they all have one-click install.
So if you just want to get up and running quickly, you can go to the marketplace and do that as well.
Yeah, I was going to add to that, like specific to this episode.
You know, if you don't want to deal with the Kubernetes, you can absolutely super simple.
You want to spin up MongoDB.
You want to spin up MySQL. You want to
spin up Postgres. Like Alan said, just go to the marketplace, find the technology you want and say,
click it and then say, create Linode. And it will spin up that machine for you with that
particular service and you're done. You can like bring up your own cluster of Postgres and see what happens when you take one down
just to see for the fun of it,
you know, because that's what we do.
That's how we roll in our day, you know?
That's right.
And remember, if you go to linode.com slash codingblocks
and you get $100 free credit for this,
just choose the data center near you.
You also get 24-7, 365 days a year human support
with no tiers or handoffs, regardless of your plan size. You could choose the shared or dedicated
compute instances, or you can use your $100 in credit on S3 compatible object storage or managed
Kubernetes, whatever you want. And if it runs on Linux, it runs on Linode. So visit linode.com slash coding blocks and click on that create free account
button to get it all started.
All right.
So jump back in,
uh,
talking about how we handle outages.
So we've already kind of touched on some things.
Wait,
we know how we handle outages.
You get the F-150,
you plug it up to your house.
We discussed this already.
Where were you?
How about if I bring us back in with a joke? Can I do that?
Let's do it.
What is the tallest building in the world?
Michelin tire or something.
I don't know. It's not going to be anything.
It's probably in Dubai.
The library.
It's got the most stories.
That's good.
Oh, my gosh.
That's from, hey, that's my dad joke API right there.
Nice.
Very nice.
Nice.
Very nice.
All right.
So, sorry.
Sorry.
Sorry to interrupt on the handling outages.
No more Ford Lightning stuff.
All right. I'm done.
All right.
So now we're talking about handling outages in our cluster of nodes containing many replicas.
And the deal is nodes can go down at any given time.
And so what happens if a non-leader goes down?
So a follower goes down. Well, we've got,
uh, we've got some choices and some decisions that databases can make, but for the most part, who cares, right?
Uh, I mean, if you're reading from that replica, then maybe that's a problem.
Um, but I don't know, I just can't care that much.
Well, I mean, if you've only got one replica, then you care a lot, right?
You care a lot.
Now you need to spin up another replica and get something to where it's copied the primary pretty quick.
Yeah, because otherwise you have no fault tolerance.
Right. Now your availability is in jeopardy.
But, yeah, I'm with you. If it's just a
read-only replica or something like that, then maybe you don't care.
Yep. We should be clear though here, because, you know, we say, like, nodes can go down at any time, but they don't have to,
quote, go down. Like, they could be running, but they just don't have network access to be talked to.
Right.
You know, it could be no fault of their own.
Yeah.
So think about it this way.
If you're a database, you're a database and you are focused on being highly available, meaning you really don't want to go down no matter what, then chances are if a follower goes down, then you don't really care too much.
You're not going to stop the whole
thing. You're not going to pull the andon cord and shut everything down and say, no more requests.
If you are a database that's strongly concerned with being consistent, meaning that when you do
things like take on writes, you make sure that your replicas have those writes before you confirm them
to the producer, then you might consider shutting things down, saying, hey, I can't take on any more writes or reads right now
because I've got a follower down, and that follower is very important.
So that's kind of a different philosophy that different databases kind of take,
and oftentimes they're configurable.
So you can kind of pick and choose how you want your database to react.
Just because, like I said, if you only got one replica and it goes down,
that can mean something very important.
Also, if you just got like a traditional relational database, that's very important.
And you've got, you know, an important replica that goes down that maybe is used for, I don't know, banking.
I don't know what you'd use for.
It'd be very important.
But it might be worth shutting down because you are afraid that maybe that replica is still trying to serve data or is trying to do stuff.
And so you don't want to continue on taking writes until you figure out exactly what went on.
Because you don't want that other replica that you think is down that is maybe just on a segment of network thinking it's become the leader.
You don't really know what's happened.
So until a human comes in and reconciles that and says, let's move on. You could be in a bad state.
Well, that goes back to the USA example that I gave where the country gets split because of a major network outage.
And then you get into the split brain scenario.
Yeah.
So if you're a database that cares about being highly available, you probably will just go ahead and elect new leaders and then deal with the problem later. If you're a database that strongly cares about
being consistent, you might say, I lost
communication with another replica. I don't want it going on
and taking on writes. I don't know what's happening. I don't know about the state
of the world. So I'm just going to shut down until someone tells me that things are okay again.
And they're both right answers.
Uh, so when the replica becomes available again, we talked about that catch-up mechanism, where it basically takes a snapshot and then kind of catches up. And then once the replica is fully on, you're able to react. So we basically use the same algorithm that we kind of described there.
So that's interesting. I hadn't thought about this, but
that catch-up mechanism, if it
was a replica that just went offline for a little while, do you, I mean, that's probably another
decision that the database system has to make, right? Like, do you fully restore from the
snapshot and then get the catch-up logs, or do you just try and reconcile what,
what logs hadn't made it over since,
since it went offline,
right?
Like,
I mean,
those are two different decisions that have different implications on them.
And both are probably legit.
You might,
you're,
you might actually implement it to where it could handle both.
And it might say,
well,
how far behind are you?
And then I'll use that as a decision point as to what,
what type of strategy we use. Cause if you know,
you're a few seconds behind,
it might be good enough to just send you the few log entries that I've had
coming in. Right.
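That decision point can be sketched in a few lines. This is only a toy illustration, not any real database's code; the threshold and all the names are invented: a reconnecting follower either replays just the missing tail of the log or falls back to a full snapshot restore, depending on how far behind it is.

```python
# Toy sketch of the catch-up decision discussed above. Nothing here is
# a real database's implementation; the threshold and names are invented.

SNAPSHOT_THRESHOLD = 1000  # max log entries we're willing to replay


def catch_up(follower_pos: int, leader_log: list) -> tuple:
    """Return (strategy, entries_to_apply) for a reconnecting follower."""
    lag = len(leader_log) - follower_pos
    if lag <= SNAPSHOT_THRESHOLD:
        # Only slightly behind: ship just the missing tail of the log.
        return ("log-tail", leader_log[follower_pos:])
    # Too far behind (or the leader may have discarded those segments):
    # restore from a snapshot, then replay everything on top of it.
    return ("snapshot", leader_log)


# A follower three entries behind gets just the tail of the log.
strategy, entries = catch_up(follower_pos=7, leader_log=list(range(10)))
```

With a lag over the threshold, the same call returns the snapshot strategy instead; a real system would also have to handle the leader having already garbage-collected the old log segments.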
Yeah, absolutely. And we talked about like replica of replica chain replication.
That's another case where you've got leaders talking, sorry,
followers talking to the
leaders to say, hey, here's what I've got.
Help me catch back up and let me know when I'm
caught up.
It's interesting.
What happens if you lose the leader?
That's where we start getting into
much more complicated things.
I'll mention failover. One of the replicas
needs to be promoted, in which case
you need to figure out which one's going to be in,
and everyone needs to kind of update their configurations
and then move right along.
And it can be automatic in some cases,
but we talked about the problems there,
like what if a bad decision is made?
What if you've got two leaders, a split brain,
taking on writes, getting into an inconsistent state,
and then you've got a big mess to clean up?
It would have been easier if it had just shut down.
One thing that we're kind of taking for granted though,
is that like,
um,
we keep saying split brain,
but we never technically defined it.
But so if you've never heard that term,
uh,
it is a common term,
but it basically means like in this scenario where,
uh,
you know,
take it like,
as we refer to a primary and a secondary in the, in a database kind of world that, um, you know,
there's, there's something that causes an outage to where one of the replicas thinks that it should
take over as a leader. And so now you have two leaders and that's what's referred to as split
brain because some traffic might go to one leader
and other traffic might go to the new leader.
And so you have new data that came into both,
and then you have to figure out how to reconcile that.
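The scenario just described is easy to see in miniature. A toy sketch, with all keys and values invented: two nodes both acting as leader accept conflicting writes for the same key, and when the partition heals there is no automatic right answer.

```python
# Toy illustration of split brain: during a partition, two nodes both
# believe they are the leader, and each accepts writes for the same key.
# All names and values here are invented for the sketch.

leader_a = {}  # writes routed to the "old" leader
leader_b = {}  # writes routed to the newly promoted leader

leader_a["user:42:email"] = "old@example.com"  # client 1 happened to hit A
leader_b["user:42:email"] = "new@example.com"  # client 2 happened to hit B

# When the partition heals, the replicas disagree on the same key.
conflicts = {
    key
    for key in leader_a.keys() & leader_b.keys()
    if leader_a[key] != leader_b[key]
}
```

`conflicts` now holds the key both sides wrote, and deciding which value wins is exactly the manual, order-some-pizzas reconciliation described above.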
Yeah, bad news.
You don't want to deal with that.
No.
That's like, when you've got a problem, that's when, like, you shut
down. You put the site into maintenance mode, the application,
you just turn it off, and then
you order some pizzas and you get some
people together to figure out how the heck you even go about
doing that and it's going to be some
manual decisions that you're not going to
you're not going to like
so before you can
fail over first you have to determine if the leader
has failed, which is really tricky. Like how? Especially
for replicas. You know, Outlaw mentioned
the witnesses. So if you're even
a human, sometimes it can be hard to know what's going
on if you've lost access to a data center
or something. You know, is it me
or is it just, you know,
like what's the problem here?
So
choosing a new leader, there's a whole
chapter on that
and there's some
interesting algorithms
I've heard of Raft before
and some
even calling them
algorithm
is almost like misleading
because it's kind of
a collection of algorithms
and you know
servers
like we've talked about
Zookeeper before
which is a system
often used for
kind of being a participant
Kafka
yeah for sure
yeah we've never done
an episode on Kafka. We've never gone into detail
about what Zookeeper does. Yeah. No, no, we've mentioned it in passing. It's a key-value store.
It does a lot of little things. I don't know. Well, we should talk about Zookeeper one day.
We should. Although, doesn't it seem like... I mean, it's a really popular open source way for managing this type of thing.
It seems like a lot of open source projects are trying to get rid of it as a dependency.
I know Kafka for a long time has talked about trying to kill that.
It's definitely on the Kafka roadmap.
Right.
Yeah.
Right.
So, yeah, I don't know. I don't know if it's worth doing an episode on it or not, but it is interesting because it was used for a lot of open source projects to make sure the system stayed running and, you know, all that.
I mean, we've been doing a lot of Kafka in the recent years, and I still don't know how Zookeeper fits.
I know it's necessary.
It's part of it, but I'm like, whatever. It's just, I guess I'll just have it running and it does something.
Yep.
So we talked about how these nodes need to talk to each other,
need to know about the replicas in order to kind of publish things.
The replicas need to know where the leaders are.
So I know at least in Kafka, Zookeeper just holds a lot of that data.
And so a lot of things are the APIs go out and they ask like Zookeeper, like, hey, who's, which leader should I use?
Or, you know, what are my leaders?
Bootstrap servers too, it's just, it gives you one place to talk to.
And then the bootstrap server is in charge of saying, okay, you get to talk to these people, and it'll forward you to, um, the replicas that you should be talking
to. And, like, Zookeeper's involved in there for now, for Kafka, but they want to get rid of it
because they want to cut the dependencies, so they don't have another service that you have to
secure access to and everything.
Yeah. But, like, think about it. Everything you just described is like, oh, well, it's got its own, like, you know, data management... it has its own availability requirements. So now there's a whole other
cluster
for my cluster.
Suddenly, when did Xzibit
get to be into
software architecture? And he's like, hey,
dog, I heard you like clusters, so I added a cluster
to your cluster so you can have a cluster.
For real. And how do you keep a
Zookeeper on as well? I mean, you can set up another Zookeeper.
Zookeepers all the way down.
All the way down.
Yep. And yeah,
a leader election, like we mentioned, is notoriously
hard, even though I think I can do it
pretty easily. Random numbers,
you said. That's it. Five lines of code.
Done. I mean, I wrote
it while we were just on this call here. That's right.
Yeah.
So obviously I'm missing some major, major things.
You need to git commit and git push.
Yeah.
That's it.
Yeah.
Couple A.
Reconfigure.
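For what it's worth, the "random numbers" election joked about above really is only a few lines, which is exactly why it's not a real consensus algorithm. A deliberately naive sketch, with everything invented; real systems lean on things like Raft or Zookeeper to handle partitions, ties, and nodes that never hear the result:

```python
import random

# A deliberately naive "election": every node draws a random number and
# the highest draw wins. This ignores everything that makes real leader
# election hard: partitions, double voting, nodes that miss the result.


def elect(node_ids, rng):
    draws = {node: rng.random() for node in node_ids}
    return max(draws, key=draws.get)


# Seeded RNG so this toy demo is repeatable.
leader = elect(["node-a", "node-b", "node-c"], rng=random.Random(0))
```

It produces *a* leader, but nothing guarantees every node agrees on who that is, which is the whole problem.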
So the last thing that can kind of happen, the last thing in this app for failover is basically any clients need to be updated.
So, you know, we we mentioned the bootstrap
service for Kafka or Zookeeper,
but as we said
for Elastic too,
these nodes need to know about each other. So when you add
a new one, the clients need to know that
something new has been added that they
either need to rebalance or communicate with
in order to split and route
and do all the crazy things with data that we're used
to doing.
And failover is hard.
So, you know, we mentioned like how long to decide until a leader's dead.
Even then, what do you do when the leader comes back?
You know, who's in charge of kind of bringing that data back together?
This is all stuff that we kind of jumped on to earlier.
So, kind of like the split brain we mentioned.
Well, hold up, hold up.
There is one, one big thing that I thought was interesting.
I remember reading this last night that when the leader comes back,
what do you do?
What if there were writes to that leader? Like, what do you do?
You throw them away?
Like, do you put your entire system in an inconsistent state?
Like, it doesn't seem like it would be that hard.
But man, you're talking about a bunch of little things that can create some really big inconsistencies in your systems.
There was actually a mention of an issue that I don't remember if it was specific to this part of the chapter.
But it was kind of similar where they were talking about,
there was a,
a GitHub outage and they had,
um,
an auto incrementing ID for,
for the record.
So as new records were being inserted,
it,
you would automatically increment to,
you know,
five and six and seven and eight.
And they had this outage to where it got into this split brain scenario and
things were getting duplicated into both.
And then when they got the system back up and running,
they were actually from a security and privacy kind of point of view,
realized that they were serving up the wrong data to the wrong client because
the IDs got in,
you know,
messed up as part of that.
If I recall, I think that was, you know,
I might have some details messed up.
That's awful.
Yeah, that's what they said.
They were serving data to the wrong customers
because the IDs were out of sync.
That's crazy.
And that was GitHub, too.
I mean, we're not talking about like a small organization.
We're talking about GitHub,
and I was baffled when I found out that they run
on MySQL. Like, really?
Yeah, and it was sharded MySQL.
And you ran on MySQL? It was sharded
MySQL, right? It was the shards that got
out of sync and there were some problems.
At any rate, yeah.
It's crazy. Just one little thing.
One little thing. One ID could be wrong
and it could completely jack everything up.
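The failure mode in that GitHub story is easy to reproduce in miniature. A toy sketch with invented names, not GitHub's actual schema: two leaders hand out auto-incrementing IDs from the same starting point during a partition, so the "same" ID ends up naming different rows on each side.

```python
# Toy model of split-brain auto-increment collisions, as described above.
# The class and the data are invented for illustration.


class Table:
    def __init__(self):
        self.next_id = 5  # both sides last agreed at id 4
        self.rows = {}

    def insert(self, data):
        row_id, self.next_id = self.next_id, self.next_id + 1
        self.rows[row_id] = data
        return row_id


side_a, side_b = Table(), Table()
id_a = side_a.insert("alice's private repo")  # gets id 5 on side A
id_b = side_b.insert("bob's private repo")    # also gets id 5 on side B
```

Both inserts receive the same ID, so after the partition heals, a lookup by ID can serve one customer's row to another customer, which is the privacy problem from the story.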
So, crazy.
If someone out there knows the answer: is there a compelling reason to ever use MySQL over Postgres now? Of course, if you're using like a WordPress or something, I understand that, but...
Oh, even for WordPress, I hate it.
Well, I'm just saying, like, out of total ignorance, is there ever a reason why you would pick MySQL or Maria over Postgres if you're doing a new database?
20 years ago, when MySQL was hitting the streets, it was like, oh, hey, a free database system, finally.
Yeah, let's use it.
Let's do everything with MySQL.
But yeah, to your point now, in the year 2021, when you have Postgres available for free.
So I think this is a good opportunity for somebody to leave a comment on the episode
because I think that this is actually a really good conversation.
And then the only thing I could think of off the top of my head is the tooling for MySQL
is still way better than a lot of tooling for Postgres, unless you go buy DataGrip.
If you buy DataGrip, that's different.
But the open source tools for Postgres are not great.
Just a lot of stuff.
Postgres is definitely very complicated.
It's got a lot of really advanced features.
So I guess you could argue that maybe MySQL is simpler,
same way you could say, well, I use SQLite instead of something else.
It's simpler, it's easier, whatever.
I'm just curious.
I do have one bit of hatred for Postgres.
Like one seriously
strong bit of hatred for Postgres.
Syntax, yeah.
Fight me. Not syntax.
Their date columns don't
store offsets.
Don't store offsets.
You cannot get... you have to write your
own way to handle offsets and dates. And that is complete garbage. 100% trash. Meaning, even if you
do set a column as a, as a timestamp column in Postgres and you tell it that it's a timestamp
with an offset column... it's "with time zone."
With time zone, all it means is when you insert that data into it,
it's going to convert it to UTC and throw away the offset.
So that is a major piece of hatred for Postgres,
that they do not have a column that will allow you to store the offset with the date.
Or not necessarily a column, but it's not a built-in data type.
Yeah, there's no data type. Yeah.
So if you want to add the associated updated column to every single table you've got, either you create a custom type, or you end up having to add two columns, one for the, uh, timestamp, one for the offset.
Which, if you're going to do that, create a custom type.
Right. Correct me if I'm wrong, but if I remember correctly, though,
you can, you can derive the offset from the time zone,
but you cannot derive the time zone from the offset.
That's always correct. But so therefore having the time zone gives you,
you can, you can determine the offset.
It throws away the time zone.
It converts everything to UTC and stores the UTC value.
You don't get it.
No, there is a data type with time zone.
Yeah, read the docs.
It throws it away.
It converts it to UTC and throws it away.
Believe me, I was trying to do some stuff that I was going to stream on YouTube the other night,
and I went down a rabbit hole and got really mad about it.
So, at any rate. I'm going to be off in documentation land for the rest of the episode.
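One common workaround for the complaint above: since Postgres's "timestamp with time zone" normalizes to UTC and discards the original offset, store two values yourself, the UTC instant plus the offset in minutes, and reassemble the local time on read. A Python sketch of that two-column approach; the helper names are invented, and this is not Postgres API:

```python
from datetime import datetime, timedelta, timezone

# Sketch of the "two columns" workaround discussed above: persist the
# UTC instant plus the original UTC offset (in minutes) side by side,
# then rebuild the original local timestamp when reading it back.


def to_columns(dt: datetime):
    """Split an offset-aware datetime into (utc_instant, offset_minutes)."""
    offset_minutes = int(dt.utcoffset().total_seconds() // 60)
    return dt.astimezone(timezone.utc), offset_minutes


def from_columns(utc_instant: datetime, offset_minutes: int) -> datetime:
    """Rebuild the original local timestamp from the two stored values."""
    tz = timezone(timedelta(minutes=offset_minutes))
    return utc_instant.astimezone(tz)


# A timestamp written at UTC-4 survives the round trip with its offset.
original = datetime(2021, 6, 7, 9, 30, tzinfo=timezone(timedelta(hours=-4)))
utc_value, offset = to_columns(original)
round_tripped = from_columns(utc_value, offset)
```

As the discussion points out, the offset alone still can't tell you the time zone, since several zones share an offset, so if you need the zone itself you'd store a zone name instead.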
Can we, can we press pause? All right, let's go back. Sorry about that. All right. So just like
we kind of hit on before there's solutions to these problems and, uh, it just gets really down
to details and there's some things that you're just going to have to lose. There's some things
that you're going to have to just decide, like, this is what we're going to do about it.
And we're going to have to deal with consequences because there's no just perfect, easy, clean, elegant solutions.
Some of the problems we mentioned, node failures and reliable networks.
And yeah, basically, depending on what database you're using, what it's being used for and being configured for, you know, you're going to kind of make your best configuration
and hope it's good enough.
So when it comes to replication logs,
remember these are the things that the followers use to follow
to get the changes.
There's three main strategies here.
We've kind of hit on them, so we'll go through quickly here too.
And the first one is the one that Alan talked about first,
which is the statement-based replication,
which is the old My alan talked about first which is the statement-based replication which is the old mysql way of doing things where the leader logs every insert update delete and the followers execute those commands just as if you would type them in order the problems things
like now rand like random numbers new ids can be different so they had to kind of hack around those
and try to find ways to kind of do stuff like that but even still you have problems with
things like auto increments and especially with
transactions because when you run
a transaction how do you 100%
guarantee that things happen
in the same order on different systems
so even if you run the same commands you're not guaranteed
to get the exact same output
and so basically MySQL
abandoned this because it just was too finicky
They were running into problems with the approach. And, uh, yeah, they called out the specific version, too, of MySQL where they changed it. I remember it was like five.
Yeah. I've got one. Yep, a question here about what
LSM databases do, uh, with things during the delete compaction phases.
And, oh, right, right, right.
So I was thinking about, so MySQL is a B-tree.
You know, we talked about a little bit between differences between B-tree databases where you can do things like make updates to the data in line.
And LSM databases, which are things like Elastic or some database for Kafka,
just takes in... basically writes data to logs in an append-only fashion, and then later goes back and does things like cleaning up and deleting zombies, and basically either throws away old data or compacts it somehow.
And if you are doing a statement based replication, then you can have replicas that get out of sync with each other because maybe one has started compaction and the other hasn't.
And so you basically can't have LSM databases
doing statement-based replication that are consistent.
And you can try to work around it,
just like we mentioned with MySQL trying to pop the appropriate values in there,
but it's an uphill battle because there's a better solution.
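That nondeterminism can be simulated in a few lines. A toy sketch, not real MySQL; the seeds here stand in for each replica's independent RNG state: replaying the *statement* means each replica evaluates the random value itself, while shipping the *resulting row* keeps them identical.

```python
import random

# Toy simulation of why statement-based replication is finicky: each
# replica that re-executes "INSERT ... VALUES (rand())" evaluates the
# random value itself, so replicas diverge. The seeds stand in for each
# node's independent RNG state.


def apply_statement(db: list, seed: int) -> None:
    """Each replica runs the statement itself: nondeterministic result."""
    db.append(random.Random(seed).random())


replica1, replica2 = [], []
apply_statement(replica1, seed=1)  # node 1 evaluates rand() itself
apply_statement(replica2, seed=2)  # node 2 gets a different value
diverged = replica1 != replica2

# Row-based alternative: the leader evaluates once and ships the value.
value = random.Random(1).random()
replica3, replica4 = [value], [value]
consistent = replica3 == replica4
```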
And that is write-ahead log shipping. And this sounds kind of similar. We've talked about
write-ahead logs several times. It's basically what the database does. It kind of writes down
what it's going to do before it does it. And then if it needs to roll something back or
it becomes in handy a bunch of different ways. But at a glance,
it doesn't really sound that much different.
Like I always kind of thought of the write-ahead log as basically being a
log of the things that I'm going to do before I go do them.
Why wouldn't it be the actual statements?
But the thing is,
it provides one level of abstraction and also has more details.
So when I say maybe saying a level of abstractions is not a good way of saying it,
but it actually has finer details,
like basically where things are going to be written to,
like specifically on disk.
So like literal segment numbers.
And it's very much what the database is going to do very precisely and exactly.
Well, but am I remembering it wrong, though?
Because it was saying that that depended on the particular database,
for example, implementation.
Some of them might be that specific where they knew the storage layer
and they would say, hey, this is how you're going to ship this piece.
But other parts might just be like, hey, it's this row or it's, hey, it's these columns.
No, that was different.
So what he's talking about right now with the write-ahead log shipping, it was very much at the storage layer, like what he was saying.
It dealt with how data was stored at the file level, right? And the problem with that, they said, too,
is because it is so coupled to that storage engine,
the upgrades could be really painful.
Oh, right, because they got into examples to where, like,
you would probably have to upgrade the replicas
before you could upgrade the primary.
So you upgrade your replicas, then you purposely do a failover so that an upgraded replica becomes the primary. Then you can update the previous primary. And they took that
even to a further extent and said that may not even be possible if the upgrade had an incompatible storage engine. Basically, to make this work well when upgrading your database server, the newer version of the server would have to have the old version's compatibility built in. If it didn't, then you're probably going to have downtime, because there's no way to avoid it.
So yeah, because it was dealing with the low-level storage of the data and how it was stored on disk, it causes problems. Even though it worked better than the other one, the statement shipping, it still had its problems.
Oh yeah. They called it statement-based replication versus write-ahead log shipping.
Right.
But then what you just led into was the next one, which was row-based log replication.
Yeah, logical row-based log replication.
Yeah, and so real quick, the write-ahead log shipping, that's still a very common feature of relational databases like Postgres and Oracle.
And even SQL Server.
Yep, LSM databases too, so like Elastic and stuff can still make use of this.
But like you mentioned, the downside is the upgrades.
So row-based is the final one, and it mentioned that MySQL now has got this binlog format.
Not the final one.
Well, so final in terms of evolution, as in, like, write-ahead log shipping was the evolution of statement-based replication, and row-based log replication is kind of an evolution of write-ahead.
And there's a fourth type that's kind of on the side.
Yeah.
So let's say that row-based is the last of the things that you should actually consider.
They're the standard ones.
They're kind of the way that things are usually done.
Because that fourth one, yeah, I feel like this is clickbait.
That fourth one will surprise you.
Yeah, that's right.
Yeah, so we're teasing.
But we're going to get there real soon.
So, yeah.
So, this way is the one I was thinking about, the abstraction where it decouples the replication from the storage engine.
So, it's very similar to a write-ahead log.
It's just a little bit more abstract.
So it cuts out those details on, what do you call it, like the actual binary locations on disk where things are going to be stored.
And I've seen this with Debezium and Mongo, where it kind of sends a message that basically says, hey, update this row, here are the two fields to change, or hey, we're doing a delete of this ID in this table or collection, which is somewhat similar to what Kafka does, too.
So basically, it's more complicated in terms of, like, you know, this is something where you take the statements in, you convert them for the write-ahead log. And now you've got a process that goes and takes that write-ahead log and then makes it kind of more generic.
And so it's like a third layer.
So it's more complicated, but it
allows the most flexibility. You don't have to worry about downtime. And this is frequently
referred to as change data capture. So when people talk about CDC or change data capture,
they're generally talking about row-based log replication. Yeah. And it's easy to understand
why this is kind of a preferred way is because now you're dealing with the actual rows that changed instead of all the logic to get there.
Right. So if you're trying to replicate data across servers, imagine you have triggers and all kinds of things in place.
You know, all that stuff has to fire off and then you get your set of ACID transaction-based updates done.
Trying to replicate that across multiple servers is hard, but if all you're saying is,
hey, this row over here that has Michael Outlaw at age 21, update him to age 22 now.
Now, instead of running an update statement, you're just saying, hey, this record over here should have value 22 for the age, right?
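That age-21-to-22 example can be sketched in Python. The change event below is loosely in the spirit of what CDC tools like Debezium emit, but every field name here is made up for illustration:

```python
# One row-based (logical) change event: just the end state of the row.
change_event = {
    "op": "u",  # u = update, d = delete
    "table": "users",
    "key": {"id": 42},
    "after": {"id": 42, "name": "Michael Outlaw", "age": 22},
}

def apply_change(table, event):
    # Apply the end state directly -- no triggers or business logic re-run.
    row_id = event["key"]["id"]
    if event["op"] == "u":
        table[row_id] = event["after"]
    elif event["op"] == "d":
        table.pop(row_id, None)

replica = {42: {"id": 42, "name": "Michael Outlaw", "age": 21}}
apply_change(replica, change_event)
print(replica[42]["age"])  # 22
```

The replica never re-runs the original update statement or any triggers; it just lands on the same end state.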
Now, one of the things that was interesting about it is, I don't know if you guys caught this when you're reading it, it didn't necessarily use the primary keys.
It had its own way of finding those rows, right? I don't know if it was some sort of hash-based type thing, but they said that they kind of avoid using primary keys because it may not make it unique enough across the systems, which was kind of interesting.
Oh, I thought, maybe I remember that wrong, but I thought it was saying that in cases where there isn't a primary key, then it used the combination of all the columns. But if there was a primary key, then it used the primary key, I thought.
I don't know that it went into a ton of detail. It said that it had its own way of uniquely identifying rows, which was pretty interesting. And now, the book just slightly touched on this; it's one of those things where the individual implementations would be their own book, right?
I've seen this in SQL Server, where you would enable it specifically for a table. You would say, I want to turn
on CDC. It literally
had a proc name. And what that would do
is it would go and actually create a separate
table and a separate database
that would track the changes. And what it would do
is basically
publish these
abstractions for what's changing from
moment to moment. So it's like a history table,
but instead of
you know, tracking the whole state of the table, it basically just tracks the changes between each, in kind of this intermediary language.
But I mean, you both are kind of talking about this in like a glowing kind of review, you know. But you could see where the write-ahead log shipping would be more performant.
Because take the example of the triggers and everything that you mentioned, Alan,
where it was like, okay, you might want all that to be replayed.
And in the case of the write-ahead log shipping, it's just like, here's the end state.
Just do that.
Go to there.
Skip to the end, and you're done.
Right. But it's, it's reliant upon physical storage specs and stuff, which is, I mean, yeah, none
of them are perfect.
Right.
Like to your point.
Well, I mean, knowing the internals of the storage is really the meat and potatoes of any kind of database system, right? Like, you expect that it's going to know that part of it. And that's why you, you know, invested time and money into it.
Right.
But like I said, the upgrades become a problem as they change their stuff. Like, there's definitely ups and downs.
Now, one of the things that's interesting about the change data capture though,
is, um, Jay-Z mentioned Debezium.
If you haven't heard of that, there's a very popular way to get changed data capture out of systems like Postgres or SQL Server and all that.
Pipe it into Kafka so that you have those changed sets.
And I think this sort of brings us into this next one, which is the trigger-based replication.
I'd never heard of this one.
Yeah, it's almost like, so you've probably done it before kind of manually.
And the idea is here that you can on-demand create a backup or a replica.
And, for example, you might have an application that just copies all the data
from the database into local memory, does this thing,
and then maybe at some point it does later. Another example would be if you just kind of did a backup
like you did a one-time backup and like loaded on a dev machine or something like that and those
are examples where like you are making a replica but just done very manually and it doesn't have
any sort of automated process for keeping things in sync but the book just wanted to call out like
hey this is kind of a fourth type. It's not like the others.
It's not really a process for keeping things in sync other than just kind of
like for one awful things that comes in handy sometimes.
Yeah. It's you doing it. Right.
Like, it's you creating some sort of custom way to create your replicas, which gives you flexibility, right? That's really all it does.
Let's check the temperature here. What's our feeling on triggers?
We talking about this fourth way, or are we talking about database triggers?
Database triggers.
I can't hate them.
I like, I'm familiar with the downsides, but I just can't get offended when I see
them.
Yeah.
I'm sort of in the same boat.
I mean, everything has their uses.
In general, I've over the years drifted away from having all the business logic in a database,
which if you had asked me 10 years ago, five years ago, I probably would have been like,
yeah, keep it in the database.
It's easier. But as I've worked on more distributed things, I definitely would gravitate away from triggers, because those are usually enforcing some sort of logic there. But they have their place. I mean, they wouldn't exist if they didn't.
Yeah, kind of same scenario or opinion. Like, they do kind of hide some details.
Depending on what your familiarity is with the database, and especially what you're using for source control for that database, it could definitely be abstract, hidden away from you, if you don't have that database in some kind of source control where you can see all that functionality. But yeah, I'm thinking of, what's it called in SQL Server, where you could have like the version history, like row versions, on the table and on a row? But I forget what they call it.
Temporal tables.
There you go. Thank you.
So with a temporal table,
you know,
you change a value of a row,
but really behind the scenes,
you know,
SQL server is keeping all of those changes.
So if you needed to see it,
you can. And in Postgres, that doesn't exist. The, you know, kind of workaround way of doing it instead in Postgres is to use triggers.
Well, I mean, that's the way it was done in SQL Server for years and years before temporal tables became a thing, right? You throw a trigger on it: after insert, after update, whatever.
Yep. So, yeah, I mean, again, I can't hate them all. I'm with Jay-Z.
They have their place. If there were other options available, I would probably gravitate towards those other options if they made sense.
This episode is sponsored by Educative.io. Educative.io offers hands-on courses with
live developer environments, all within a browser-based environment with no setup required.
With Educative.io, you can learn faster using their text-based courses instead of videos.
Focus on the parts you're interested in and skim through the parts you're not.
Yeah, so I've mentioned some of the courses I've really enjoyed,
particularly Grok in the system design interview.
Well, I was just on Educative now.
I noticed they've got a section on free courses, which is awesome.
You should go check that out because it gives you a really good chance to see what they're like.
And they've got some really good ones. In fact,
I noticed there's one called Grokking. I actually just
closed it. Dang it.
Grokking, the
advanced system design interview,
which is a follow-up course to the one that I like so much.
And it's free right now. So make sure
you go check that out. Well, check this out.
It has course contents that include gigantic chapters, meaning multi-module chapters, on things like Dynamo, Cassandra, Kafka. And just scrolling through, I noticed that each one of those has a module on replication and data partitioning. I clicked into Kafka: how to design a distributed messaging system. It's a whole gigantic section just on how you would write Kafka and how Kafka works, including the role of ZooKeeper.
I was going to click on here.
Oh, there you go.
I got to take it.
Yeah.
And it's got information on how Kafka works underneath and stuff.
And it's just really good.
And so this is one of the free courses right now.
This access is going to expire probably by the time this airs,
so that's unfortunate for you.
But it's worth checking out because it looks like they're going to be doing these kind of –
they're just doing all sorts of cool stuff.
So you definitely got to go check it out.
And so, we've talked before about text-based courses versus watching videos to learn something, and how important it is that you can easily skim through the parts that you want.
But it's also important to mention that's in an interactive environment too.
So you can be doing coding exercises in that course. You know, there's like a playground that you can actually work in. Like, let's say you're taking a Python course; you could actually be writing Python there in your browser, all within that part of the course. And again, focus on the parts that you want. And they're always adding
so much great content. There's a whole new one now called DevOps for Developers. And it's a path, you know, a collection of courses that's curated. Yeah, well, that's easy for me to say, but if only there was a course on how to speak, I would be able to say that. So, but it's a collection, how's that, of courses focused on DevOps. It includes things like Docker for Developers, A Practical Guide to Kubernetes, and Using Jenkins in Kubernetes. So, you know, there's other examples, like there's
a coding career handbook, which goes over the non-coding parts of being a successful engineer.
There's Decode the Coding Interview, which, you know, is part of a whole great series, like Joe mentioned, of trying-to-be-successful-at-the-interview type courses.
So be sure to check out their best-selling
Grokking the Interview prep series
with courses like Grokking the System Design Interview
and Grokking the Coding Interview.
The newest addition, Grokking the Machine Learning Interview, actually focuses on the system design side of machine learning by helping you design real machine learning systems, such as an ad prediction system.
It's the only course of its kind on the Internet.
Yeah, so go ahead and visit educative.io slash codingblocks to get an additional 10% off an Educative Unlimited annual subscription.
You'll have unlimited access to their entire course catalog, and it's a big one.
But hurry, because these deals don't run that often.
That's educative.io slash codingblocks to start your subscription today.
Are we done? What happened? We wrapped up? Yeah, we did, thank you for noticing. And yeah, we'll have some links to the resources we like, and obviously a copy of this book, Designing Data-Intensive Applications by Martin Kleppmann. I hope I'm pronouncing that right.
And
yeah, with that, we head into
Alan's
favorite portion of the show.
It's the tip of the week.
Yeah, baby.
Alright, so it looks like I'm going first. So this is
a tip that I stole from Redcon. Thank you, Redcon.
And he filled this out on
our tip hotline, which is cb.show slash tips.
And, yeah, they'll take you to a form where you can enter tips like this
so that when I can't think of something, I can go steal it.
Wow.
Thank you, Redcon.
And what this is is a link to CSS generators on Smashing Magazine.
And what it means by CSS generators are basically, it's a list of tools
that you can use to interact with,
like a human, drag around,
make things look like you want,
and then grab the CSS for it.
So you don't have to kind of tweak the numbers in CSS
to get what you want.
So the first example is like a CSS border generator,
border radius generator, rather.
And so what it lets you do
is it gives you a really nice
way where you can drag a couple things around until you get a cool shape that you want. So, for example, maybe I'll make a circle here, or maybe I'll try to make an egg. And when it's done, I hit copy, and now I've got that shape that I can go and take and drop in. They've got tools like that
for Bézier curves, if you want to make cool curves on your website, and
CSS gradients.
Example, just different
color palettes. So there's a whole
thing on color theory and kind of finding
colors that are distinct from
each other in order to
differentiate. You have to
take into account accessibility
and stuff, which is also another great episode we should do.
And so this is just a really big list of tools
that you can use to do all sorts of really cool stuff
if you're working on front-end type code.
And so, yeah, a lot of them deal with gradients too,
which is kind of fun to be able to kind of click around
and come up with something that you've never seen before
that you never would have thought to go out and explicitly make,
but you drag some things around, you had a little fun, and now you've got a complex CSS selector for something. So yeah, it's cool. And now I'm doing front-end work again. What's up? It's really nice.
Oh, here's one more I gotta talk about: a clip-path generator. So you can take an image, upload it, and let's say you want to just do a triangle. The triangle would, for example, mask everything outside the triangle, so you only see the part of the image that's in the triangle. Well, I'll tell you what, I want six points, so let's do a hexagon, and let's take it in. So there we go, I made a little Pac-Man. So now I've got a mask shape over an image that I can just copy and paste right here.
And now I'm obscuring the parts of the image that I don't want to see.
And yeah, it's got all sorts of cool like arrows and just different shapes and things that you can just tweak those little points around the image until it's something that maybe looks like your logo or looks like something you want to emphasize on that page.
And then boom, done.
Didn't have to tweak the numbers, didn't have to refresh, nothing.
So thank you.
Well, that's interesting. It reminds me of a tip that I had back, and I'm trying to find the episode number for it. I found the episode, but it was back before we would put the episode numbers in there, so I don't remember. But it goes back to 2018. There was this link that we shared for a freeCodeCamp article about this cool bookmarklet that you could use to debug your CSS, that I still use to this day. I just have it saved, and I click that, and it puts borders around all of the different elements so you can see, like, hey, this span is going here, this div is there, and why aren't these things lining up? And then you can see it, right? So you reminded me of that. I'll include a link to that in the show notes just to make it easy for others.
For my tip, I forget the use case that I had for it at the time, but I basically wanted an easy way from the command line to visualize a directory tree in Ubuntu. And it's kind of a hassle if you're just using ls to see all the files and everything in the structure and whatnot. But there is a package, or command, called tree. You can apt install tree.
And when you run it,
you can get a very pretty graphical view of whatever directory you happen to point it at. And of course, there are parameters where you can be like, hey, filter out these files, or don't go recursive, or whatever.
But yeah, super,
super convenient command
for being able to see your directory tree
for wherever you are.
I love that tip.
I actually use that,
and I never even thought about that as a tip,
and it is a beautiful command.
Isn't it?
It really is.
Yeah.
Yeah, so I'll include a link to the documentation for it and the apt install command.
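If you're curious what tree is doing, here's a bare-bones Python sketch of the same idea; the real command, of course, has flags for depth, filtering, and more:

```python
import os

def tree(path, prefix=""):
    # One line per entry; recurse into subdirectories with extra indentation.
    lines = []
    for name in sorted(os.listdir(path)):
        full = os.path.join(path, name)
        lines.append(prefix + "|-- " + name)
        if os.path.isdir(full):
            lines.extend(tree(full, prefix + "    "))
    return lines
```

Running it over a small directory gives you roughly the same shape of output the real command prints.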
Very cool.
All right, so I struggled to come up with tips on this one.
I feel like I haven't been doing anything useful of late, which is really frustrating.
And Outlaw actually reminded me of one that I had brought up on a chat channel the other day.
So I've been doing some Kotlin, and there was something that was really frustrating: I had gone to set up a variable up above, right? So let's say var myVar, right?
But I wasn't going to initialize that thing until I was inside a try block because I actually needed
to go to an external service to get the thing to set into this, right?
So I wanted that to be in a try block.
So in case the service failed, I would know.
Well, I kept getting compiler errors in Kotlin saying, you need to initialize this variable.
I'm like, yo, I did.
It's in the try block.
You know, look at it.
It's done. So it turns out that in Kotlin, when the compiler goes through, it's not aware of the things in that try block
being something that's initializing, right? Because when I go to use it later outside that
try block, it thinks it hasn't been initialized. So there's a really nifty way in Kotlin to where you can initialize a variable
by setting it to the try block, which was really weird syntax for me. So I could say var myVar equals, and then try, open curly brace, put my code inside. And if you're not familiar with Kotlin,
you don't have to put a return statement inside something.
Just whatever the last statement that executed
that had some output or some sort of value
will get assigned back to the variable for the try block.
So I was able to do what I needed to do,
which was keep that thing in the try block
and initialize the variable that way.
I'll have a link to the Stack Overflow post that kind of shows you exactly what's going on there.
And it worked out wonderfully.
It was, like I said, odd syntax, but kind of cool.
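For reference, the shape of that syntax looks something like this; fetchConfigFromService() is a made-up placeholder for the external call:

```kotlin
// try is an expression in Kotlin, so its value can initialize a variable.
val config: String = try {
    fetchConfigFromService()   // the last expression in the block is the value
} catch (e: Exception) {
    "default-config"           // the value used if the service call throws
}
```

The last expression in whichever block runs, the try body or the catch, becomes the value assigned to the variable.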
I found it.
It is episode 81.
Golly, man, that's been a minute.
80 episodes ago at two weeks per episode. Yeah, 160 weeks ago. That's been a minute. Like I said, it was 2018, so just over three years ago. Yeah, because it was May 13th of 2018. So yeah, it's been a minute, man.
All right. So this other one, man, I searched on our site to see if we'd mentioned it, and I'm actually shocked that we haven't. So there is a product called HashiCorp Vault. It's actually called Vault; it's made by the company HashiCorp. And this thing's amazing. I've been reading up on it, because if you're writing an application, right,
it's going to come up at some point where you store secrets or where you manage keys or where
you do things like that, right? If you're in the cloud, you can totally use Azure Key Vault, right?
Or you can use AWS's key management services and things like that.
And that's all fine and dandy.
But what if you have an application that you're not running in Azure,
you're not running in AWS,
or what if you don't want to use their managed services for it?
HashiCorp Vault is actually a really good solution.
So this thing is really cool.
Basically, it is this thing that allows you to manage your secrets, put sensitive information into it. It will handle encryption and all kinds of other things. And what's really cool is it's 100%
API based, meaning all your interactions with it are through an API and it can even handle authentication for you.
So like one of the things that's really cool is let's say that you are writing your application in Azure or AWS or something like that.
And you need to authenticate to the vault to unseal a secret or something like that, you can actually use the, um, the, uh, uh,
the credential services from AWS, Azure, Google, Okta, whatever with token based type things to
go get that stuff, unseal what you need in the vault, and then provide that to your application.
So there's tons of use cases for this thing. I believe it's free; it's open source. You can use it in a regular application, and it'll scale out. On their main page here on hashicorp.com slash product slash vault, they even have a use case, this Vault case study, that says using Vault to securely handle 100 trillion transactions for Adobe.
So basically Adobe uses this thing for logging in and handling secrets and all that kind of stuff.
So yeah,
this thing scales and it does all kinds of cool stuff.
So if you are storing anything in your application that is sensitive,
if you're not doing it in the proper way,
you should probably be thinking about that.
And this is probably a good thing to jump on and take a look at.
The cool thing about it, too, is there are some capabilities that I've heard about that we haven't even really had a chance to explore in the day-to-day work,
but like, do you remember Secret Server?
Oh yeah.
Right. Secret Server was a product where you put all your passwords in there, and then applications could reach out to Secret Server to get the passwords, but they never really know it. And you could restrict which people can even see the passwords or change the passwords. So, you know, HashiCorp Vault is very similar to that. And the reason why Secret Server came up was that one of its big claims to fame was, when it came time to roll credentials, you're just doing it there, and those credentials can filter out. And there are other services that were similar to Secret Server; I think, was it Cyberduck? I think that was another one. I know LastPass also has an enterprise type thing as well.
Yeah. But I don't know that that's API-based, though, for applications. I think that's enterprise in terms of, like, me as administrator being able to allow you, as one of my employees, access to the server, but you don't necessarily know what the credential is.
Yeah. But Vault also has similar capabilities, from what I understand, about how it can do credential rolling. And specifically, it has dynamic secrets that can be created, but it also has this concept of revoking secrets. So you can create time-based secrets that can be revoked.
The dynamic secrets, by the way, are really cool, because what you're talking about there, just so people understand, is this: if your application needs to get access to some AWS service, for instance, it can request temporary credentials to go access that service. HashiCorp Vault will actually handle generating those credentials, those IAM credentials, for you, giving you back the stuff so you can use it, but it may only be alive for like five minutes. Right. And so when you go
back to use it again, it'll be like, no, this thing's dead. So yeah, it's got that. And another
thing that's really important here that, that probably if you haven't been in this space,
you don't think about it audits everything. So if you unseal a secret,
it has an audit trail of it. If you create a new secret, it has an audit trail, creating dynamics,
whatever. So, so everything that happens is logged. So if you're dealing with things like
HIPAA compliance or any kind of access controls to where you need to be able to show that,
you know, steps A, B, and C were followed, this thing has that baked in for you. So every time you're doing something, it's tracked and logged. So, pretty powerful piece of software that's free, right? So yeah, I definitely would like to understand it more and take better advantage of it. It's pretty cool. Hey, we just figured out how to get our streaming thing working again.
So, you know, I don't know.
Yeah, that's true.
Okay.
Well, I won't beg for too much.
And this is why being a developer is hard,
and I wanted to be a monk, you know?
But I never got the chance.
Aww.
Come on, Joe. Come on, Joe.
Okay, let me put it into perspective for you. You remember Monty Python's Quest for the Holy Grail?
Yep.
You remember the monks walking around with the things, doing the chants?
Oh my gosh.
Okay.
Yeah.
It took me a second.
Yeah.
Yeah.
They,
you know,
it was,
it was a whole to do though.
I mean,
they asked why I made a vow of silence,
but I couldn't say.
So,
uh,
thank you,
Alec,
for that,
uh,
last, uh, final closing joke as we round out episode
160. There we are. It's in the books. Finally a divisible number. We finally got to an even number. Yeah, who knew it would take this long? Yeah, goofy.
So, as I said before, in case, like, a friend turned you on to us, you know, hey, here's this link, check out this show or whatever, and if you're not
already subscribed, um, you can find us on iTunes, Spotify, Stitcher, or probably wherever you like
to find a podcast. We did get an email in this week; I think the listener was using the Google Podcasts app. So wherever you like to find it, we're probably there.
And if we're not, let us know.
We will take care of that issue.
We have a pretty good SLA for issues.
I don't know if you've heard, but yeah, definitely within the next two years, we will get to that.
Yep.
And as I asked before, if you haven't left us a review, I can't stress enough.
We really do appreciate it.
You know, we appreciate if you take the time out of your day to leave us that feedback.
And it does, you know, put a smile on our face to read some of the stories.
And some of the stories are just amazing that we read that we hear from listeners that, you know, the impact that this show has had on them.
So, you know, we greatly appreciate hearing it.
And if you would like to leave your feedback, you can find some helpful links at www.codingblocks.net slash review.
Hey, and while you're up there, we do have copious show notes, so check those out. And hopefully some discussion on this one, right? So, codingblocks.net slash episode 160, leave a comment and enter yourself for a chance to win this fantastic, one of our favorite, books.
Yeah. And make sure to follow us on Twitter. Now, we don't actually tweet a lot,
but we do retweet. So if you are tweeting anything interesting, you know, just let us know or we'll follow you back and we'll watch out for that stuff and try to share that stuff out.
And also, you can go to codingblocks.net and find our, you know, TikTok and all that other stuff at the top of the page.
I should probably go grab that name before.
Our 15-second videos.
I think you checked on TikTok once; it was already taken.
Dang it.
I think I may be wrong. Maybe it might have been Instagram or any other thing that we haven't tried. Yeah.