Coding Blocks - Designing Data-Intensive Applications – Lost Updates and Write Skew
Episode Date: March 20, 2023
What are lost updates, and what can we do about them? Maybe we don’t do anything and accept the write skew? Also, Allen has sharp ears, Outlaw’s gort blah spotterfiles, and Joe is just thinking about breakfast. The full show notes for this episode are available at https://www.codingblocks.net/episode206. News Preventing Lost Updates Detecting Lost Updates […]
Transcript
So, yeah, you're listening to Coding Blocks. So is that going to be our new intro? "So, yeah"?
"So, yeah." That's actually a horrible way to do it. Like, "Oh, hey" too. Oh, hey. Oh, hey.
I was trying to make it sound kind of natural, and I don't think that worked.
"So, hey." Because when do you ever start a sentence with "So, hey"? Oh, I do all the time. Oh,
okay. Well, so, hey, you're listening to Coding Blocks.
It's episode 206.
Subscribe to us on iTunes, Spotify, wherever you even like to find your podcasts.
Visit us at codingblocks.net.
You can find examples and discussions.
All the show notes are there.
Yep.
And you can send your feedback, questions, and rants to comments@codingblocks.net, or follow us
on the Twitters at @CodingBlocks. Yeah, and the website we mentioned, by the way, we've got a bunch of
social links there at the top of the page. With that, I'm Joe Zack. I'm Michael Outlaw. But I've got to
say, I noticed that Allen never plays along with my shenanigans with the intro. I keep trying to
change it up and he's like, "No, this is the script." Ain't having it. Yeah, I've got to read it, man. It's too early to mess with that anyways. Yeah, he hasn't taken his
vitamins yet. He can't go off script. So, yeah, I'm called out. Allen Underwood. Hey, I don't
believe in vitamins. My wife takes them; she gets sick more than I do. Okay, this and other health
recommendations are all part of the show. That's right.
It's a health tips episode.
Don't take your vitamins is not one that I would have ever expected to have heard.
Yeah, I just don't believe in it, man.
I think it's wrong.
Okay.
Well, here's something you can believe in, and that is we truly appreciate your reviews. So we got a new one in from iTunes from Joe Millian.
So, yeah, thank you very much.
We appreciate those reviews.
And if you haven't already, you can leave us a review.
Find some helpful links at codingblocks.net slash review.
All right.
Excellent.
On with the show.
So last episode, we talked about weak isolation, committed reads, and snapshot isolation.
This episode, we're talking about lost updates, which is book, but maybe one way to explain it is – I forget.
It's too early.
Well, I mean, they kind of already hinted at one earlier in the book, right, where, like, if you and I both read a value and then updated – tried to do two updates to that value just by incrementing one to it, for example,
then one of our updates is going to just collide with the other one
and basically just be a repeat of it, but not truly update the value,
which was the intent of the application.
So that was an early example.
Okay, so a tangible example would be like,
we both are going to increment a number by one.
So I go in and read three, you go in and read three,
I go in and update to four, you go in and update to four.
And the answer we expected was five
because we're both doing two increments,
but because we both read at the same time and it was valid
and our updates got applied
the actual intent was lost. Yeah. Maybe an even better example of
this: the author gives an example of a wiki page,
but you could imagine this as a WordPress type of site or a Confluence page (older versions, maybe not current versions, of either of those products) where you and I are both updating the page and I click save.
Maybe I'm working on the top half of the page and you're working on the bottom half of the page, and you click save. But because you had my old
version as the top of the file, your save loses my update. You essentially undo my update.
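A rough sketch of the wiki example just described, not something from the episode or the book. The page layout, the `editor_a`/`editor_b` names, and the `save` helper are all made up for illustration, and the "database" is just a dictionary:

```python
# Hypothetical in-memory "wiki": one page stored as a single blob.
page = {"top": "old intro", "bottom": "old footer"}

# Both editors load the full page before either one saves.
editor_a_copy = dict(page)
editor_b_copy = dict(page)

# Each editor changes a different half of their own copy.
editor_a_copy["top"] = "new intro"
editor_b_copy["bottom"] = "new footer"


def save(whole_page):
    """Naive save: replace the stored page with the editor's entire copy."""
    page.clear()
    page.update(whole_page)


save(editor_a_copy)  # stored page now has the new intro
save(editor_b_copy)  # overwrites it using a copy that still has the old intro

print(page)  # {'top': 'old intro', 'bottom': 'new footer'}
```

Neither save failed and neither editor did anything wrong on their own; the first editor's change simply disappears because the second save carried a stale copy of the top half.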
Yep. I used to work for a company that had this problem a lot with content management systems.
This was super naive, way back in the day. You'd have two people, say, updating the front
page of a website, and the way we had the back-end administration stuff, it was all kind of on one page. So
one person would go change the title, another person would go change a paragraph, and, you know,
they'd collide. It's kind of easy to think about this stuff if you think about it happening really
slowly. Most of the time we're talking about these things happening very quickly, but
if you slow down the problem, you can imagine two people going in there, both spending 15 minutes updating something.
And since both of them got that copy of the front page to work with to start,
the changes ended up colliding and someone's stuff is going to get lost.
Even though with snapshot isolation, there's nothing
wrong that happened. This wasn't a bug. It's just because of the design.
This is how it ends up
happening where something can get lost. Yeah, especially for complex data types, it can
get worse. If we have a big JSON structure that you and I were both
editing, that might have represented what the content of that page was supposed to look like, and the update just
took the entire JSON blob as one whole thing, then any change that I made could be overwritten by
your change, because you had this older version of it. You know what we did about our problem?
And this is before Ajax. This is before we did Ajax, let alone called it things other than Ajax, right?
What we did is you would fetch the data from the front page to the administration panel.
And then when you clicked save, we'd go see if someone else had updated it.
So it's kind of like a crappy version of snapshot isolation.
And so one of those people would be working on changes for a half hour,
and they'd click the button and it wouldn't work. It would say, "Someone else has modified this before you. Your changes are discarded." And everyone hated that.
But I think they actually referred to something like that in this portion of the book. I'm
trying to remember; maybe it was the compare-and-set. Well, not exactly
like what you're doing, but they were talking about the compare-and-set version of that,
where you could only do the update if the old value equaled
what you originally thought it was.
So in your numerical example,
we both queried the value and got a three.
The second update couldn't update because the value no
longer equaled three; it would have been four. So that second update would have failed, which would
be a way to get around what you were just describing with the web page. Yeah, you know, it
still does that. I know that we've all been on a Confluence page, the wiki that a
lot of people end up using, and all the time, when you go to save something,
hey, there's been changes to this page.
Are you sure you want to save them?
It's like, yeah, they can lose theirs.
It's fine.
Yeah, but it's got... The way the error message is,
it's never been clear to me whether they're losing it or not.
I know.
I've never known either.
I seriously wonder.
Yeah, it's gotten smarter about merging those changes in with what your current edits are.
You know what's funny about compare-and-set, too? It's not an error if zero records are
updated, or if you're updating multiple and only four of the five get updated or something.
So it's up to the application to know to check correctly. It's kind of funny. Yeah.
Yeah.
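As a minimal sketch of the compare-and-set idea from the last few minutes, here is the three-to-four example against an in-memory SQLite database. The `counters` table and the `compare_and_set` helper are assumptions made for this illustration, not anything the book or the hosts specify:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE counters (id INTEGER PRIMARY KEY, value INTEGER)")
conn.execute("INSERT INTO counters (id, value) VALUES (1, 3)")
conn.commit()


def compare_and_set(conn, counter_id, expected, new_value):
    """Write new_value only if the row still holds the value we read earlier."""
    cur = conn.execute(
        "UPDATE counters SET value = ? WHERE id = ? AND value = ?",
        (new_value, counter_id, expected),
    )
    conn.commit()
    # Zero matching rows is not an error, so the caller has to check the count.
    return cur.rowcount == 1


# Both writers read 3 earlier, and both now try to write 4.
first_ok = compare_and_set(conn, 1, expected=3, new_value=4)
second_ok = compare_and_set(conn, 1, expected=3, new_value=4)

print(first_ok, second_ok)  # True False
```

The second `UPDATE` runs without raising anything; the only sign that it did nothing is the row count, which is the "it's up to the application to check" point made above.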
This is interesting.
I think that was like where there's kind of like a shift in the book though, starting with this chapter
where like a lot of the up to now in the book,
it's been talking about like how a database system,
regardless of what it is, whether it be relational or not,
but how that system can solve a lot of problems for you. Right. And,
and so like leading this pathway of like, Hey, here's,
here's things that, you know, it was originally documented in the seventies,
but not implemented until 2010, but you know,
of like things that could, could save the application developer a lot
of time but this this particular portion of the book it really felt like it was getting into like
well here's scenarios where there's only but so much that can be done for you there's gonna be
something station yeah yeah there's there's gonna be some things that you you're gonna you're gonna
have to do on your own that's not to say that the book doesn't get into some parts of how some of this can be handled.
But as a whole, this portion really felt more like, okay, as an application developer, you need to be aware of what you want to do.
But it also kind of made me think, too, that there was kind of like this underlying theme too of like well you need to think about
and plan out like how you write uh stuff down to whatever that storage mechanism is too because
that can have a big impact on it like i'm kind of getting ahead um a little bit so i'll save some of
those thoughts but yeah there are a couple things that database can do.
A couple rough solutions that have some trade-offs.
One is atomic writes. So some databases support atomic updates that effectively combine the read and write.
So we talked about that increment.
So imagine if that was just one operation.
And so the read and the update happen at the same time,
which can happen for simple values.
We mentioned that before, so increment is a great example of something
that's easy to do. As things get more complicated, it gets a little tougher.
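To make the atomic-write point concrete, here is a small sketch comparing a read-modify-write done in application code with an increment done inside a single statement. It uses SQLite only because it ships with Python; the `counters` table and the interleaving of the two "callers" are assumptions for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE counters (id INTEGER PRIMARY KEY, value INTEGER)")
conn.execute("INSERT INTO counters (id, value) VALUES (1, 3), (2, 3)")


def read(counter_id):
    row = conn.execute(
        "SELECT value FROM counters WHERE id = ?", (counter_id,)
    ).fetchone()
    return row[0]


# Counter 1: read-modify-write in application code. Both callers read first,
# so both compute 4 from the same stale 3.
seen_by_a = read(1)
seen_by_b = read(1)
conn.execute("UPDATE counters SET value = ? WHERE id = 1", (seen_by_a + 1,))
conn.execute("UPDATE counters SET value = ? WHERE id = 1", (seen_by_b + 1,))

# Counter 2: the database does the read and the write as one statement.
conn.execute("UPDATE counters SET value = value + 1 WHERE id = 2")
conn.execute("UPDATE counters SET value = value + 1 WHERE id = 2")

naive_result = read(1)
atomic_result = read(2)
print(naive_result, atomic_result)  # 4 5
```

This works nicely for an increment because the new value can be expressed in terms of the old one inside the statement; edits to a large document or JSON blob usually cannot be written that way.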
Cursor stability: that's locking the read object until the update is performed. That's kind
of like what I talked about, where two people went into the same page. Chances are
that they didn't click the button at the same exact time, in my case anyway. So one thing we
could have done is check to see if there was a lock ahead of time, saying, hey, this person's got
this locked already, so you should probably hold off or talk to them. Single threading:
forcing all atomic operations to happen serially through a single thread. This is basically
making everything run through one mechanism, so it literally happens serially.
This is the safest, but it doesn't distribute very well.
This is all about distributed data and making things scalable.
So if you've got one little thread that everything's got to go through and a lot of transactions to pop through, then this is not going to be a good solution.
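One way to picture the single-threaded option is a work queue with exactly one consumer. This is only a sketch under that assumption; the `worker`, `increment`, and `submit_many` names are invented for the example, and a real database would do this very differently:

```python
import queue
import threading

counter = {"value": 0}
work = queue.Queue()


def worker():
    """The only thread allowed to touch the counter; runs operations one at a time."""
    while True:
        operation = work.get()
        if operation is None:  # sentinel: no more work
            break
        operation()


def increment():
    counter["value"] += 1


def submit_many(n):
    for _ in range(n):
        work.put(increment)


writer = threading.Thread(target=worker)
writer.start()

# Many producer threads, but they only enqueue; they never touch the counter.
producers = [threading.Thread(target=submit_many, args=(1000,)) for _ in range(4)]
for p in producers:
    p.start()
for p in producers:
    p.join()

work.put(None)
writer.join()
print(counter["value"])  # 4000
```

No update is lost because nothing ever runs concurrently, which is also exactly why it stops scaling: every write in the system waits in the same line.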
Yeah, so both those last two that you mentioned, the cursor stability and that single threading, like both of those are probably fine in a smaller or a lower transaction database, right?
But as soon as you start getting some real traffic, you start seeing deadlocks and all kinds of things that start happening that are really bad and basically make applications unusable and unstable at some point.
You know, one thing people maybe don't realize,
and you see this a lot with streaming,
is that falling behind can effectively
mean stopping.
Because what happens is like,
if you start falling behind and you can't keep up with the load,
it's possible to get into a condition where you will never catch up because
you're falling behind faster than the data is able to recover.
And so falling behind doesn't sound as bad.
It just sounds like a little bit late.
But if you can't catch up, then it's a lot worse than that.
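The arithmetic behind "falling behind can mean stopping" is simple enough to sketch. The rates below are made-up numbers, and the model assumes constant arrival and processing rates, which real systems rarely have:

```python
def backlog_over_time(arrivals_per_sec, processed_per_sec, seconds, start_backlog=0):
    """Return the backlog size after each second, for constant rates."""
    backlog = start_backlog
    history = []
    for _ in range(seconds):
        backlog = max(0, backlog + arrivals_per_sec - processed_per_sec)
        history.append(backlog)
    return history


falling_behind = backlog_over_time(arrivals_per_sec=110, processed_per_sec=100, seconds=5)
catching_up = backlog_over_time(
    arrivals_per_sec=90, processed_per_sec=100, seconds=5, start_backlog=30
)

print(falling_behind)  # [10, 20, 30, 40, 50]
print(catching_up)     # [20, 10, 0, 0, 0]
```

As long as arrivals outpace processing, the backlog only grows; it shrinks only once the processing rate is higher than the arrival rate.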
At some point, I'm going to do enough of these episodes to where I will stop being embarrassed when I learn something new for the first time.
But I haven't gotten there yet.
So I'm still embarrassed that I learned something new while reading this, something that I felt
like, oh man, I probably should have known that for a long time. Which one was this?
I'm embarrassed to say, but the SELECT FOR UPDATE. Oh, I never knew that was a thing. I'm like, oh man, that is awesome. I wish I had
known about that. Although that's basically doing the cursor stability, more or less, where
it's locking the read rows. And it's amazing, but again, as soon as you have a database that has
a lot of activity in it, you start causing all...
And, you know, here's the worst part is if you've never actually dealt with deadlock problems like man, it will lead you down some paths that you're like, how in the world do I figure out what's going on?
And then what you realize is you've got thousands of database queries running, you know, every second.
And you start seeing this stuff and you're like well okay
now we've got to rethink exactly how we do everything right and that's rough i i don't
mean to say it as like oh this is going to be the new solution to every yeah you know i'm going to
bake it into every one of my queries but it was more like yeah you know i could see where if you
were selecting like a bunch of rows, you know,
that that could be,
that could cause deadlock problems.
But like for onesie twosie rows,
I'm like,
Oh,
that that's probably a pretty smart thing that I should have,
you know,
I should have already been doing in the past and haven't been,
but yeah.
All right.
So it's funny.
I bet a lot of people know about
WITH (NOLOCK), because we want to get rid of these deadlocks, but they don't realize.
I mean, I think that's the thing you sort of pick up as you're going through this stuff, too. Sort of like you're saying, it's almost a little bit embarrassing.
Somebody went and found this WITH (NOLOCK), and then you start using it everywhere, and you don't realize the types of problems you're introducing by doing it.
You know, that's my old friend WITH (NOLOCK).
I mean, that one gets baked into every query.
That's just a SELECT statement.
I don't need to lock on anything. WITH (NOLOCK).
Like, I'm doing it as a safety mechanism.
Listen, I'm the application developer here.
I know what I'm doing.
That's right.
Trust me.
Do I know what I'm doing?
I know what I'm doing.
You know, they say it's like, it's kind of just a suggestion, you know, like ultimately
it's up to the query optimizer and other rules of the database to kind of enforce that.
But every time I write them, I'm just like, I kindly suggest that you do what I tell you.
Right.
Forcefully.
Yeah.
Either you do what I tell you or I will find a database
system that will. Yeah, right. I suggest.
All right, so what have we got up next here? Another
solution is explicit locking. That's kind of similar to what you talked about
before: the application is responsible for explicitly locking certain objects, so it places the
responsibility in the dev's hands. That's where we basically say, you know, we're locking the front
page for edits. It's okay in certain situations. The example they gave
was a multiplayer game where each player can move a shared object, which I actually had a
hard time thinking through. I'm like, what is this example?
But the only thing I could think of is something like Factorio,
where I can place a factory and either one of us can move it.
So that's the definition of a shared object.
If we both try to move it at the same time,
it would be a good idea for the application to lock it
so that one person grabs it and it's locked.
And you do that for UI reasons, because we don't want to both be trying to do the same thing and not realize that it's not going to work until the end. You want to know as soon as possible,
so the game can make that a good experience.
Yeah, I had trouble trying to wrap my head
around the game example too, because I kept thinking, you know, my go-to game would be
Overwatch, and I'm like, oh, I guess we
technically can't stand in the same spot, so yeah, that makes sense. But when I would dumb it
down to a game that might be a little less complex than my go-to, it's either a chess or
checkers type of game, where it's like, okay, here's a very specific quadrant or area of the board that can't be consumed
or have two players on it at the same time.
I'm surprised you guys struggle with this one.
Overwatch, you can't pick up a weapon.
Well, I guess I don't know if in Overwatch you can pick up weapons,
but like Call of Duty, there's a weapon laying on the ground.
Somebody got there, and somebody got it first.
The other person can't touch it.
Right.
Like that's that was kind of how I thought about it because it makes sense.
Right.
Like, hey, if this if if somebody comes down and gets this particular gun laying on the ground, that's locked.
Right.
This next person doesn't matter if he came a millisecond later.
It's done.
He can't touch it.
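The weapon-on-the-ground version of explicit locking could look roughly like this. It is a single-process sketch with invented names (`SharedObject`, `try_pick_up`, `drop`); a real multiplayer game would hold this state on a server, not in a Python lock:

```python
import threading


class SharedObject:
    """A hypothetical game item that only one player may hold at a time."""

    def __init__(self, name):
        self.name = name
        self.holder = None
        self._lock = threading.Lock()

    def try_pick_up(self, player):
        # Non-blocking acquire: a late player is told "no" immediately
        # instead of waiting and finding out at the end.
        if not self._lock.acquire(blocking=False):
            return False
        self.holder = player
        return True

    def drop(self, player):
        if self.holder != player:
            raise ValueError(f"{player} is not holding {self.name}")
        self.holder = None
        self._lock.release()


weapon = SharedObject("weapon")
first_try = weapon.try_pick_up("player_1")
second_try = weapon.try_pick_up("player_2")  # a millisecond later: too late
weapon.drop("player_1")
third_try = weapon.try_pick_up("player_2")

print(first_try, second_try, third_try)  # True False True
```

Failing fast is the point made earlier about UI: the second player learns right away that the object is taken, rather than at the end of the move.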
Right. So, yeah.
Yeah, so that's a good point. You know what's funny about that? They talk
about this later too, but in the example I mentioned where you pick up something and move it,
both players could try to pick it up and move it, so you can lock the object that you try to pick
up and move. But what if we're both moving two different objects? Are you going to lock the whole
world while we're both moving, to make sure we don't put them down in the same spot? How do you lock the destination I'm going to
go to? Yeah, that's harder.
This is part of the reason why I had a problem with the
game example, though: the games that we've described, Overwatch and Call of Duty,
are such fast-paced games. So I'm like,
well, it's not like they're writing to a database of what is available and what isn't available. So
that was part of the reason why I needed something like, what would be
an example of something that might be a little bit slower, where maybe it makes sense? I
don't really think that it makes sense even in chess and checkers, but that was as good as I could get.
Yeah, Diablo 1 actually had terrible problems with duping. You could easily
dupe objects by basically dropping them on the ground and picking them up over and
over again; sometimes they would get duplicated. So it was common to be like, hey,
can you hop on a multiplayer game with me? You'd host a slower server to increase the latency,
so my friend would hop onto my server by dialing up to my house, and then we'd be trying to
dupe items, and it would work a surprising amount of the time. That's awesome. Dial-up. Yeah, good times.
So we mentioned problems with lost updates, but you need to be able to tell when that happens, right? That's the kind of scary part. Like two people duping an object, or you can imagine all sorts of cases, like incrementing a bank account's balance or something. If you don't realize that a mistake was made and the update was lost, that's really bad. I guess if there's no audit, it's really tough to unwind, unless your accountant or whatever is going to start looking through the transaction log, and if they're doing that, all hope is lost.
It really made me think of, going back to the history of the name "transaction", I believe we said that it came from a banking or financial kind of background, right? Like why we call it a transaction in a database.
Right. But it really made me think of all the complexity: the needed ability to
have audit trails and to verify that you can't have a lost update. That's
not an option. How do we protect against that? All of those types of
requirements. Imagine talking about a bank. Okay, no big
deal, because my one individual account might not get that many transactions per second, and
your individual account might not get that many transactions per second. But even if you were to go to a large-scale bank,
okay, fine, they might get a lot of transactions per second.
But that's really nothing when you compare it to the number
on a stock market, for example,
and the number of transactions that are flying through
and their need for keeping things accurate and reliable,
especially when they try to get as
close to the source servers as possible with as low-latency a connection as possible, so
they can be as up to date as possible. So that was the example of high-speed
financial transactions where you cannot afford to be wrong.
Yeah, with a lot
of that stock market stuff, they look at trade volume, just the amount of trades happening.
So there are examples where even simple increments are important
and you don't want to lose them. Yeah.
So we talked about snapshot isolation and how you can
basically keep a transaction ID
on data to let you know what the last thing was to update it. So one solution
we could have here is basically to fail an update where the transaction ID on the data we're changing
is higher than our transaction ID. When I read that I was like, oh, problem solved, right? What's
the problem? It does allow for the applications to be written where basically the code is going to fail when it tries to update.
But you don't have a lot of flexibility.
Your option there is someone has updated this data.
Do you want to try and retry it again with the current data?
Or do you want to give up and have that application reevaluate?
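A simplified sketch of that "fail the stale update" behavior follows. It swaps the transaction-ID comparison described above for a plain per-record version counter, which is an assumption of this example, and `VersionedStore` and `StaleWriteError` are invented names:

```python
class StaleWriteError(Exception):
    """Raised when the record changed after the caller read it."""


class VersionedStore:
    """Hypothetical in-memory store that stamps each record with a version."""

    def __init__(self):
        self._rows = {}  # key -> (version, value)

    def put_new(self, key, value):
        self._rows[key] = (1, value)

    def read(self, key):
        return self._rows[key]  # (version, value)

    def update(self, key, read_version, new_value):
        current_version, _ = self._rows[key]
        if current_version != read_version:
            # Someone else wrote after our read: refuse instead of overwriting.
            raise StaleWriteError(
                f"{key} is at version {current_version}, not {read_version}"
            )
        self._rows[key] = (current_version + 1, new_value)


store = VersionedStore()
store.put_new("front_page", "original text")

version_a, _ = store.read("front_page")
version_b, _ = store.read("front_page")

store.update("front_page", version_a, "three new pages of content")
try:
    store.update("front_page", version_b, "one comma fixed")
    outcome = "saved"
except StaleWriteError:
    # The application decides: reload and retry, or give up and tell the user.
    outcome = "rejected"

print(outcome, store.read("front_page"))
```

The store can only say no. Whether to retry with fresh data or hand the problem back to the user is still the application's call, which is the limited flexibility being described.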
This is your web page solution.
"Oh, your updates are lost because it was already updated."
Right.
Yeah.
And you can hit save again and you'll get the new lock and overwrite their changes.
But I'd reload the page, right? I'd imagine you'd have to reload the page to get a new one. But you can imagine a case where they didn't make them reload and it'd be like,
"Are you sure you want to save?" "Yeah, bye." I wrote three pages of
additional new content, and Allen corrected one comma mistake, and his comma fix causes me
to lose everything I've done. You can see where a user would be a little frustrated in that
type of situation. A little bit.
But you can imagine, we talked about having different options. So if the database didn't
automatically fail the transaction, you could go back to the person and say, hey, there has
been a change; do you want to override it, or do you want to reload the page? Those are things we
can do if we give the application more control. So we kind of lose that with the transaction ID, but it's probably
okay.
They did have to take a little note in the book to throw a little bit more shade
towards MySQL InnoDB, which as far as I know is still not the default storage engine; I
don't remember. They've mentioned its repeatable read a couple of
times, and mentioned that it does not support this kind of... I forget what they call it, but basically comparing the transaction ID of the current
transaction to the transaction ID stored on the data to be modified. So some people say it
doesn't qualify as snapshot isolation.
Hey, so InnoDB is the default. Okay, it didn't used to be. It used to
be MyISAM. They have, yeah, they have MyISAM. They have Memory, CSV, Archive, Blackhole. They have a bunch, but InnoDB currently is the default.
And I assume it's because it supports ACID, or, you know, mostly supports ACID, from what we're finding out in this book.
Yeah, I don't think MyISAM supported distributed transactions. I forget.
Back in the day when I used to set up MySQL for websites, I had a little note that was like: create MySQL, make
sure to change the storage engine to InnoDB.
The thing that you couldn't remember what it
was called, that was just the ability to detect lost updates. Yeah, period. I don't know;
they never referred to that as any other feature name that I saw
or that I recall.
Okay.
So what if your database doesn't even support transactions at all?
So, you know, no snapshot number.
This is where we kind of jumped ahead a little bit and said
your best option there is compare-and-set,
which is where you can take the value and say, hey, I'm updating three to four, assuming the data is
still three for this ID. And then you have the option of it failing, or, as we mentioned,
it won't fail; it'll just say zero records updated, and so it's up to the
application to ask, was there an update actually applied to records? You need that kind of
functionality built in.
Could you imagine
if you had to live in that world where every one of your SQL statements (and I'm going to
say SQL generically; I don't care what the storage mechanism is, be it Kafka or Elastic or Mongo or
SQL Server), if every one of your update queries had to
pass in the old value, because it was going to be part of the predicate,
but then you also had to get the count of the records updated to make sure
that you're updating the intended number of records?
Like, the additional complexity that you would have to add into your
application.
I mean, it sounds awful and painful,
and I'm sure there was a time where that thing was done,
but this is just another example of the shoulders that we're standing on,
of the giants before us that solved these problems
so that we don't have to do that.
Because it doesn't sound like it's terribly hard,
but geez, the boilerplate boringness of it. Garbage.
And how easy that boilerplate would be to mess up, even if you tried to abstract it
away as, okay, there's a common library or whatever. Whatever way
you're going to try to justify it to yourself, like, no, I can solve this problem easy,
it still sounds awful. Yeah, and you're wrong.
So back in the day, I mentioned working on the web stuff. I was
working with ColdFusion back then, and around then is when the ORMs started coming out, the
first ORMs I remember using. It was really common with the earliest ORMs to
be heavily focused on objects. So you would get the table object, you would modify
your column, and then you'd say .save. And the earliest ones weren't smart about what values you updated; it
would update the whole record. So stuff was getting lost all the time. It was such a common problem,
because people were modifying "enabled" or something, and the whole record would
get overwritten by somebody who was changing the title. That was bad news.
Well, then there was a time where the three of us worked on a homemade ORM. Joy. If you recall that bit of
fun. Yeah, I don't mean to cause you to twitch any more than you already do, but throw
that thing in the trash. That's right.
All right.
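The early-ORM problem described a moment ago, where `.save` rewrote the whole record, can be sketched like this. The `Record` class and its method names are hypothetical and do not model any particular ORM:

```python
class Record:
    """Hypothetical ORM-style object that remembers which fields were changed."""

    def __init__(self, fields):
        self.fields = dict(fields)
        self.dirty = set()

    def set(self, name, value):
        self.fields[name] = value
        self.dirty.add(name)

    def full_save(self, table, key):
        # Early-ORM behavior: write every column, changed or not.
        table[key] = dict(self.fields)

    def partial_save(self, table, key):
        # Write only the columns this object actually modified.
        for name in self.dirty:
            table[key][name] = self.fields[name]


def edit_both(save_method_name):
    """Two people load the same row, each changes one column, both save."""
    table = {1: {"title": "Home", "enabled": True}}
    a = Record(table[1])
    b = Record(table[1])
    a.set("enabled", False)
    b.set("title", "Welcome")
    getattr(a, save_method_name)(table, 1)
    getattr(b, save_method_name)(table, 1)
    return table[1]


full_result = edit_both("full_save")
partial_result = edit_both("partial_save")

print(full_result)     # {'title': 'Welcome', 'enabled': True}
print(partial_result)  # {'title': 'Welcome', 'enabled': False}
```

Tracking changed fields only helps when the two edits touch different columns. Two people changing the same column still need one of the other techniques discussed in this episode.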
So one thing that kind of stinks is that most of the examples we've been talking about here assume a simple setup without replicas.
Once you start talking about adding replicas in a true distributed database, then lost updates get much tougher, because the snapshot ID on the record on this replica could be different from that replica, and it could be different on the leader. So compare-and-set strategies
and locking strategies get a little bit tougher. There's not a whole lot you can do, and
this is what Outlaw was talking about with this part of the book: this is a situation you're
in, and there's not a lot the database can do for you. So the most common strategy here for these databases is to just accept the writes and have an
application process decide what to do. So either you write something, or maybe they have
something that can scan for problems and try to fix it up, but it just gets into, like, oopsie mode.
I think the real takeaway, though, is
know your application's requirements. Because going back to that financial example,
if you're going to go with a distributed database and the strategy is
last write wins, or some kind of merge strategy,
that might not be what you need, right? That might not meet your requirements. So you can't
just be in the game of, well, this is the thing I know, so that's the thing I'm going to use for
all of the problems I have. Right.
Yeah. You know, what's funny is when you mentioned
the distributed databases and how things start falling back to the application instead of letting
the database decide, because how can it, right? It's not going to know the right answer
for everybody's situation. I remember there was a clear point in time where I used to be
in favor of putting the logic in the database,
right? Stored procs, whatever. That was where I lived, because that was the heart
of the application. And I believe it was when we did domain-driven design, when we went through
all that, that I was like, you know what? It does make a lot more sense to put this in the
application, for several reasons. One, you have it in source
control, which is amazing, though you can also put stored proc stuff in there. But the other was
your application logic can scale even if your database can't. Right. And that was always my
big thing: there's one place that it can always go back to. And if you know your application is
supposed to be doing something,
it's supposed to be the brains of it.
That's where that logic should live.
I mean,
I definitely recall those,
those episodes where we've,
we've described that and talked about like,
you know,
using,
like you mentioned stored procedures specifically,
like using stored procedures,
do everything.
And we kind of put it in the context of like,
you know,
keep your data,
your logic about your data close
to your data was you know maybe so i but i think that like i've kind of evolved to the point where
it's still a sometimes like maybe that maybe maybe that makes sense i'm not an always put
it into the app layer because there might be certain situations to where like well this example is fine yeah right it depends on what this scenario is and and maybe even parts
like what the technologies are involved are you know i thought uh like a good example of like uh
kind of the kind of checking application process might be is um you know like ach uh what's that
something check cleaning house basically it's a
whenever you write checks and you can even think about credit card companies like you'll see like
if you go and pay for gas at a gas station or something that might show as pending for a couple
days before it actually goes through and so i don't know why they do that you know but one
one explanation could be that you've got, you know, all these kind of transactions flying by.
And before those transactions are actually committed to these accounts, there's a process that goes through like every 24 hours or something that says, OK, we think this is right.
Now let's go through and just double check all those accounts to make sure that the numbers
still add up.
So all the debits and the, you know, what you call credits all add up to zero and all
these modified accounts.
And, you know, maybe that takes a couple hours and it doesn't work.
They try again, you know, and keep doing it until it goes through.
And so it can take, you know, days for these transactions to go through.
Now, you know, I'm kind of making some guesses here and I haven't thought too much about it.
But just kind of the idea of having this like kind of accountability pass or reconciliation phase that goes through and make sure everything's good before we commit it yeah that's i mean that's sort of like a batch thing
at the end but i mean better safe than sorry right yeah all right well let's do it okay yeah
got in before jay-z did so i kind of already hinted at this before, but if you haven't left us a review,
we would greatly appreciate it. If you did take time out of your busy day,
you can find some helpful links at www.codingblocks.net slash review and,
and, you know, or email us, you could email comments at codingblocks.net.
We've gotten some great ones. In fact,
I forgot that we did recently get one in if i recall
um shoot i forgot to include that when we when we gave our thanks so um it was it was
cars and dax am i saying i don't know i'm probably saying that wrong probably just um but yeah so so it doesn't have to necessarily just be
in like your app of choice um you know it could be you know email you could hit us up on slack
twitter or whatever but we we really do get some truly uh heartfelt and inspiring uh feedback from
from all of you so we we do really appreciate it.
It really does mean a lot.
It really does help keep us going and help to keep us motivated.
So again,
you can find some helpful links at www.codingblocks.net slash review.
And with that,
I don't think it's just mine.
I think I want to say it's everyone's favorite.
It's time for everyone's favorite part of the show.
Survey says.
When I grow up, I'm going to be a game show host.
Hey, I'm on a streak, just so you know.
Of one?
Yeah, of one.
That's pretty good to speak.
I'm on a one game winning streak.
You can double it here.
Every streak starts somewhere, right?
That's right.
All right.
Well, this is episode 206.
So according to TechHose trademark rules of engagement, Jay-Z, you are up first.
So you can break his spirits early.
I'm going to try to win this time.
Okay.
Yes, if you would put some effort in, we would greatly appreciate it.
Yeah, that would be awesome.
All right.
All right. For what reason might you choose not to go to a new restaurant even after someone recommended it to you?
Price.
Price. Okay.
That's really good. You don't like the type of food.
The type of food. Okay. Okay. So number one answer on the board.
Bad reviews online for 33.
Okay.
Wow.
Number one.
Okay.
So I'm not going to get slaughtered here.
Number two answer too expensive or the budget.
All right.
21 points awarded to Mr.
Joe.
What's up?
Too far away is the third answer at 16.
Oh,
don't trust the recommender at number four for 15.
Number five answer is no babysitter for seven.
I lose.
Wow.
Alan,
I lose.
Great.
Number six answer on the board.
Not like cuisine for five points.
So let's see.
How do you spell five again?
Okay, there we go.
In the hole.
In the hole.
Did I get the comma in the right place?
The decimal?
Okay.
Five points for Alan.
There's going to be an overwrite error here.
A lost update?
That's right.
Last write wins.
Okay.
A bad part of town for three points was the last answer on the board for that.
All right.
So, Alan's streak is off to a great start.
Yes.
Totally not.
My plan is working.
All right.
So, we go to Alan.
Name a reason you might call a taxi
go to the airport airport okay all right i'm gonna go with inebriated okay oh god it yes
i'm not sure how to take Alan's answer in.
I kind of want to stand on this, so I'm going to give you the answers first, and I'll tell you where I think Alan's answer is going to stand.
Number one answer on the board is needed a ride or no car for 46 points.
Wow.
You said needed a taxi.
That's what I should have said.
Right?
No.
Name a reason you might call a taxi, not need a taxi.
Yeah, but I need it.
Well, I'm going to call a taxi because I need it.
Yeah.
Number two answer on the board for 24 points.
Impaired.
All right.
Dang it, man.
Yeah.
Number two.
Number three answer. Lost or no directions for nine hard to find parking number seven and that's pretty airporty number five answer on the board
is not enough space in car now for four points it feels like it would have been the number one answer. I'm going to go with what's going to be
a controversial decision here,
but I think that
needed a ride
would best fit Alan's answer
of you need
a ride to the airport.
Yeah.
I didn't know you could just be like
call a taxi
if you need to call a taxi
wait what i didn't think i didn't think you could do that yeah well i mean the the question was why
would you call a taxi well because you need one yeah that was the number one answer
you need a ride yeah i need a ride yeah. Why else would you call a taxi?
Because I forgot my wallet in the taxi.
Number two.
Okay.
Okay, I see where we're going with this.
All right.
At any rate, I think that that was, this is why I was torn.
Because, you know, the destination wasn't like one of the choices here.
I like your choice, though choice though i guess i kind of
sound reasoning but i'm trying to be as impartial as possible um so i gave you i gave you what the
options were so you could see where my head was at but that was the one that i was immediately
thinking aligned with best okay game interesting it does because otherwise it was a blowout. Yeah. So, yeah. So we are now 51 to 45 in Alan's favor.
Okay.
And as is tradition, Joe, you get to pick the choice of the next question.
And your question choices are name something you might be asked to bring to a friend's party.
Or name the most annoying thing other drivers do on the road.
Or what is the most expensive thing you've ever bought while married without telling your spouse first?
So you pick the question.
Okay, so.
He's not picking number three because if Sarah.
Yeah, no way.
If she listens to this, he's in trouble.
I'm in trouble.
Yeah, no way.
What was the first one again?
That was...
Name something you might be asked to bring to a friend's party.
Friend's party.
Okay.
So friend's party or driving or buying something.
Geez, I'm going to regret this with driving.
I think I'll drive sometimes.
Name the most annoying thing other drivers do on the road.
That's your go-to?
Yeah.
It's tough.
There's several answers.
What's the most annoying?
I'm going to say lack of turn signals.
Turn signals, okay.
Lack of turn signals.
So that might be mine, but I think with the onset of phones, it's now people sitting at green lights.
Oh.
Not going.
Sitting at green lights.
Not going at a green light.
That happens a lot.
God, I'm so glad I gave Alan the 46 points on the last one.
It makes this one interesting.
No, no, no.
I'm saying it makes it interesting
right like you can feel the tension like you want to know is it because because if if if i didn't
then it would have been like 45 to 5 and like what kind of comeback would that be right right
so it's so much more interesting now that there's only like a six-point gap.
Which way do I want to answer these?
He's going to let it linger.
That's awesome.
So let me say I was really surprised that you did not say the number two answer on the board, Alan.
Cut you off.
Oh, yeah. For 26 points. Number three answer on the board is speed for 13 points.
People get irritated about that?
Come on.
Oh, how slow they're going.
Yeah, how slow.
Well, no.
Drive too slow is the fourth answer on the board for 12 points.
Wow.
Okay.
Tailgate is the fifth answer for seven points the seventh answer on the board is
loud music for three now to the astute listener the average answer was you will not even music
it's just the bass yeah for the astute listener you will notice that i did not yet say what the number one
answer was nor did i say what the number six answer was no so
number one answer on the board not use turn signals 28 points on the board mr joe's that i got to go first hot garbage use use cell phone
which is what i was going to classify your stoplight one since you said that uh using
the cell phone four points on the board four there's no way people are only that bad about that
so so uh alan's winning streak of one just got squashed.
It is now Alan's longest winning streak.
It'll go down in the history books of survey says as longest winning streak of one.
Jay-Z wins. I did try to win today.
73 to 55.
You know what?
I call bunk on the turn signals because nobody uses them.
So how could everybody be that mad about it?
Nobody uses them.
Everyone complains about other people not using them.
Right.
That's what I'm saying.
That's the most annoying thing.
It's not complaining about what you do.
It's complaining about what other people do.
Oh, that's so ridiculous, man.
But I swear to you, every day I get behind somebody that's sitting at a green light
with their head down looking at their phone and and there's people that honk at them non-stop like
i've gotten out the traffic lights where people are laying on the horn as soon as the light turns
green because they just don't want to deal with it but you know why that people aren't annoyed by
that though is that the person honking is like car number eight but cars one through seven are
all looking their phone none of them have noticed yet so you're right so only one out of eight were annoyed you know the part that gets
me but then we get on with the show what drives me insane is when they finally look up from their
phone and the lights yellow now all right they're the one car that squeaks through and you're like
dude if i could catch you i don't know what I'd do.
But I would be mad.
In your mind, you're thinking, listen, sir, I have a certain set of skills that I've acquired for a long career.
That's right.
My name is Jason Bourne.
Bourne, do you – wait a minute.
Your pop culture references are off.
Do you not know which movie i was referring to
no i don't which one really was it wait wait knowing you it's got to be pulp fiction or
something like that no i mean i i like i like where your head well first of all thank you
for the that kind compliment that it would be like knowing me you know be some pulp fiction
because that's like one of the greatest movies of all time so yeah uh fair but no i was it was the taken series
oh man that's neeson i haven't seen that in like 15 years man but i'll remember because because
that that's that quote was like so uh iconic to the movie and has been parodied so much man the way i remember that stuff
uh i barely remember in a galaxy far far away like that's oh from star trek
well done
oh man that's amazing i'm gonna get some hate mail now all right so i guess we'll pick back up here so the last thing we talked about was the conflict
resolution and replication so now we're getting into write skew and phantoms um so this one's
kind of interesting this is this is nuanced um off past stuff. So write skew is when there's
a race condition that occurs that allows writes to different records to take place at the same time
that violates some sort of constraint on state. So the best way I know how to put this is kind
of what we've been talking about up to this point is people modifying the same record, right?
And so when you modify the same record, there's things that you can do even in the database layer that takes care of those
for you. However, what if you are, you're basically writing a lock to something,
but you create a record for it. So, so Joe goes to create something, and so he tries to write a
lock record into a table, and Outlaw tries to create something at the same time, and he creates
a write lock record in that table. Well, there's no constraint on the database that says you can't
have two records in the same table. So now you have this situation to where this happened because
these both started at about
the same time. And it's the same situation we talked about before you read something
and said it was good. And so then you continued with your transaction and they actually gave an
example in the book that I thought was pretty good. Basically they had doctors on call, right?
And if you have a doctor scheduling system, you always have to have at least two doctors on call, right? And if you have a doctor scheduling system, you always have to
have at least two doctors on call. And that's because if one of them gets sick or whatever,
you still have another one available, right? Well, what if two doctors at roughly the same time end
up starting feeling like they're sick and they go in to say, hey, I'm not on call, you know, take me off call. Well, if they read the state of the of the system at the time it started, there were two doctors on call.
Right. Well, they both get that state back and now they say, OK, well, then we're going to make us off call.
And so both those records get written at the same time.
And now you have no doctors on call because the state of the system when it started the transaction was good so it's the same type situation but you're
writing to multiple records is the problem so it's not a lost update because both updates were made
and preserved they have which was the other problem of writing overwriting a value to the same record. It's just that the write skew here is that this is where application logic is bleeding
into your storage layer, right?
Yep.
The applications logic requires that the two doctors be on call, but it's not a database
constraint, so therefore there's no storage violation of it. So
you can get into this situation. Yep. And they even called out, you know,
Hey, if, if Dr. One had gone into the system and the system was saying, Hey, there's two
doctors on call or more than two on call. And they went and took themselves off call.
And then right after that happened, the other one did it.
Then everything would have been fine, right?
Because when that second doctor read the system, it would have said, hey, there's only two doctors on call.
You cannot go off call.
So that would have been fine because the transaction would have completed.
But when they both open it roughly the same time, that's where you start having these problems. So, and they say that this particular write skew thing is sort of a generalization of that lost update problem that we were talking about before.
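The doctor example above can be sketched in a few lines of Python. This is only an in-memory stand-in for a database, and the names (alice, bob, MIN_ON_CALL, request_off_call) are made up for illustration. Each "transaction" checks the rule against the snapshot it took when it started, then writes its own row.

```python
# In-memory sketch of write skew. This is not a real database; the
# doctor names and MIN_ON_CALL are assumptions made for the example.

MIN_ON_CALL = 1  # assumed rule: at least one doctor must stay on call


def request_off_call(snapshot, live, doctor):
    """Check the rule against a snapshot, then write to the live data."""
    on_call_now = sum(1 for on_call in snapshot.values() if on_call)
    if on_call_now - 1 >= MIN_ON_CALL:  # the check, possibly on stale data
        live[doctor] = False            # the write, a different row per doctor
        return True
    return False


def run_concurrently():
    live = {"alice": True, "bob": True}
    # Both transactions start before either one commits,
    # so both snapshots show two doctors on call.
    snap_a = dict(live)
    snap_b = dict(live)
    request_off_call(snap_a, live, "alice")
    request_off_call(snap_b, live, "bob")
    return live


def run_one_after_another():
    live = {"alice": True, "bob": True}
    request_off_call(dict(live), live, "alice")
    # The second transaction takes its snapshot after the first has committed.
    request_off_call(dict(live), live, "bob")
    return live


concurrent_result = run_concurrently()
serial_result = run_one_after_another()
print("concurrent:", concurrent_result)
print("serial:    ", serial_result)
```

Neither write overwrites the other, so nothing is lost. The rule is only broken when the two results are looked at together.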
Really, the nuance here is you're could enforce a constraint, for example.
Like they talked about materialized views as being like possibly one way to, if I recall correctly, using a materialized view as one way to potentially solve this type of issue or to ensure that you don't get into this situation.
But I was thinking like, well, I wonder if there's a way that you could just write it out to where it is a single record that is being maintained.
So this is tough, right?
Like they don't go into this in the book like what you're talking about here.
I'm always torn on that. Like, do you try and make
your storage layer look a particular way to support a business case that you're trying to do?
Because at some point that breaks down, right? Like whatever your business case is right now,
sure, you designed it. It works for that business case. But then at some point, they end up having to add another business case to it, and your storage structuring no longer works well for that.
So should you try and do that?
Well, here's what made my head go in that place, though, was that in the example in the book, the author includes some example, you know, SQL of what this might
look like, right? And of course, we don't know like all of the structure of that, you know,
we just have like a statement. But the table name in question was called Doctors. And I was like,
well, man, that seems like a horrible place that you would also store on-call information. That
seems like it should just be generic information. Like,
these are the doctors that serve this office, right? With like, you know, HR type information,
not an on-call schedule. And that's why I was like, started thinking in my head, like, well,
I wonder if there's a better place, you know, a better way that you could structure that type of
need. And then that way you could have the database help you out here rather than you know maybe you're
just putting like more than is necessary in this one table but i mean it's way beyond the scope of
what the author was trying to get i mean like he you know he i'm overthinking it i i i get i grant
you that but you know the whole purpose of this book though like as you're reading it like in my
mind i keep trying to think like okay well how could, how would I do this if I were to write this out to a flat file system?
How would I do this?
Whatever.
And so in this particular part of the book, it was like, well, how might I be able to – maybe this is a case – this specific example is a case of you're structuring the data wrong in the database.
Yeah, that's totally possible. Like I said, the, the only thing that sticks in my head when I start looking at things like this now, though, is it's easy when you're designing for today, as opposed to, Hey, what are the business problems going to be tomorrow that come up? I mean, we, we even, God, this, this has been a long time ago. We had, I think outlaw, maybe, maybe Jay-Z, you went, we went to this one meetup. This has been years ago
where somebody, I think from GitHub was talking about their interview strategies
and he would basically be like, Hey, um, this is what we're trying to do. Um, design your schema
or your data structure layout. And then he would actually purposely, as soon as they did it, let's say that they went to
a many-to-many type setup or a one-to-many type setup. He would throw a monkey wrench at them
right after that and be like, okay, well now I have this situation. What are you going to do?
And it was to force them to be like, oh, you have to change your data structure now.
And, and that's why, like, when I think about this, I'm like, well,
should you be trying to tailor your data storage needs for this as opposed to, hey, put the right application constraints in place, right?
So it's an interesting question, and I totally agree, right?
Like they could have completely designed this different, but then what kind of monkey wrench could you have thrown at it and been like, okay, that data structure is now terrible for this next use case?
It was, that was you and me. Jay-Z wasn't there. It was Stack Overflow.
Stack Overflow.
And they did a mock interview, a live mock interview, and it was one of our favorite
meetups that we went to.
Yeah, it was really good.
They had a real, you know, the presenter was from Stack Overflow. And I think that the person
doing the interview, like, I, I don't recall if he was actually applying. I don't believe so.
I think, cause I think it was just mock, but they, he actually walked through like, okay,
we're going to, this is what an interview at Stack Overflow would be like.
Yeah. And it was fun. I mean, it was good stuff like that, right? Like basically seeing how
people thought. So, all right.
So we're going to jump into this thing that Outlaw actually just alluded to a second ago.
So how do you prevent this write skew?
So you can do something like this atomic single object lock, but it won't work in this situation because you're doing more than one thing, right?
So you can't, unlike where you're trying to lock that object that two players in a game are trying to pick up, there's more than one object.
So you can't just do that.
Snapshot isolation also doesn't work in most implementations.
This is interesting to me.
SQL Server, Postgres, Oracle, and MySQL will not prevent write skew.
So none of your big databases out there, probably the top four of the top five, I'd say, aren't there.
And we could go to that database site that tells you, but those are pretty high up there.
So in order to do this, you actually need true serializable isolation.
And that's why it doesn't work in those databases.
They just don't have that set up.
Most databases don't allow you to create constraints on multiple objects, but this is where you could potentially use a materialized view, or you could use a trigger.
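As a rough sketch of the trigger idea, here is one way it could look in SQLite, driven from Python's built-in sqlite3 module. The table layout, the trigger name, and the "keep at least one on call" rule are assumptions for the example, not the book's schema. SQLite only lets one writer in at a time, so this says nothing about how a trigger behaves under snapshot isolation in another database, where the trigger's own read may also be stale.

```python
import sqlite3

# Hypothetical schema for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE doctors (
        name    TEXT PRIMARY KEY,
        on_call INTEGER NOT NULL
    );
    INSERT INTO doctors VALUES ('alice', 1), ('bob', 1);

    CREATE TRIGGER keep_one_on_call
    BEFORE UPDATE OF on_call ON doctors
    WHEN OLD.on_call = 1 AND NEW.on_call = 0
    BEGIN
        -- Abort the statement if this doctor is the last one on call.
        SELECT RAISE(ABORT, 'at least one doctor must stay on call')
        WHERE (SELECT COUNT(*) FROM doctors WHERE on_call = 1) <= 1;
    END;
    """
)


def go_off_call(name):
    """Return True if the update went through, False if the trigger refused it."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("UPDATE doctors SET on_call = 0 WHERE name = ?", (name,))
        return True
    except sqlite3.DatabaseError:
        return False


first = go_off_call("alice")
second = go_off_call("bob")
remaining = conn.execute(
    "SELECT COUNT(*) FROM doctors WHERE on_call = 1"
).fetchone()[0]
print(first, second, remaining)
```

The application code never mentions the rule, which is exactly the "hidden logic" complaint people raise about triggers.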
This right here would be the spark of holy wars everywhere, because I mean,
as I've worked on databases over the years, I've heard so many people that are for triggers and
I've heard so many people against triggers because triggers hide database logic, right? Like that's,
that's what said, and they also complicate things. So, but the idea here is you can't put a
constraint on a table saying that, Oh, there can't be two records with this. Like usually it won't
let you do something that complex, but you can do it with a trigger, right? So after an update or an insert or delete, Hey, I can't have more than two records
with this setting. Right. And then that could, that could throw some sort of error or something
or roll it back to whatever the state is. So you can do that, right? You can go in and do this,
but again, is it the right thing to do? Who knows? Maybe. I think, was this the portion of the book where they described like maybe creating
something else,
like another table that you could put a lock on.
So we're about to get into the schedule table.
Yeah.
And in the schedule table,
you can say like,
this is the block of time I want to select for update that block of time.
And then that way you could have like like one yeah that's coming up here
in a second so before we get to that though um they do mention if you can't use serializable
isolation your next best option may be to lock the rows for the update in a transaction meaning
nothing else can access them while the transaction is open. And this again is that select for update that Outlaw was talking about earlier, right?
So as soon as you read the record for Dr. One, as you're reading it, it's locking that record.
And it's not released until you've either updated or just released the transaction.
So it can be done that way but that's also that can be hairy so
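To make the locking idea concrete, here is a small Python analogy rather than real SQL. A single threading.Lock stands in for the row locks a database would take for a select for update over the on-call rows, and the doctor names are made up. The point it tries to show is that the check and the write both happen while the lock is held.

```python
import threading

# In-memory stand-in; in a real database the lock would come from
# something like SELECT ... FOR UPDATE on the on-call rows.
doctors = {"alice": True, "bob": True}
shift_lock = threading.Lock()
results = {}


def go_off_call(name, start):
    start.wait()  # line both requests up so they really do overlap
    with shift_lock:  # plays the role of locking the rows we read
        on_call = [d for d, is_on in doctors.items() if is_on]
        if len(on_call) >= 2:      # the check, made while holding the lock
            doctors[name] = False  # the write
            results[name] = True
        else:
            results[name] = False
    # Leaving the with-block plays the role of commit: the lock is released.


start = threading.Barrier(2)
threads = [
    threading.Thread(target=go_off_call, args=(name, start))
    for name in list(doctors)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

still_on_call = sum(doctors.values())
print(results, "still on call:", still_on_call)
```

Whichever request gets the lock second sees the first one's write and is refused, so one doctor always remains on call. Which doctor that is depends on thread timing.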
phantoms causing write skew um there's a pattern excuse me one sec this was your star wars movie
right the phantom uh the phantom something phantom right skew that's star trek yeah don't count that movie oh right really oh man terrible okay um
so the pattern is you query for some business requirement so for instance um in the case of
the the doctors there has to be at least there has to be more than two on call you're going to
query the database select star from doctors on call um and you going to query the database, select star from doctors on call. Um, and you
need to have at least two records come back, right? So that's, that's the first part of your
business requirement. Um, the application then decides what to do with those query results.
This is step two. Hey, if I have more than two, then I can continue on. If this doctor's trying
to get off call, if I don't have more than two,
then I'm going to say, hey, you can't do anything. I'm not going to update any records, right?
The next one, if the application decides to go forward with the change, meaning that, hey, this doctor's trying to go off call, get off call, then an insert, update, or delete operation will
occur. And then that would change the outcome of that previous step where the application decides what's what to do when it gets those results back.
The important thing here is they said that these steps could happen in a different order.
And I've actually seen this done before. So for instance, what this was doing was,
hey, is it okay for me? Let me find out if there's more than two doctors on call.
If there are, now I'm going to update and take this one doctor off call. You could do it in the
reverse order and say, hey, set this doctor, take this doctor off call, then query to see, hey,
do I have enough doctors on call? And if you don't, then reset that update, right? So
you could do it in the reverse order, but the general idea is the same. Um, and I'm curious,
what do you guys feel about that? Do you always like checking for the precondition before? Do
you like checking for the, the, the condition after doing something?
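The two orderings can be sketched side by side with Python's built-in sqlite3 module. The table, the names, and the MIN_ON_CALL rule are assumptions for the example. It runs on a single connection, so it only shows the shape of each approach; neither ordering on its own protects against two overlapping transactions.

```python
import sqlite3

MIN_ON_CALL = 1  # assumed rule for the sketch


def fresh_db():
    """Build a throwaway in-memory table with two doctors on call."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE doctors (name TEXT PRIMARY KEY, on_call INTEGER)")
    conn.executemany("INSERT INTO doctors VALUES (?, 1)", [("alice",), ("bob",)])
    conn.commit()
    return conn


def count_on_call(conn):
    row = conn.execute("SELECT COUNT(*) FROM doctors WHERE on_call = 1").fetchone()
    return row[0]


def check_then_write(conn, name):
    """Ask permission: read first, write only if the rule would still hold."""
    if count_on_call(conn) - 1 < MIN_ON_CALL:
        return False
    conn.execute("UPDATE doctors SET on_call = 0 WHERE name = ?", (name,))
    conn.commit()
    return True


def write_then_check(conn, name):
    """Ask forgiveness: write first, roll back if the rule is now broken."""
    conn.execute("UPDATE doctors SET on_call = 0 WHERE name = ?", (name,))
    if count_on_call(conn) < MIN_ON_CALL:
        conn.rollback()  # undo the uncommitted update
        return False
    conn.commit()
    return True


outcomes = {}
for strategy in (check_then_write, write_then_check):
    db = fresh_db()
    answers = [strategy(db, "alice"), strategy(db, "bob")]
    outcomes[strategy.__name__] = (answers, count_on_call(db))
print(outcomes)
```

Both strategies end in the same place here. The difference is that the second one does a write and a rollback to find out what the first one learns from a read.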
Uh, geez. Yeah. What he said. So, you know, what's funny is i actually have a preference on this
um i generally like to do this is kind of stupid but i like to do the pre-check first because it's
less expensive operation um because reading is typically way, way cheaper than writing.
But to me, even after you do the write, you kind of need to check again. So you're doing that read twice a lot of times. Um, so, but,
but it seems like doing the write and then rolling back the write is more
expensive because like, like we've talked about before,
like you're writing to write-ahead logs,
you're doing all kinds of other stuff that is way more, I/O intense than just doing a read for a record yeah i mean where my head was at
when you asked that question is like well out of what i probably thought was
just in my head like an order of operations I would probably do the check first and then the write second.
And so maybe that's two things at play.
Like maybe there's like a little bit of laziness in there on my part to not like think through, you know,
oh, is there a different way I should be doing this?
But then also just the simplicity of like,
as I'm thinking through, I'm like,
well, I only want to do this if this thing is true.
So let me check
this condition first and then do the thing it's basically the equivalent of like do you prefer to
write a while statement or a do while statement right right and like typically i would write a
while statement first i'll say well as long as this condition is true go do the thing instead
of the do while where you're like well let's write let's do the thing and then see like right where
do we keep doing the thing it's the ask for forgiveness versus ask for permission right
like it's that order of operations there which is interesting um all right so in the case of
checking for records that meet this condition,
you could do that select for update to lock the rows,
which I mean,
we have all known about that for a long time.
Yeah.
Like years or minutes that that existed.
I forgot all about it entirely.
So,
you know,
don't feel bad.
Yeah.
A Friday existed.
Well,
here's the part that gets all gnarly.
What if you're querying for a condition that checks to see if a record exists?
Like the precondition is the thing that, hey, if there's records in this table, then that's what indicates that something's on.
So instead of like what I was talking about earlier, like you're updating a doctor's table.
What if instead you had a doctor's on call table that had these records in it right um if you're checking for records in a table and they don't exist there's nothing to lock right so
so there's no way for this like for update could actually even help you out there so it's it
creates a bit of a monkey wrench in this whole locking paradigm.
Yeah, this is where the scheduling table can help. And the author kind of implied that that's a
common answer to these types of problems, whether it be a scheduling on-call doctors or appointment
or conference room booking, you know,
like to make sure that the room isn't already taken that you might like
pre-allocate in some table,
like,
Hey,
for the next six months,
here's all the time slots that are available for that conference room.
And then you would never update that table for it,
but it would give you in your application something to lock on.
So we're going to go to that right now. Last thing on this other one is snapshot isolation
avoids phantoms in read-only queries. So what we were talking about were these read-write
transactions where you read to see if something's there and then you write if it's not. So the
snapshot isolation doesn't help with that. And that's why it's such a tricky problem. I feel like as far as this episode is concerned, like there's like some
latency issues or, you know, like some out of order packets happening. Cause like every one
of my comments have been like, well, that's not, that's for the next section. Right? Like,
yeah, he's wanting to get to the, to the meaty stuff. So now for the resources we like,
oh wait, right, right. We'll come back to that in a second.
Oh, so what he just mentioned, there's actually a name for it.
It's called materializing conflicts.
So this whole thing where you create a table of all combinations,
this is what kind of sucks about it, honestly, is you create this Cartesian product of, of every possible combination.
So he was talking about scheduling for a room,
right?
You have a meeting room that people can go and lock up for a time.
So you're going to have to figure out what increments that people are allowed
to book this room,
right?
So is it every five minutes?
Is it every 15?
Is it every 30?
Like,
so,
so now you have to figure out,
do my time slots look like 10 o'clock, 10, 15,
10 30, or do they look like 10, 10 Oh one, 10 Oh two.
Like, so there's already a problem you have to solve up front, which is a nasty one, right?
Then, then on top of that, you have to figure out, okay, well, how am I going to set this
thing up to where there's no overlaps and people don't hit
this because now you're going to have to lock these objects as you're creating these bookings
and all that. It gets really nasty. And that's one thing that they call out is when you're doing
this and it's truly a Cartesian product, right? Like every single possible combination of time
slots that can, that can happen. you're materializing conflicts because what you're
saying is you're taking these phantom writes that could have occurred and you're turning them
into lock records so it still doesn't yeah yeah lockable records it still doesn't map well for
materializing conflicts in my mind but whatever um yeah and you know we mentioned that game thing
where like you know we both grabbed the same object okay fine we could lock on that but now if we're trying to place them in the same spot like
you get a Cartesian product of all the pixels on screen you know like i've tried to place the
object that's like 10 by 10 and i want to put it starting at pixel like oh geez no you can't do it
it's too much data and it would it would bog down your thing but this is where they say hey this is
effective like this is a solution and
it can be done, but it should be a last resort because it's really hard to do. And, and the
errors that you end up creating out of this are really hard to figure out. Like it's just, it's
a difficult thing to do if you think about it. And the other thing that they call out,
if you want to be a purist about this is it is a nasty leakage of your
storage layer into your application tier,
right?
Because you're only doing this because you need your database to effectively
be able to make sure that there's not a conflict that comes up.
So you're,
you're now exposing that to other areas yeah last resort
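A minimal Python sketch of materializing conflicts for the meeting room case might look like the following. It is an in-memory analogy, not a database: every pre-created time slot gets its own lock, standing in for a lockable row, and SLOT_MINUTES, the 9-to-5 day, and the function names are all assumptions for the example.

```python
import threading

SLOT_MINUTES = 15  # assumed booking granularity


def make_slots(start_minute, end_minute):
    """Pre-create one lockable 'row' per time slot for a single room."""
    return {
        minute: threading.Lock()
        for minute in range(start_minute, end_minute, SLOT_MINUTES)
    }


slots = make_slots(9 * 60, 17 * 60)  # 09:00 to 17:00 for a hypothetical room
bookings = []                        # (start, end, who), times in minutes


def book(start, end, who):
    """Lock every slot the booking covers, then check for clashes and insert."""
    needed = sorted(m for m in slots if start <= m < end)
    if start % SLOT_MINUTES or end % SLOT_MINUTES or not needed:
        raise ValueError("booking must line up with the pre-created slots")
    for minute in needed:  # always lock in time order to avoid deadlock
        slots[minute].acquire()
    try:
        clash = any(s < end and start < e for s, e, _ in bookings)
        if clash:
            return False
        bookings.append((start, end, who))
        return True
    finally:
        for minute in needed:
            slots[minute].release()


# Sequential bookings: the overlapping one is refused.
sequential = [
    book(10 * 60, 11 * 60, "joe"),
    book(10 * 60 + 30, 11 * 60 + 30, "outlaw"),
    book(11 * 60, 11 * 60 + 30, "alan"),
]

# Two overlapping requests at once: exactly one should win.
race = []
threads = [
    threading.Thread(target=lambda w=w: race.append(book(13 * 60, 14 * 60, w)))
    for w in ("joe", "outlaw")
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sequential, sorted(race), len(slots))
```

Even in this toy version the slot size has to be chosen up front, and the slot rows exist only to give the locking something to grab, which is the leakage being described.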
you know now that we're done the episode i came up with the perfect example of the shared object
stuff minecraft like that's the game where you have shared objects in the world that people can
kind of manipulate and stuff like yeah totally figures yeah all right well, we'll have links to this book and other things in the resources we like section.
But now it's time to head into Alan's favorite portion of the show.
It's the tip of the week.
Yeah.
First, so we've talked a little bit about BuildKit before, which is like, you know, kind of a next evolution of Docker builds.
It's really efficient, has a lot of cool tricks that
are kind of almost like hidden. You know, it's tough to try and evolve what they've got
now and add new functionality that changes old functionality and so you kind of have to opt in
for some of these things in different ways like either by passing a flag to say use build kit or
saying like a syntax flag at the top of the file to say hey we're using newer features
uh so it's kind of tough but it's so worth it there's so much cool stuff in there and so i wanted to mention a couple things that i just
found out about so i think we've even talked about um the mount flag before which is a way that you
can kind of pass, basically, dash dash mount inside of a run statement in Docker and mount files
at build time.
So this is kind of similar to the kinds of things you can do when you do a Docker run and you can mount like a volume, and that isn't actually part of
the container. It's a way to kind of share files.
Real quick,
backing up.
So you said it's when you do the run,
he's talking about the actual run command inside a Docker file.
Yeah.
Right.
The Docker file.
So not a Docker run.
Yeah, this is the run statement inside the Dockerfile. Yep. And so it lets you mount things in at
runtime and what's interesting about uh the the kinds i'm going to mention now the different
kinds of mounts is that they only apply in the scope of that statement so i'm mounting uh the
directory and that applies only for the things that happen later in this run statement.
So one of the flags you can pass, for example, is you can mount and set the type equal to cache.
And then say, you know, do a build, for example, like a.NET build or something.
And you're going to mount files that are going to bring in files and it's going to cache them locally.
And so that next build that
runs can potentially use those files in the same cache and this is a cache that docker uh kind of
puts and you know hides away in its own little location but it's a way of persisting some sort
of files between uh different builds which is super cool uh what's tricky though is that it
does only apply to the same the actual statement that does the mount so it's kind of like this weird thing where these files are here and they're gone uh in in future
phases they aren't uh they don't exist in future layers of the uh of the docker file unless you
explicitly copy them so you can imagine where you like do like a run statement say mount type equals
cache downloads the files and copy them to my layer or copy some of them to my layer that's something you could do and we'll have a link in the show
notes so cache is one of them i wanted to mention bind which is a really cool one too
so we say mount bind and i can bind files that are within my docker context which means inside
the folder that i'm doing the build. And I can bind files in there.
And it's kind of like mounting them in, but they only exist for that layer.
Why would you want to do that?
Well, it's a cool way of bringing in files that you need for builds, for example,
without actually bringing them into the layer.
So even if you were to delete those, well, let's not go there yet.
Sorry, I'm getting ahead of myself. So you
can bind these files in at build time and then they don't exist. They're not persisted to any layer.
so if you've got some like large binaries or something that you want to do for a build or
maybe even your source files that you're going to do a build on um then this is a good example
where you can say i'm going to bind these files and
I'm going to do my build and I'm going to take the artifacts of that build, the DLLs
or the jars or whatever.
And I need to keep those in my image, but I don't need to keep the source files in my
image.
And this effectively lets you replace the ADD or COPY with this, um, build time mount.
And the advantage here is that those files aren't ever stored in the layer.
And we've talked about this a little bit before,
where a lot of times people don't realize that even if you do a delete of files in a Docker file,
they're still in the history because that's part of how that, you know, that Docker image came to be.
So even though you may bring in a bunch of files, do some stuff with them and then delete them,
the,
the,
um,
the layers are still going to be big.
So if you're doing,
you know,
builds and stuff and moving stuff around,
like that's part of the history and you're still shipping that data around.
So,
um,
even though files are maybe deleted from your final image,
they can still kind of end up gumming up the works.
So this is an interesting way around it.
That's assuming that you let the files persist from one statement to another
statement.
So they would be in one layer to the next statement.
If you did the something with files and then deleted them as part of the same
statement.
So, like, you and-ed, double-ampersanded a bunch of things together,
including the delete,
then they're not there.
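To make the layer arithmetic concrete, here is a toy model in Python. It is a sketch under loud assumptions: it is not how Docker actually stores layers, the function name image_size and the tuple format are invented for illustration, and it ignores whiteout markers and file modifications. It only captures the idea being discussed: each statement freezes whatever files are new at the end of that statement, and a later delete cannot shrink an earlier layer.

```python
def image_size(statements):
    """Sum the bytes kept by each layer in a simplified image model.

    statements: a list of statements, each a list of operations that run
    inside one layer. An operation is ("add", name, size) or ("rm", name).
    """
    visible = {}  # files present in the filesystem so far
    total = 0
    for ops in statements:
        before = set(visible)
        for op in ops:
            if op[0] == "add":
                visible[op[1]] = op[2]
            else:
                visible.pop(op[1], None)
        # A layer keeps only the files that are new once its statement ends.
        total += sum(size for name, size in visible.items() if name not in before)
    return total


# Source copied in one statement, built and deleted in the next.
separate = [
    [("add", "src.tar", 500)],
    [("add", "app.jar", 20), ("rm", "src.tar")],
]

# Everything chained together inside a single statement.
chained = [
    [("add", "src.tar", 500), ("add", "app.jar", 20), ("rm", "src.tar")],
]

print(image_size(separate))  # 520
print(image_size(chained))   # 20
```

In this model the separate version carries 520 units because the first layer still holds the source, while the chained version carries only the 20-unit artifact.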
The problem here, though, is that you're not going to do an ADD or COPY and double-ampersand something else.
Yep.
So this what you're describing is how can I bring those files in as part of a run statement
so that I can use them in that run statement without them needing to persist in the layer.
Yeah, and that's pretty cool, right?
Like you said, it's not something I could do before.
I think the thing, though, that's missing for context is what's a situation where I want files available in the run statement that I don't want to persist?
Oh, compile.
So in compile, I only care about the artifacts
of my build i don't need the source files after they've been compiled so then so like if you were
to use Docker as part of your build system, you might have previously done something like a FROM whatever your compiler might need, and then do an ADD or COPY to bring those
files into the docker image and then in a run statement you might do like a maven package or
a dot net build or whatever but now because you separated the add and the run statement, the source files persisted in that previous layer.
But you're saying the only thing you really cared about
was the final jar or DLL that came out of it at the end.
But-
So I can mount my source directory.
But then do you run into context problems where like,
because you want to detect that the context changed in order to know
to rebuild and uh but i'm assuming that that will still be okay because those files are not part of
a docker ignore so it'll still uh it'll still detect that like oh one of the source files
changed i need to rebuild this image yeah you
got you nailed it so exactly the trick with bind is it only works on things in your context and
that way uh you know at first i thought it seemed like kind of an unnecessary limitation like why
can't i just mount some stuff from another directory but then i realized like oh it's because
if it was in another directory then the context checks something to figure out if the the layer
was still valid isn't going to work so this, you can only bind things in your context.
So I think you've got another one listed there, secret, that I've actually used in the past.
And this is a perfect example of one that is, is really good.
Let's, so in the past, if you had a Docker file that needed to do something that needed
some sort of credentials, right?
The only way to
kind of do it was to have like either environment variables that were passed into it. And the
problem with that is you can always inspect that stuff on the layers later and see what those
credentials were. Yeah, even if you delete them, you can go back in a previous layer,
you can go look back and there they are yeah totally with the secret and this
is the important takeaway of what jay-z is talking about here with all of these is when you use these
the bind or the secret or any of these you do it in that run statement it never gets persisted to
any layer but you get to use all of the things that you did so you can bind a secret on your
layer and let's say that
you're pulling something from Artifactory or from Docker or whatever, and you have to pass
credentials. That secret's used on that run command, but it's never persisted anywhere in the layer.
So it's ephemeral. It gets thrown away during the BuildKit build of this thing. So, um,
that's the important part is nobody could go back through and then
scrape the stuff from those layers that you didn't want prying eyes to see.
Yeah, it was fantastic. And, uh, the next one SSH is similar, except instead of environment
variables, it's for SSH keys. So you can give, you can give access to your SSH keys in your
Docker build without persisting them. So if you need to do some sort of stuff with certificates or whatever that are sitting on a build machine, then you can go
ahead and just borrow those ssh keys from the build server safely because they're never saved
to any sort of layer, and so no one can go and steal your keys later. And the final one, tmpfs,
I'm not familiar with tmpfs. It sounds like it's almost like a kind of a temporary directory
that's all in memory.
So I don't know, it sounds like something that might be kind of fast
and interesting I should learn more about,
but I just didn't get a chance to explore that one.
Yeah, if you do Docker builds, definitely look at this stuff
because it will open your eyes to some possibilities
and probably solve some problems
for you especially security related that that you would want to know about yeah and some of these
are like really hard problems to solve without this like it's it's kind of crazy to think that
uh build kit still feels experimental but it's not you know it's it's the way forward it's the
future. It's just, in order to maintain backwards compatibility, you almost have to like bolt the stuff on, which is unfortunate.
Yeah.
I don't know.
I'll have a link to the show notes.
All right.
Well then,
uh,
my boring tip of the week by comparison,
uh,
well,
I'm going to now,
I guess what I'm dreading is that now I'm imagining my next pull request.
I got to go and update a bunch of Docker files.
Um, thanks Jay-Z for more work to do. Uh, I'm just kidding. So, uh, I attended a conference recently and one of the things that, uh, came out of it, there was a whole bunch of
things. So I'll probably like trickle in things, um, over time. But one of the first I thought I would share was there was a
Google presentation and they gave, they showed a Google cloud architecture diagramming tool.
So the cool thing about this was that one of the things I liked about it was like,
you kind of had this like, we've talked about like ubiquitous languages, right? But what if
you had that for like drawing right to where like everybody's
using the same things to draw whatever, you know, you're doing.
But also what you could do is you could draw out what you want for an
environment in the Google cloud architecture diagramming tool.
And then you could say, Hey,
go deploy this thing and Google would go deploy it from your drawing.
I thought that was super cool that is sweet yeah so uh so i'll have a link to that and then the next one that i am super super
excited so this is like coming to a slack near you um it hasn't been released yet to the best
of my knowledge as of this recording It's still only been announced.
But Salesforce, who owns Slack, has introduced a ChatGPT integration for Slack.
And there's a bunch of capabilities where the bot could automatically respond to messages for you and things like that, which, okay,
now we're going to be, now this is going to be interesting where we can't even be bothered to
respond to our own messages. Messaging our friends has gotten to be too complex. We need a bot to do
it for us. But my favorite feature that they showed was a summarize capability. So we've all been, it doesn't even have to be Slack.
I don't care if it was an old school forum
or if it was a Google Hangouts or a Google Chat,
if that's what your company uses, Teams, whatever.
We've all been in those situations
where there's like some common thread
where a bunch of people are writing
and you, like an insane person,
take the evening off to go spend time with your family and get some sleep or, you know,
you go out for the weekend or whatever. But meanwhile, in your absence,
there was a whole bunch of things going on. And one of the things they showed was this
summarize capability where I don't know the exact syntax of it. So, like, I described
it in my show notes as like slash ChatGPT space summarize. But the point is that the bot
would summarize everything that you've missed in that thread. And like all the key points of like,
Hey, you need to be here at this date or turn blah, blah, blah in. I cannot wait for this feature to become more commonplace.
So an automated TLDR.
Yeah, yeah.
Yeah.
Pretty much.
Yeah.
Except they decided to use more letters with summarize, but yeah.
Right, yeah, yeah.
So mine, this, this was born
out of some things that I needed to do this week that ended up being pretty helpful. So, um, if
you're ever trying to debug anything in Kubernetes, it can be a bit of a pain, uh, depending on what
you're trying to do. Like, let's say that you have a pod where a container just crashes over and over
and over. If you've ever tried to shell into it, it'll kick you out like three seconds after you're in it. Like there's a lot of things that can happen.
Well, I found it's actually a Kubernetes thing.
There's these ephemeral debug containers.
Excuse me.
And what you can do.
Hold on one sec.
Figured I'd clear my throat without killing you guys.
So there's actually a command that you can run that is kubectl debug dash it for interactive terminal.
Then name it, whatever you want to call it. In this case, I did ephemeral-demo.
And then you pass in an image like busybox.
So if you're not familiar, busybox is one that people use a lot of times.
It's basically just sitting in a wait loop, but it's so you can sit on, on terminal and shell and do other things.
And then you tell it the target that you're trying to use, and this will tell it which pod to
basically, um, attach to. And so it'll launch a debug container in a pod that you want it to go
into. And so it's really nice if you need
to get in there and take a look around, like maybe you're having networking problems and you need to
see what's going on there. Maybe it can't talk to something else or whatever. So this is a really
good way to be able to launch a debug container, the Kubernetes way and take a look at things.
Um, I have a link in the show notes for that. And then I couldn't find a link for this one,
but this is something that somebody pointed me to the other day. So this is in Google Cloud.
I'm sure that Azure and AWS probably have something similar, but this has to do with logs.
So if you're taking a look at logs, a lot of times, you know, there'll be a time constraint
on what logs you're looking at, right? So you go up there and if you hit the log explorer, it might just be the most current
hour, right?
Well, let's say that you know that there's an error that happened four hours ago that
you need to see.
Well, a lot of times when you're looking at logs, you kind of need to see a little bit
before and a little bit after that one error that you saw.
And so you might need to see,
I don't know, five minutes worth of stuff. There is a feature in Google logs to where you can say,
Hey, show me entries around this time. And that around this time feature is amazing. So if you
click the edit time thing down at the bottom, there'll be a selection for around this time.
You can put in the time of the thing that you're looking for.
Then you can say plus or minus 30 seconds, plus or minus a minute, plus or minus five minutes, that kind of thing.
And so instead of you having to go in and like manually hand type in your time ranges, you can put in the one time thing and then tell it hey give me everything within a minute on either side of this and it's really nice for being able to go in and quickly
see what you're looking for and if i can find if i can find a link to it i will otherwise i'll go
screenshot it and show you where it is in the UI. Or maybe just an example of the query? Yeah, yeah, that'd work too. Huh, if only he had a way. Yeah, right. Uh, all right. So, yeah.
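Since an example of the query came up, here is a rough sketch of what the "around this time" picker saves you from typing by hand. It is an assumption on my part that the feature boils down to a pair of timestamp comparisons in the Logs Explorer query; the helper name around_filter is made up, and the output is just a string to paste into the query box.

```python
from datetime import datetime, timedelta, timezone


def around_filter(center, plus_minus):
    """Build a timestamp range filter centered on one moment.

    center must be timezone-aware; plus_minus is the window on each side.
    """
    if center.tzinfo is None:
        raise ValueError("center must be timezone-aware")

    def fmt(moment):
        # Render as an RFC 3339 timestamp in UTC.
        return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

    start = center - plus_minus
    end = center + plus_minus
    return f'timestamp >= "{fmt(start)}" AND timestamp <= "{fmt(end)}"'


# One minute on either side of an error seen at 14:00 UTC.
error_time = datetime(2023, 3, 20, 14, 0, 0, tzinfo=timezone.utc)
print(around_filter(error_time, timedelta(minutes=1)))
# timestamp >= "2023-03-20T13:59:00Z" AND timestamp <= "2023-03-20T14:01:00Z"
```

The date and the one-minute window here are placeholders. The point is only that one center time plus a plus-or-minus value replaces two hand-typed range boundaries.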
yeah okay well uh we'll include the log lyrics um i'm just kidding that's all i could think of
though when you described, like, uh, you know, it's like, oh yeah, it's what rolls down stairs,
alone or in pairs. Okay, so, uh, subscribe to us if you haven't already on iTunes or Spotify.
I don't know.
Maybe like somebody said, hey, check out these crazy guys.
And they sent you a link or something or whatever.
Or you heard it on their car while they were driving.
You can find us on iTunes, Spotify, all the major podcasting apps or platforms.
And if you haven't already left us a review, like I said before, we would greatly appreciate it.
You can find some helpful links at www.codingblocks.net slash review.
While Alan looks puzzled at something I said.
Didn't you say Spotter Stitchify?
Did he say that?
Yeah.
I thought he did.
We're going to check the record uh but uh
the gentleman from georgia does not believe he said we gotta roll that beautiful bean footage
back that's interesting all right so yeah hey while you're up there at spotter and stitchify
um i did not say that i swear you did i swear okay I might be having a stroke. I might need to call for medical assistance.
Either you did or I did.
That's what I'm really concerned about.
Like, I don't know.
Hey, so while you're up there, make sure you check out our show notes, examples, discussions, and more.
And make sure you send your feedback, questions, and rants to our Slack channel.
Yeah, and we've got a Twitter, at CodingBlocks.
You can go to CodingBlocks.net and find all our sausage links there
at the top of the page.
You always say sausage links, too, and I'm like,
I think you said sausage links.
That's not what I heard.
Man, I swear.
All right, I'm having a stroke.
I think so. I think it's Alan, yeah.
It is. Awesome.
Thanks, guys. I'm going to the doctor.