Coding Blocks - Designing Data-Intensive Applications – Lost Updates and Write Skew

Starting point is 00:00:00 So, yeah, you're listening to Coding Blocks. So is that going to be our new intro? So yeah? So, yeah, that's actually a horrible way to do it. Like, oh hey too. Oh hey, oh hey, oh hey, oh hey. I was trying to make it sound kind of natural, and I don't think that that worked. So hey. Because when do you ever start a sentence with, like, so hey? Oh, I do all the time. Oh, okay. Well, so hey, you're listening to Coding Blocks. It's episode 206. Subscribe to us on iTunes, Spotify, wherever you even like to find your podcasts. Visit us at codingblocks.net.

Starting point is 00:00:35 You can find examples and discussions. All the show notes are there. Yep. And you can send your feedback, questions, and rants to comments at Coding Blocks, or follow us on the Twitters at Coding Blocks. Yeah, and the website we mentioned, by the way, we got a bunch of social links there at the top of the page. With that, I'm Joe Zack. I'm Michael Outlaw. But I gotta say, I noticed it, like, Alan never plays with my shenanigans with the intro. Like, I keep trying to change it up and he's like, no, this is the script, ain't having it. Yeah, I gotta read it, man. So it's too early to mess with that anyways. Yeah, he hasn't taken his

Starting point is 00:01:09 vitamins yet. He can't go off script. So yeah, I'm called out, Alan Underwood. Hey, I don't believe in vitamins. My wife takes them. She gets sick more than I do. Okay, this and other health recommendations are all part of the show. That's right. It's a health tips episode. Don't take your vitamins is not one that I would have ever expected to have heard. Yeah, I just don't believe in it, man. I think it's wrong. Okay.

Starting point is 00:01:37 Well, here's something you can believe in, and that is we truly appreciate your reviews. So we got a new one in from iTunes from Joe Millian. So, yeah, thank you very much. We appreciate those reviews. And if you haven't already, you can leave us a review. Find some helpful links at codingblocks.net slash review. All right. Excellent. On with the show.

Starting point is 00:02:04 So last episode, we talked about weak isolation, committed reads, and snapshot isolation. This episode, we're talking about lost updates, which is book, but maybe one way to explain it is – I forget. It's too early. Well, I mean, they kind of already hinted at one earlier in the book, right, where, like, if you and I both read a value and then updated – tried to do two updates to that value just by incrementing one to it, for example, then one of our updates is going to just collide with the other one and basically just be a repeat of it, but not truly update the value, which was the intent of the application. So that was an early example.
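Editor's sketch: the increment collision described here can be reproduced without any real database. This is a minimal illustration, assuming a hypothetical in-memory `store` dictionary and two clients whose reads both happen before either write; none of these names come from the book.

```python
# Hypothetical in-memory "database" holding a single counter.
store = {"counter": 3}

def read(key):
    return store[key]

def write(key, value):
    store[key] = value

# Both clients read before either one writes, so both see the same value.
first_client_saw = read("counter")
second_client_saw = read("counter")

# Each client increments the value it saw and writes the result back.
write("counter", first_client_saw + 1)
write("counter", second_client_saw + 1)

# Two increments were intended, but the second write simply repeats the first.
print(store["counter"])
```

Nothing here raises an error, which is the uncomfortable part: every individual read and write was valid, and one increment still disappeared.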

Starting point is 00:03:03 Okay, so a tangible example would be like, we both are going to increment a number by one. So I go in and read three, you go in and read three, I go in and update to four, you go in and update to four. And the answer we expected was five because we're both doing two increments, but because we both read at the same time and it was valid and our updates got applied

Starting point is 00:03:25 the actual intent was lost. Yeah, maybe even a better example of this: the author gives an example of a wiki page, but you could imagine this as a WordPress type of site or a Confluence page (older versions, maybe not current versions, of either of those products) where you and I are both updating the page and I click save. Maybe I'm working on the top half of the page and you're working on the bottom half of the page, and you click save. But because you had my old version as the top of the file, your save loses my update. You essentially undo my update. Yep. I used to work for a company that had this problem a lot with content management systems. This was super naive, way back in the day. You'd have two people, say, updating the front page of a website, and the way we had the back-end administration stuff, it was all kind of on one page, so

Starting point is 00:04:29 one person would go change the title, another person would go change a paragraph, and, you know, they'd collide. It's just kind of easy to think about this stuff if you think about it happening really slowly. Most of the time we're talking about these things happening very quickly, but if you could just kind of slow down the problem, you can imagine two people going in there, both spending 15 minutes updating something. And since both of them kind of got that copy of the front page to work with to start, the changes ended up colliding and someone's stuff is going to get lost. Even though, with snapshot isolation, there's nothing

Starting point is 00:04:59 wrong that happened. This wasn't a bug. It's just because of the design. This is how it ends up happening where something can get lost. Yeah, especially for complex data types, it can get worse. So if we have a big JSON structure that you and I were both editing, that might have represented what the content of that page was supposed to look like, and the update just took the entire JSON blob as one whole thing, then any change that I made could be overwritten by your change, because you had this older version of it. You know what we did about our problem? And this is before Ajax. This is before... We did Ajax, let alone called it things other than Ajax, right?

Starting point is 00:05:46 What we did is you would fetch the data from the front page to the administration panel. And then when you click save, we'd go to see if someone else has updated it. So it's kind of like a crappy version of snapshot isolation. And so one of those people would be working on changes for a half hour, and they'd click the button and it wouldn't work. It'd say someone else has modified this before you, your changes are discarded. And everyone hated that. But I think they actually referred to something like that in this portion of the book. I'm trying to remember. Maybe it was the compare-and-set, well, not exactly like what you're doing, but they were talking about the compare-and-set version of that,

Starting point is 00:06:26 where you could only do the update if the old value equaled what you originally thought that it was. So in your numerical example, where we both queried the value and got a three, the second update couldn't update because the value no longer equaled three. It would have been four. So that second update would have failed, which would be a way to get around what you were just describing with the web page. Yeah, you know, it still does that. I know that we've all been on a Confluence page, the new wiki that a

Starting point is 00:07:01 lot of people end up using, and all the time, when you go to save something: hey, there's been changes to this page. Are you sure you want to save them? It's like, yeah, they can lose theirs. It's fine. Yeah, but the way the error message is, it's never been clear to me whether they're losing it or not. I know.

Starting point is 00:07:17 I've never known either. I seriously wonder. Yeah, it's gotten smarter about merging those changes in with what your current edits are. You know what's funny about compare-and-set too is it's not an error if zero records are updated, or if you're updating multiple and only four of the five get updated or something. So it's up to the application to know to check correctly. It's kind of funny. Yeah. Yeah. This is interesting.
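Editor's sketch: the compare-and-set idea, and the point that "zero records updated" is not an error, can be shown with Python's built-in `sqlite3` module. The `pages` table, its columns, and the `compare_and_set` helper are assumptions made up for this illustration; they are not from the book or the episode.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pages (id INTEGER PRIMARY KEY, content TEXT)")
conn.execute("INSERT INTO pages VALUES (1, 'old text')")
conn.commit()

def compare_and_set(conn, page_id, expected, new):
    """Write `new` only if the row still holds `expected`.

    The database reports a lost race as "0 rows updated", not as an
    exception, so the caller has to inspect the row count itself.
    """
    with conn:  # commits on success, rolls back on exception
        cur = conn.execute(
            "UPDATE pages SET content = ? WHERE id = ? AND content = ?",
            (new, page_id, expected),
        )
    return cur.rowcount == 1

# Both editors loaded 'old text' before either one saved.
first_saved = compare_and_set(conn, 1, "old text", "edit from user A")
second_saved = compare_and_set(conn, 1, "old text", "edit from user B")

print(first_saved, second_saved)
```

The second call does not raise anything. It returns a falsy result, and it is the application's job to decide whether to reload, merge, or tell the user their edit did not go through.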

Starting point is 00:07:49 I think that was like where there's kind of like a shift in the book though, starting with this chapter where like a lot of the up to now in the book, it's been talking about like how a database system, regardless of what it is, whether it be relational or not, but how that system can solve a lot of problems for you. Right. And, and so like leading this pathway of like, Hey, here's, here's things that, you know, it was originally documented in the seventies, but not implemented until 2010, but you know,

Starting point is 00:08:23 of like things that could, could save the application developer a lot of time but this this particular portion of the book it really felt like it was getting into like well here's scenarios where there's only but so much that can be done for you there's gonna be something station yeah yeah there's there's gonna be some things that you you're gonna you're gonna have to do on your own that's not to say that the book doesn't get into some parts of how some of this can be handled. But as a whole, this portion really felt more like, okay, as an application developer, you need to be aware of what you want to do. But it also kind of made me think, too, that there was kind of like this underlying theme too of like well you need to think about and plan out like how you write uh stuff down to whatever that storage mechanism is too because

Starting point is 00:09:13 that can have a big impact on it like i'm kind of getting ahead um a little bit so i'll save some of those thoughts but yeah there are a couple things that database can do. A couple rough solutions that have some trade-offs. One is atomic writes. So some databases support atomic updates that effectively combine the read and write. So we talked about that increment. So imagine if that was just one operation. And so the update and what you call the read and update happen at the same time, which can happen for simple values.
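Editor's sketch: an atomic write folds the read and the write into one statement, so there is no window in which a second client can read a stale value. A minimal example with Python's built-in `sqlite3` module follows; the `counters` table and the `atomic_increment` helper are hypothetical names chosen for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE counters (name TEXT PRIMARY KEY, value INTEGER)")
conn.execute("INSERT INTO counters VALUES ('likes', 3)")
conn.commit()

def atomic_increment(conn, name):
    # The database reads the current value and writes the new one inside a
    # single UPDATE, instead of the application doing SELECT then UPDATE.
    with conn:
        conn.execute(
            "UPDATE counters SET value = value + 1 WHERE name = ?",
            (name,),
        )

# Two increments, as in the "we both add one" example.
atomic_increment(conn, "likes")
atomic_increment(conn, "likes")

value = conn.execute(
    "SELECT value FROM counters WHERE name = 'likes'"
).fetchone()[0]
print(value)
```

This works well for simple values like counters. It does not help when the new value depends on logic the database cannot express in one statement, such as a person editing a page for half an hour.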

Starting point is 00:09:44 We mentioned that before, so increment is a great example of something that's easy to do. As things get more complicated, it gets a little tougher. Cursor stability: that's locking the read object until the update is performed. That's kind of like what I talked about, where two people kind of went into the same page. Chances are that they didn't click the button at the same exact time, in my case anyway. So one thing we could have done is check to see if there was a lock ahead of time, saying, hey, this person's got this locked already, so you should probably hold off or talk to them. Single threading: forcing all atomic operations to happen serially through a single thread. So this is basically

Starting point is 00:10:21 making everything run through one mechanism, so it literally happens serially. This is the safest, but it doesn't distribute very well. This is all about distributed data and making things scalable. So if you've got one little thread that everything's got to go through and a lot of transactions to pop through, then this is not going to be a good solution. Yeah, so both those last two that you mentioned, the cursor stability and that single threading, both of those are probably fine in a smaller or a lower transaction database, right? But as soon as you start getting some real traffic, you start seeing deadlocks and all kinds of things that start happening that are really bad and basically make applications unusable and unstable at some point. You know, one thing people kind of maybe don't realize, especially you see this a lot with streaming, but people don't think about sometimes, is falling behind can effectively

Starting point is 00:11:13 mean stopping. Because what happens is like, if you start falling behind and you can't keep up with the load, it's possible to get into a condition where you will never catch up because you're falling behind faster than the data is able to recover. And so falling behind doesn't sound as bad. It just sounds like a little bit late. But if you can't catch up, then it's a lot worse than that.

Starting point is 00:11:33 At some point, I'm going to do enough of these episodes to where I will stop being embarrassed when I learn something new for the first time. But I haven't gotten there yet. So I'm still embarrassed that I learned something new while reading this, something that I felt like, oh man, I probably should have known that for a long time. I never knew. Which one was this? So I'm embarrassed to say, but the SELECT FOR UPDATE. Oh, I never knew that was a thing. I'm like, oh man, that is awesome. I wish I had known about that. Although that's basically doing the cursor stability, more or less, where it's locking the read rows. And it's amazing, but again, as soon as you have a database that has a lot of activity in it, you start causing all.

Starting point is 00:12:25 And, you know, here's the worst part is if you've never actually dealt with deadlock problems like man, it will lead you down some paths that you're like, how in the world do I figure out what's going on? And then what you realize is you've got thousands of database queries running, you know, every second. And you start seeing this stuff and you're like well okay now we've got to rethink exactly how we do everything right and that's rough i i don't mean to say it as like oh this is going to be the new solution to every yeah you know i'm going to bake it into every one of my queries but it was more like yeah you know i could see where if you were selecting like a bunch of rows, you know, that that could be,

Starting point is 00:13:06 that could cause deadlock problems. But like for onesie twosie rows, I'm like, Oh, that that's probably a pretty smart thing that I should have, you know, I should have already been doing in the past and haven't been, but yeah.

Starting point is 00:13:18 All right. So it's funny. I bet a lot of people know about WITH (NOLOCK) because we want to get rid of these deadlocks, but they don't realize. I mean, I think that for me is the thing that you sort of pick up as you're going through this stuff, too. Sort of like you're saying, it's almost a little bit embarrassing: somebody went and found this WITH (NOLOCK), and then you start using it everywhere, and you don't realize the types of problems you're introducing because you're doing it. You know, that's my old friend WITH (NOLOCK). I mean, that one gets baked into every query.

Starting point is 00:13:55 That's just a select statement. I don't need to lock on anything. WITH (NOLOCK). Like, I'm doing it as a safety mechanism. Listen, I'm the application developer here. I know what I'm doing. That's right. Trust me. Do I know what I'm doing?

Starting point is 00:14:07 I know what I'm doing. You know, they say it's kind of just a suggestion. Like, ultimately it's up to the query optimizer and other rules of the database to kind of enforce that. But every time I write them, I'm just like, I kindly suggest that you do what I tell you. Right. Forcefully. Yeah. Either you do what I tell you or I will find a database

Starting point is 00:14:26 system that will. Yeah, right, I suggest. All right, so what we got up next here? Yeah, so another solution is explicit locking. So basically, kind of similar to what you talked about before, the application, you're responsible for explicitly locking certain objects. So placing responsibility in the dev's hands. And that's where we say, basically, you know, we're locking the front page for edits. And yeah, it's okay in certain situations. The example they gave was a multiplayer game where each player can move a shared object, which I actually had a hard time thinking about. I'm like, you know, right? Like, what is this example? But the only way I could think of is something like a Factorio

Starting point is 00:15:08 where I can place a factory and either one of us can move it. And so, you know, that's the definition of a shared object. So if we both try to move it at the same time, then it would be a good idea for the application to lock it, so that one person grabs it and it's locked. And then you do that for UI reasons, because we don't want to both be trying to do the same thing and not realize that it's not going to work until the end. You want to know as soon as possible so the game can kind of make that a good experience. Yeah, I had trouble trying to wrap my head

Starting point is 00:15:37 around the the game example too because i kept thinking like you know my my go-to game would be overwatch and i'm like oh i mean i guess we technically can't stand in the same spot so yeah that makes sense but the when i i would dumb it down to like a game that might uh be a little less complex than my go-to is either a chess or checkers type of game where it's like okay here's a very specific quadrant or area of the board that can't be consumed or have two players on at the same time type of situation. I'm surprised you guys struggle with this one. Overwatch, you can't pick up a weapon.

Starting point is 00:16:19 Well, I guess I don't know if Overwatch you can pick up weapons, but like Call of Duty, there's a weapon laying on the ground. Somebody got there, and somebody got it first. The other person can pick up weapons, but like Call of Duty, like there's a weapon laying on the ground. Like somebody got there and somebody got it first. The other person can't touch it. Right. Like that's that was kind of how I thought about it because it makes sense. Right. Like, hey, if this if if somebody comes down and gets this particular gun laying on the ground, that's locked.
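Editor's sketch: the "whoever grabs the weapon first locks it" idea can be modeled as an explicit, application-level lock around a claim check. This is an in-memory illustration only, assuming a hypothetical `WorldItem` class and player names; a real game server or database would use its own locking primitive (for example a locking read such as `SELECT ... FOR UPDATE` in databases that support it).

```python
import threading

class WorldItem:
    """A shared object that at most one player may hold (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.owner = None
        self._lock = threading.Lock()

    def try_pick_up(self, player):
        # The check ("is it free?") and the claim ("now it's mine") happen
        # while holding the lock, so two players cannot both pass the check.
        with self._lock:
            if self.owner is not None:
                return False
            self.owner = player
            return True

gun = WorldItem("rifle")
results = {}

def attempt(player):
    results[player] = gun.try_pick_up(player)

threads = [threading.Thread(target=attempt, args=(p,)) for p in ("p1", "p2")]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(results, gun.owner)
```

Which player wins depends on thread scheduling, but exactly one of them does. The other finds out immediately, which is the user-experience point made about games: you want to know as soon as possible that the action will not work.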

Starting point is 00:16:40 Right. This next person doesn't matter if he came a millisecond later. It's done. He can't touch it. Right. So, yeah. Yeah yeah so that's a good point you know what's funny about that they talk about this later too but um the example i mentioned where you like pick up something and move it and both players could just try to pick up and move it so you can lock the object that you try to pick up and move but what if we're both moving two different objects are you gonna lock the whole

Starting point is 00:17:02 world while we're both moving to try and make sure that we don't put them down in the same spot like how do you lock the destination i'm going to go to yeah that's that's harder this is this is a part of the reason why i had a problem with the game example though too is that like the games that we've described are like such fast-paced kind of game between overwatch and and call of duty like those are fast-paced games so i'm like well it's not like they're writing to a database of like what is available what isn't available they're not writing to a database so that was part of the reason why i'm like i can't like i needed something like well what would be an example of something that might be a little bit slower to where you know maybe it makes i don't really think that it makes sense even in the chess and checkers but you know that was as good as i could get yeah uh diablo one actually had terrible problems with uh with duping so uh you could easily

Starting point is 00:17:50 dupe objects but like basically um you know dropping on the ground and picking up over and over again like sometimes it would get duplicated and so it was common to be like hey can uh can can you hop on a multiplayer game with me like you host a silver server to increase the latency so then my friend would hop onto my server by dialing up to my house and then we'd be like they're trying to dupe items and it would work a surprising amount of the time that's awesome dial up yeah good times so uh so you know we mentioned problems with uh with lost updates but like you need to be able to tell when that happens, right?

Starting point is 00:18:25 That's the kind of scary part. It's like if two people dupe an object, or, you know, you can imagine all sorts of cases like incrementing a bank account's balance or something.

Starting point is 00:18:33 If you don't realize that a mistake was made and the update was lost, like that's really bad. I guess there's no audit, you know, it's really tough to unwind unless your, uh, accountant or whatever is going to start looking through the transaction log,

Starting point is 00:18:45 which, uh, if they're doing that, all hope is lost. Uh, it really made me like think of, uh, which going back to the history of the name transaction,

Starting point is 00:18:56 I believe we said that it had to do with, you know, banking or financial type. Uh, uh, it came from that kind of background, right? Like why we call it a transaction in a database right but it really made me think of like all the complexity like the needed ability to like

Starting point is 00:19:14 have audit trails and to verify that you can't have a lost update, right? That's not an option. How do we protect against that? All of that type of, you know, the requirements. Imagine talking about a bank. Okay, no big deal, because my one individual account might not get that many transactions per second, and your individual account might not get that many transactions per second. But even if you were to go to a large-scale bank, okay, fine, they might get a lot of transactions per second. But that's really nothing when you compare it to the number

Starting point is 00:19:52 on a stock market, for example, and the number of transactions that are flying through and their need for keeping things accurate and reliable, especially when they try to get as close to the source servers as possible with as you know low latency a connection as possible so they can uh you know be as up-to-date as possible so that was like the example of like um high speed financial kind of transactions where you cannot afford to be wrong yeah you know a lot of those stock market stuff um they look at like trade volume just the amount of trades happening

Starting point is 00:20:33 and stuff, and so there's all these examples where there's even simple increments that are important, that you don't want to lose. Yeah. So we talked about snapshot isolation, how you can keep basically a transaction ID on data to let you know when the last thing was to update it. So one solution we could have here is basically to fail an update where the transaction ID on the data we're changing is higher than our transaction ID. I read that and I was like, oh, problem solved, right? What's the problem? It does allow for the application to be written where basically the code is going to fail when it tries to update. But you don't have a lot of flexibility.

Starting point is 00:21:12 Your option there is someone has updated this data. Do you want to try and retry it again with the current data? Or do you want to give up and have that application reevaluate? This is your web pages, uh, solution. Oh, your updates are lost because it was already updated.
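Editor's sketch: automatic lost-update detection can be approximated with a per-record version number that plays the role of the transaction ID mentioned above. This is a simplification and an assumption of this note, not how any particular database implements it; `VersionedStore` and `StaleWriteError` are hypothetical names, and the class is single-threaded for clarity.

```python
class StaleWriteError(Exception):
    """Raised when the record changed after the writer read it."""

class VersionedStore:
    def __init__(self):
        self._data = {}  # key -> (version, value)

    def read(self, key):
        # Unknown keys are reported as version 0 with no value.
        return self._data.get(key, (0, None))

    def write(self, key, value, read_version):
        current_version, _ = self._data.get(key, (0, None))
        if current_version != read_version:
            # Someone else committed since this writer's read: refuse.
            raise StaleWriteError(
                f"{key!r} is at version {current_version}, "
                f"but the writer read version {read_version}"
            )
        self._data[key] = (current_version + 1, value)

store = VersionedStore()
store.write("page", "draft", read_version=0)

# Two editors load the same version of the page.
version_a, _ = store.read("page")
version_b, _ = store.read("page")

store.write("page", "edit from B", version_b)  # B saves first

try:
    store.write("page", "edit from A", version_a)
    a_saved = True
except StaleWriteError:
    a_saved = False  # A must reload and retry, or give up

print(a_saved, store.read("page"))
```

As discussed, the writer who loses gets only two choices here: retry against the current data or abandon the change. Offering something friendlier, such as a merge or an explicit "overwrite anyway", needs more application control than this check provides.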

Starting point is 00:21:30 Right. Yeah. And you can hit save again and you'll get the new lock and overwrite their changes. But I write, I reload the page, right? Like I'd imagine you'd have to reload the page to get a new,

Starting point is 00:21:40 but you can imagine a case where they didn't make them reload and be like, are you sure you want to save? Yeah bye i i wrote three pages of new of additional new content and alan corrected one comma mistake and his comma mistake causes me to lose everything i've done you can see where a user would be like a little frustrated in that type of situation a little bit uh but you can imagine you know like so i we talked about having different options so if the database didn't automatically fail it if it's a lot of transaction you could go back to the person say hey there has been a change do you want to override it or do you want to reload the page those are things we

Starting point is 00:22:19 can do if we give the application more control. So we kind of lose that with the transaction ID, but it's probably okay. They did have to take a little note in the book to throw a little bit more shade towards MySQL InnoDB, which is, as far as I know, still not the default storage engine. I don't remember. But they've mentioned its repeatable read a couple times, and mentioned that it does not support this kind of, I forget what they call it, but basically comparing the transaction ID of the current transaction to the transaction ID stored on the data to be modified. So some people say it doesn't qualify as snapshot isolation. Hey, so InnoDB is the default. Okay, it didn't used to be. It used to be MyISAM. They have,

Starting point is 00:23:05 yeah, they have MyISAM. They have Memory, CSV, Archive, Blackhole. They have a bunch, but InnoDB currently is the default.

Starting point is 00:23:11 And I assume it's because it supports ACID or, you know, mostly supports ACID, from what we're finding out in this book. Yeah. I don't think MyISAM supported distributed transactions. I forget.

Starting point is 00:23:22 I just don't. Back in the day when I used to set up MySQL for websites, I had a little note that was like: create MySQL, make sure to change the storage engine to InnoDB. The thing that you couldn't remember what it was called, that was just the ability to detect lost updates. Yeah. Period. I don't know, they never referred to that as any other feature name that I saw, that I recall. Okay. So what if your database doesn't even support transactions at all?

Starting point is 00:23:54 So, you know, no snapshot number. You know, this is where we talked, we kind of jumped ahead a little bit here and said, your best option there is compare and set, which is,

Starting point is 00:24:03 you know, where you can kind of take the value and say, hey, I'm updating three to four, assuming the data is still three for this ID. And then you have the option of failing, or, as we mentioned, it won't fail, it'll just say zero records updated. And so it's up to the application to say, was there an update actually applied to records? You need that kind of functionality built in. Could you imagine if you had to live in that world where every one of your SQL statements, regardless, I'm gonna say SQL generically, I don't care what the storage mechanism is, be it a Kafka or Elastic or Mongo or

Starting point is 00:24:37 you know sql server if every one of your queries update queries you had to pass in the old value because you were going to like, you know, it was going to be part of the predicate, but then you also had to get the value of the records updated to make sure that you're updating the intended number of records. Like the additional complexity that you would have to add into your application.

Starting point is 00:25:02 I mean, it sounds awful and painful, and I'm sure there was a time where that thing was done, but this is just another example of the shoulders that we're standing on, of the giants before us that solved these problems so that we don't have to do that. Because it doesn't sound like it's terribly hard, but geez, the boilerplate boringness of it

Starting point is 00:25:27 garbage and like how easy that boilerplate would be to mess up even if you tried to abstract it away as like okay there's a common library or whatever you know whatever your whatever way you're going to try to justify it to yourself that like no i can solve this problem easy like it still sounds awful yeah and you're wrong so back in the day i you know mentioned like working on the web stuff i was working with cold fusion back then and uh around then is when the orms started coming out like the first kind of orms i remember uh using and it was really common with the the earliest orms to be really uh heavily focused on objects so you would like get the table object you would modify your column and then you say dot save and it wasn't the earliest ones weren't smart about what values you updated it

Starting point is 00:26:10 would update the whole record, so stuff was getting lost all the time. It was such a common problem, because people were, you know, modifying enabled or something, and the whole record would get overwritten by somebody who was changing the title. That was bad, bad news. Well, then there was a time where the three of us worked on a homemade ORM. Joy. If you recall that bit of fun. Yeah, I don't mean to cause you to twitch any more than you already do, but throw that thing in the trash. That's right. All right. So one thing that kind of stinks is that most of the examples that we've been talking about here kind of assume a simple setup without replicas.

Starting point is 00:27:05 Once you start talking about adding replicas in a true distributed database, then lost updates get much tougher because, you know, the snapshot ID on the record on this replica could be different from that replica. It could be different on the leader. And so compare-and-set strategies and locking strategies get a little bit tougher. And so there's not a whole lot you can do. And this is what Outlaw was talking about with this part of the book: this is a situation you're in, and there's not a lot the database can do for you. So kind of the most common strategy here for these databases is just accept the writes and have an application process to decide what to do. So either you write something, or maybe they have something that can kind of scan for problems and try to fix it up, but no, it just gets into, like, oopsie mode. I think the real takeaway though is know your application's requirements, because going back to that financial example, right, if you're going to go with a distributed database and the strategy is like

Starting point is 00:28:02 a last write wins or, you know, some kind of merge strategy, that might not be what you need, right? That might not meet your requirements. So you can't just be in the game of, well, this is the thing I know, so that's the thing I'm going to use for all of the problems I have. Right. Yeah. You know, what's funny is when you mentioned the distributed databases and how things start falling back to the application instead of letting the database decide, because how can it, right? It's not going to know the right answer for everybody's situation. I remember there was a clear point in time where I used to be in favor of putting the logic in the database,

Starting point is 00:28:45 right? Stored procs, whatever. That was where I lived, because that was the heart of the application. And I believe it was when we did domain driven design, when we went through all that, that I was like, you know what? It does make a lot more sense to put this in the application, for several reasons. One, you have it in source control, which is amazing, though you can also put stored proc stuff in there. But the other was your application logic can scale even if your database can't. Right. And that was always my big thing: there's one place that it can always go back to. And if you know your application is supposed to be doing something,

Starting point is 00:29:25 it's supposed to be the brains of it. That's where that logic should live. I mean, I definitely recall those, those episodes where we've, we've described that and talked about like, you know, using,

Starting point is 00:29:35 like you mentioned stored procedures specifically, like using stored procedures, do everything. And we kind of put it in the context of like, you know, keep your data, your logic about your data close to your data was you know maybe so i but i think that like i've kind of evolved to the point where

Starting point is 00:29:53 it's still a sometimes, like maybe that makes sense. I'm not an always put it into the app layer, because there might be certain situations to where like, well, this example is fine. Yeah, right, it depends on what this scenario is, and maybe even parts like what the technologies involved are. You know, I thought a good example of kind of the checking application process might be, you know, like ACH, what's that, something clearing house. Basically it's whenever you write checks, and you can even think about credit card companies. Like you'll see, if you go and pay for gas at a gas station or something, that might show as pending for a couple days before it actually goes through. And so I don't know why they do that, you know, but one

Starting point is 00:30:38 one explanation could be that you've got, you know, all these kind of transactions flying by. And before those transactions are actually committed to these accounts, there's a process that goes through like every 24 hours or something that says, OK, we think this is right. Now let's go through and just double check all those accounts to make sure that the numbers still add up. So all the debits and the, you know, what you call credits all add up to zero in all these modified accounts. And, you know, maybe that takes a couple hours and it doesn't work.

Starting point is 00:31:07 They try again, you know, and keep doing it until it goes through. And so it can take, you know, days for these transactions to go through. Now, you know, I'm kind of making some guesses here and I haven't thought too much about it. But just kind of the idea of having this like kind of accountability pass or reconciliation phase that goes through and make sure everything's good before we commit it yeah that's i mean that's sort of like a batch thing at the end but i mean better safe than sorry right yeah all right well let's do it okay yeah got in before jay-z did so i kind of already hinted at this before, but if you haven't left us a review, we would greatly appreciate it. If you did take time out of your busy day, you can find some helpful links at www.codingblocks.net slash review and,

Starting point is 00:31:55 and, you know, or email us, you could email comments at codingblocks.net. We've gotten some great ones. In fact, I forgot that we did recently get one in if i recall um shoot i forgot to include that when we when we gave our thanks so um it was it was cars and dax am i saying i don't know i'm probably saying that wrong probably just um but yeah so so it doesn't have to necessarily just be in like your app of choice um you know it could be you know email you could hit us up on slack twitter or whatever but we we really do get some truly uh heartfelt and inspiring uh feedback from from all of you so we we do really appreciate it.

Starting point is 00:32:45 It really does mean a lot. It really does help keep us going and help to keep us motivated. So again, you can find some helpful links at www.codingblocks.net slash review. And with that, I don't think it's just mine. I think I want to say it's everyone's favorite. It's time for everyone's favorite part of the show.

Starting point is 00:33:04 Survey says. When I grow up, I'm going to be a game show host. Hey, I'm on a streak, just so you know. Of one? Yeah, of one. That's pretty good to speak. I'm on a one game winning streak. You can double it here.

Starting point is 00:33:18 Every streak starts somewhere, right? That's right. All right. Well, this is episode 206. So according to TechHose trademark rules of engagement, Jay-Z, you are up first. So you can break his spirits early. I'm going to try to win this time. Okay.

Starting point is 00:33:35 Yes, if you would put some effort in, we would greatly appreciate it. Yeah, that would be awesome. All right. All right. For what reason might you choose not to go to a new restaurant even after someone recommended it to you? Price. Price. Okay. That's really good. You don't like the type of food. The type of food. Okay. Okay. So number one answer on the board.

Starting point is 00:34:07 Bad reviews online for 33. Okay. Wow. Number one. Okay. So I'm not going to get slaughtered here. Number two answer too expensive or the budget. All right.

Starting point is 00:34:22 21 points awarded to Mr. Joe. What's up? Too far away is the third answer at 16. Oh, don't trust the recommender at number four for 15. Number five answer is no babysitter for seven. I lose.

Starting point is 00:34:41 Wow. Alan, I lose. Great. Number six answer on the board. Not like cuisine for five points. So let's see. How do you spell five again?

Starting point is 00:34:51 Okay, there we go. In the hole. In the hole. Did I get the comma in the right place? The decimal? Okay. Five points for Alan. There's going to be an overwrite error here.

Starting point is 00:35:02 A lost update? That's right. Last write wins. Okay. A bad part of town for three points was the last answer on the board for that. All right. So, Alan's streak is off to a great start. Yes.

Starting point is 00:35:18 Totally not. My plan is working. All right. So, we go to Alan. Name a reason you might call a taxi go to the airport airport okay all right i'm gonna go with inebriated okay oh god it yes i'm not sure how to take Alan's answer in. I kind of want to stand on this, so I'm going to give you the answers first, and I'll tell you where I think Alan's answer is going to stand.

Starting point is 00:35:55 Number one answer on the board is needed a ride or no car for 46 points. Wow. You said needed a taxi. That's what I should have said. Right? No. Name a reason you might call a taxi, not need a taxi. Yeah, but I need it.

Starting point is 00:36:12 Well, I'm going to call a taxi because I need it. Yeah. Number two answer on the board for 24 points. Impaired. All right. Dang it, man. Yeah. Number two.

Starting point is 00:36:28 Number three answer. Lost or no directions for nine hard to find parking number seven and that's pretty airporty number five answer on the board is not enough space in car now for four points it feels like it would have been the number one answer. I'm going to go with what's going to be a controversial decision here, but I think that needed a ride would best fit Alan's answer of you need a ride to the airport.

Starting point is 00:36:58 Yeah. I didn't know you could just be like call a taxi if you need to call a taxi wait what i didn't think i didn't think you could do that yeah well i mean the the question was why would you call a taxi well because you need one yeah that was the number one answer you need a ride yeah i need a ride yeah. Why else would you call a taxi? Because I forgot my wallet in the taxi.

Starting point is 00:37:27 Number two. Okay. Okay, I see where we're going with this. All right. At any rate, I think that that was, this is why I was torn. Because, you know, the destination wasn't like one of the choices here. I like your choice, though choice though i guess i kind of sound reasoning but i'm trying to be as impartial as possible um so i gave you i gave you what the

Starting point is 00:37:52 options were so you could see where my head was at but that was the one that i was immediately thinking aligned with best okay game interesting it does because otherwise it was a blowout. Yeah. So, yeah. So we are now 51 to 45 in Alan's favor. Okay. And as is tradition, Joe, you get to pick the choice of the next question. And your question choices are name something you might be asked to bring to a friend's party. Or name the most annoying thing other drivers do on the road. Or what is the most expensive thing you've ever bought while married without telling your spouse first? So you pick the question.

Starting point is 00:38:38 Okay, so. He's not picking number three because if Sarah. Yeah, no way. If she listens to this, he's in trouble. I'm in trouble. Yeah, no way. What was the first one again? That was...

Starting point is 00:38:49 Name something you might be asked to bring to a friend's party. Friend's party. Okay. So friend's party or driving or buying something. Geez, I'm going to regret this with driving. I think I'll drive sometimes. Name the most annoying thing other drivers do on the road. That's your go-to?

Starting point is 00:39:10 Yeah. It's tough. There's several answers. What's the most annoying? I'm going to say lack of turn signals. Turn signals, okay. Lack of turn signals. So that might be mine, but I think with the onset of phones, it's now people sitting at green lights.

Starting point is 00:39:29 Oh. Not going. Sitting at green lights. Not going at a green light. That happens a lot. God, I'm so glad I gave Alan the 46 points on the last one. It makes this one interesting. No, no, no.

Starting point is 00:39:43 I'm saying it makes it interesting right like you can feel the tension like you want to know is it because because if if if i didn't then it would have been like 45 to 5 and like what kind of comeback would that be right right so it's so much more interesting now that there's only like a six-point gap. Which way do I want to answer these? He's going to let it linger. That's awesome. So let me say I was really surprised that you did not say the number two answer on the board, Alan.

Starting point is 00:40:19 Cut you off. Oh, yeah. For 26 points. Number three answer on the board is speed for 13 points. People get irritated about that? Come on. Oh, how slow they're going. Yeah, how slow. Well, no. Drive too slow is the fourth answer on the board for 12 points.

Starting point is 00:40:40 Wow. Okay. Tailgate is the fifth answer for seven points the seventh answer on the board is loud music for three now to the astute listener the average answer was you will not even music it's just the bass yeah for the astute listener you will notice that i did not yet say what the number one answer was nor did i say what the number six answer was no so number one answer on the board not use turn signals 28 points on the board mr joe's that i got to go first hot garbage use use cell phone which is what i was going to classify your stoplight one since you said that uh using

Starting point is 00:41:33 the cell phone four points on the board four there's no way people are only that bad about that so so uh alan's winning streak of one just got squashed. It is now Alan's longest winning streak. It'll go down in the history books of survey says as longest winning streak of one. Jay-Z wins. I did try to win today. 73 to 55. You know what? I call bunk on the turn signals because nobody uses them.

Starting point is 00:42:07 So how could everybody be that mad about it? Nobody uses them. Everyone complains about other people not using them. Right. That's what I'm saying. That's the most annoying thing. It's not complaining about what you do. It's complaining about what other people do.

Starting point is 00:42:19 Oh, that's so ridiculous, man. But I swear to you, every day I get behind somebody that's sitting at a green light with their head down looking at their phone and and there's people that honk at them non-stop like i've gotten out the traffic lights where people are laying on the horn as soon as the light turns green because they just don't want to deal with it but you know why that people aren't annoyed by that though is that the person honking is like car number eight but cars one through seven are all looking their phone none of them have noticed yet so you're right so only one out of eight were annoyed you know the part that gets me but then we get on with the show what drives me insane is when they finally look up from their

Starting point is 00:42:55 phone and the lights yellow now all right they're the one car that squeaks through and you're like dude if i could catch you i don't know what I'd do. But I would be mad. In your mind, you're thinking, listen, sir, I have a certain set of skills that I've acquired for a long career. That's right. My name is Jason Bourne. Bourne, do you – wait a minute. Your pop culture references are off.

Starting point is 00:43:23 Do you not know which movie i was referring to no i don't which one really was it wait wait knowing you it's got to be pulp fiction or something like that no i mean i i like i like where your head well first of all thank you for the that kind compliment that it would be like knowing me you know be some pulp fiction because that's like one of the greatest movies of all time so yeah uh fair but no i was it was the taken series oh man that's neeson i haven't seen that in like 15 years man but i'll remember because because that that's that quote was like so uh iconic to the movie and has been parodied so much man the way i remember that stuff uh i barely remember in a galaxy far far away like that's oh from star trek

Starting point is 00:44:13 Well done. Oh man, that's amazing. I'm gonna get some hate mail now. All right, so I guess we'll pick back up here. So the last thing we talked about was the conflict resolution and replication, so now we're getting into write skew and phantoms. So this one's kind of interesting, this is nuanced, um, off past stuff. So write skew is when there's a race condition that occurs that allows writes to different records to take place at the same time that violates some sort of constraint on state. So the best way I know how to put this is kind of what we've been talking about up to this point is people modifying the same record, right? And so when you modify the same record, there's things that you can do even in the database layer that takes care of those

Starting point is 00:45:11 for you. However, what if you are, you're basically writing a lock to something, but you create a record for it. So, so Joe goes to create something, and so he tries to write a lock record into a table, and Outlaw tries to create something at the same time, and he creates a write lock record in that table. Well, there's no constraint on the database that says you can't have two records in the same table. So now you have this situation to where this happened because these both started at about the same time. And it's the same situation we talked about before you read something and said it was good. And so then you continued with your transaction and they actually gave an

Starting point is 00:45:57 example in the book that I thought was pretty good. Basically they had doctors on call, right? And if you have a doctor scheduling system, you always have to have at least two doctors on call, right? And if you have a doctor scheduling system, you always have to have at least two doctors on call. And that's because if one of them gets sick or whatever, you still have another one available, right? Well, what if two doctors at roughly the same time end up starting feeling like they're sick and they go in to say, hey, I'm not on call, you know, take me off call. Well, if they read the state of the of the system at the time it started, there were two doctors on call. Right. Well, they both get that state back and now they say, OK, well, then we're going to make us off call. And so both those records get written at the same time. And now you have no doctors on call because the state of the system when it started the transaction was good so it's the same type situation but you're

Starting point is 00:46:51 writing to multiple records is the problem. So it's not a lost update, because both updates were made and preserved, which was the other problem of overwriting a value to the same record. It's just that the write skew here is that this is where application logic is bleeding into your storage layer, right? Yep. The application's logic requires that the two doctors be on call, but it's not a database constraint, so therefore there's no storage violation of it. So you can get into this situation. Yep. And they even called out, you know, hey, if doctor one had gone into the system and the system was saying, hey, there's two

Starting point is 00:47:36 doctors on call or more than two on call. And they went and took themselves off call. And then right after that happened, the other one did it. Then everything would have been fine, right? Because when that second doctor read the system, it would have said, hey, there's only two doctors on call. You cannot go off call. So that would have been fine because the transaction would have completed. But when they both open it at roughly the same time, that's where you start having these problems. So, and they say that this particular write skew thing is sort of a generalization of that lost update problem that we were talking about before. Really, the nuance here is you're could enforce a constraint, for example.
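To make the doctor scenario above concrete, here is a minimal sketch in Python. It is not the book's code: the in-memory `on_call` dictionary stands in for the doctors table, the names `alice` and `bob` are made up, and `MIN_ON_CALL = 1` is an assumption chosen to match the walk-through where two doctors start on call and both leave. The `threading.Barrier` is only there to force the unlucky timing in which both requests read before either one writes.

```python
import threading

# Hypothetical stand-in for the doctors table: name -> currently on call?
on_call = {"alice": True, "bob": True}
MIN_ON_CALL = 1  # assumed business rule, enforced only in application code

# Forces both "transactions" to finish reading before either one writes.
both_have_read = threading.Barrier(2)

def go_off_call(doctor):
    # Step 1: read the current state (both requests see two doctors on call).
    count = sum(on_call.values())
    both_have_read.wait()
    # Step 2: decide based on what was read, which is now stale.
    if count - 1 >= MIN_ON_CALL:
        # Step 3: write to this doctor's own record only.
        on_call[doctor] = False

threads = [threading.Thread(target=go_off_call, args=(name,))
           for name in ("alice", "bob")]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Each write touched a different record, so nothing was overwritten,
# yet the rule is broken: nobody is on call any more.
print(on_call)
```

Neither update was lost here, which is the difference from the lost update case: the two writes went to different records, and it is only the combination that violates the rule.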

Starting point is 00:48:32 Like they talked about materialized views as being like possibly one way to, if I recall correctly, using a materialized view as one way to potentially solve this type of issue or to ensure that you don't get into this situation. But I was thinking like, well, I wonder if there's a way that you could just write it out to where it is a single record that is being maintained. So this is tough, right? Like they don't go into this in the book like what you're talking about here. I'm always torn on that. Like, do you try and make your storage layer look a particular way to support a business case that you're trying to do? Because at some point that breaks down, right? Like whatever your business case is right now, sure, you designed it. It works for that business case. But then at some point, they end up having to add another business case to it, and your storage structuring no longer works well for that.

Starting point is 00:49:31 So should you try and do that? Well, here's what made my head go in that place, though, was that in the example in the book, the author includes some example, you know, SQL of what this might look like, right? And of course, we don't know like all of the structure of that, you know, we just have like a statement. But the table name in question was called Doctors. And I was like, well, man, that seems like a horrible place that you would also store on-call information. That seems like it should just be generic information. Like, these are the doctors that serve this office, right? With like, you know, HR type information, not an on-call schedule. And that's why I was like, started thinking in my head, like, well,

Starting point is 00:50:15 I wonder if there's a better place, you know, a better way that you could structure that type of need. And then that way you could have the database help you out here rather than you know maybe you're just putting like more than is necessary in this one table but i mean it's way beyond the scope of what the author was trying to get i mean like he you know he i'm overthinking it i i i get i grant you that but you know the whole purpose of this book though like as you're reading it like in my mind i keep trying to think like okay well how could, how would I do this if I were to write this out to a flat file system? How would I do this? Whatever.

Starting point is 00:50:50 And so in this particular part of the book, it was like, well, how might I be able to – maybe this is a case – this specific example is a case of you're structuring the data wrong in the database. Yeah, that's totally possible. Like I said, the only thing that sticks in my head when I start looking at things like this now, though, is it's easy when you're designing for today, as opposed to, hey, what are the business problems going to be tomorrow that come up? I mean, we even, God, this has been a long time ago. We had, I think Outlaw, maybe Jay-Z, we went to this one meetup. This has been years ago where somebody, I think from GitHub, was talking about their interview strategies and he would basically be like, hey, this is what we're trying to do. Design your schema or your data structure layout. And then he would actually purposely, as soon as they did it, let's say that they went to a many-to-many type setup or a one-to-many type setup, he would throw a monkey wrench at them right after that and be like, okay, well now I have this situation. What are you going to do? And it was to force them to be like, oh, you have to change your data structure now.

Starting point is 00:51:59 And, and that's why, like, when I think about this, I'm like, well, should you be trying to tailor your data storage needs for this as opposed to, hey, put the right application constraints in place, right? So it's an interesting question, and I totally agree, right? Like they could have completely designed this different, but then what kind of monkey wrench could you have thrown at it and been like, okay, that data structure is now terrible for this next use case? It was, that was you and me. Jay-Z wasn't there. It was Stack Overflow. Stack Overflow. And they did a mock interview, a live mock interview, and it was one of our favorite meetups that we went to.

Starting point is 00:52:38 Yeah, it was really good. They had a real, you know, the presenter was from Stack Overflow. And I think that the person doing the interview, like, I, I don't recall if he was actually applying. I don't believe so. I think, cause I think it was just mock, but they, he actually walked through like, okay, we're going to, this is what an interview at Stack Overflow would be like. Yeah. And it was fun. I mean, it was good stuff like that, right? Like basically seeing how people thought. So, all right. So we're going to jump into this thing that Outlaw actually just alluded to a second ago.

Starting point is 00:53:08 So how do you prevent this write skew? So you can do something like this atomic single object lock, but it won't work in this situation because you're doing more than one thing, right? So you can't, unlike where you're trying to lock that object that two players in a game are trying to pick up, there's more than one object. So you can't just do that. Snapshot isolation also doesn't work in most implementations. This is interesting to me. SQL Server, Postgres, Oracle, and MySQL will not prevent write skew. So none of your big databases out there, probably the top four of the top five, I'd say, aren't there.

Starting point is 00:53:47 And we could go to that database site that tells you, but those are pretty high up there. So in order to do this, you actually need true serializable isolation. And that's why it doesn't work in those databases. They just don't have that set up. Most databases don't allow you to create constraints on multiple objects, but this is where you could potentially use a materialized view, or you could use a trigger. This right here would be the spark of holy wars everywhere, because I mean, as I've worked on databases over the years, I've heard so many people that are for triggers and I've heard so many people against triggers because triggers hide database logic, right? Like that's,

Starting point is 00:54:30 that's what said, and they also complicate things. So, but the idea here is you can't put a constraint on a table saying that, Oh, there can't be two records with this. Like usually it won't let you do something that complex, but you can do it with a trigger, right? So after an update or an insert or delete, Hey, I can't have more than two records with this setting. Right. And then that could, that could throw some sort of error or something or roll it back to whatever the state is. So you can do that, right? You can go in and do this, but again, is it the right thing to do? Who knows? Maybe. I think, was this the portion of the book where they described like maybe creating something else, like another table that you could put a lock on.
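As a rough illustration of the trigger idea above, here is a sketch using Python's built-in `sqlite3` module. The table layout, the doctor names, the trigger name `keep_one_on_call`, and the minimum of one doctor are all assumptions made for the example, not the book's schema. SQLite only lets one writer run at a time, so this sketch sidesteps the concurrency question; in other databases, whether a trigger alone is enough depends on the isolation level the transactions run under.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE doctors (name TEXT PRIMARY KEY, on_call INTEGER NOT NULL);
INSERT INTO doctors VALUES ('alice', 1), ('bob', 1);

-- Hypothetical rule: refuse to take the last on-call doctor off call.
CREATE TRIGGER keep_one_on_call
BEFORE UPDATE OF on_call ON doctors
WHEN NEW.on_call = 0 AND OLD.on_call = 1
BEGIN
    SELECT RAISE(ABORT, 'at least one doctor must stay on call')
    WHERE (SELECT COUNT(*) FROM doctors WHERE on_call = 1) <= 1;
END;
""")

# First request: two doctors are on call, so the trigger lets it through.
conn.execute("UPDATE doctors SET on_call = 0 WHERE name = 'alice'")

# Second request: only bob is left, so the trigger aborts the statement.
try:
    conn.execute("UPDATE doctors SET on_call = 0 WHERE name = 'bob'")
    rejected = False
except sqlite3.DatabaseError as exc:
    rejected = True
    print("rejected:", exc)

remaining = conn.execute(
    "SELECT COUNT(*) FROM doctors WHERE on_call = 1"
).fetchone()[0]
print("still on call:", remaining)
```

This is the "hidden logic" trade-off from the discussion: the rule is enforced no matter which application writes to the table, but nobody reading the application code will see it.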

Starting point is 00:55:13 So we're about to get into the schedule table. Yeah. And in the schedule table, you can say like, this is the block of time I want to select for update, that block of time. And then that way you could have like one. Yeah, that's coming up here in a second. So before we get to that though, they do mention if you can't use serializable isolation, your next best option may be to lock the rows for the update in a transaction, meaning

Starting point is 00:55:40 nothing else can access them while the transaction is open. And this again is that select for update that Outlaw was talking about earlier, right? So as soon as you read the record for doctor one, as you're reading it, it's locking that record. And it's not released until you've either updated or just released the transaction. So it can be done that way, but that can be hairy. So, phantoms causing write skew. There's a pattern. Excuse me one sec, this was your Star Wars movie, right? The phantom, uh, the phantom something. Phantom write skew. That's Star Trek. Yeah, don't count that movie. Oh, right. Really? Oh man, terrible. Okay, so the pattern is you query for some business requirement. So for instance, in the case of the doctors, there has to be at least, there has to be more than two on call. You're going to

Starting point is 00:56:42 query the database select star from doctors on call um and you going to query the database, select star from doctors on call. Um, and you need to have at least two records come back, right? So that's, that's the first part of your business requirement. Um, the application then decides what to do with those query results. This is step two. Hey, if I have more than two, then I can continue on. If this doctor's trying to get off call, if I don't have more than two, then I'm going to say, hey, you can't do anything. I'm not going to update any records, right? The next one, if the application decides to go forward with the change, meaning that, hey, this doctor's trying to go off call, get off call, then an insert, update, or delete operation will occur. And then that would change the outcome of that previous step where the application decides what's what to do when it gets those results back.
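The three steps above (query, decide, write) can be sketched with the rows held locked for the whole transaction, which is roughly what the select for update approach mentioned earlier buys you. This is a simulation in plain Python under stated assumptions: a single `threading.Lock` named `rows_lock` stands in for the database's row locks, the `on_call` dictionary stands in for the table, and `MIN_ON_CALL = 1` and the doctor names are made up for the example.

```python
import threading

on_call = {"alice": True, "bob": True}   # hypothetical doctors table
MIN_ON_CALL = 1                          # assumed business rule
rows_lock = threading.Lock()             # stand-in for locking the rows read
results = {}                             # doctor -> was the request accepted?

def go_off_call(doctor):
    # The lock is held from the read until the write is done, so the
    # second request cannot read until the first one has finished.
    with rows_lock:
        count = sum(on_call.values())        # step 1: query
        if count - 1 >= MIN_ON_CALL:         # step 2: decide
            on_call[doctor] = False          # step 3: write
            results[doctor] = True
        else:
            results[doctor] = False

threads = [threading.Thread(target=go_off_call, args=(name,))
           for name in ("alice", "bob")]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Whichever request ran second saw the first one's write and was refused.
print(results, on_call)
```

Which doctor wins depends on thread scheduling, but exactly one request is accepted, so the rule holds either way.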

Starting point is 00:57:31 The important thing here is they said that these steps could happen in a different order. And I've actually seen this done before. So for instance, what this was doing was, hey, is it okay for me? Let me find out if there's more than two doctors on call. If there are, now I'm going to update and take this one doctor off call. You could do it in the reverse order and say, hey, set this doctor, take this doctor off call, then query to see, hey, do I have enough doctors on call? And if you don't, then reset that update, right? So you could do it in the reverse order, but the general idea is the same. Um, and I'm curious, what do you guys feel about that? Do you always like checking for the precondition before? Do

Starting point is 00:58:14 you like checking for the condition after doing something? Uh, geez. Yeah. What he said. So, you know, what's funny is I actually have a preference on this. I generally like to do, this is kind of stupid, but I like to do the pre-check first because it's a less expensive operation, because reading is typically way, way cheaper than writing. But to me, even after you do the write, you kind of need to check again. So you're doing that read twice a lot of times. So, but it seems like doing the write and then rolling back the write is more expensive because, like we've talked about before, you're writing to write-ahead logs,

Starting point is 00:59:01 you're doing all kinds of other stuff that is way more IO intense than just doing a read for a record. Yeah, I mean, where my head was at when you asked that question is like, well, out of what I probably thought was just in my head like an order of operations, I would probably do the check first and then the write second. And so maybe that's two things at play. Like maybe there's like a little bit of laziness in there on my part to not like think through, you know, oh, is there a different way I should be doing this? But then also just the simplicity of like, as I'm thinking through, I'm like,

Starting point is 00:59:42 well, I only want to do this if this thing is true. So let me check this condition first and then do the thing it's basically the equivalent of like do you prefer to write a while statement or a do while statement right right and like typically i would write a while statement first i'll say well as long as this condition is true go do the thing instead of the do while where you're like well let's write let's do the thing and then see like right where do we keep doing the thing it's the ask for forgiveness versus ask for permission right like it's that order of operations there which is interesting um all right so in the case of

Starting point is 01:00:23 checking for records that meet this condition, you could do that select for update to lock the rows, which I mean, we have all known about that for a long time. Yeah. Like years or minutes that that existed. I forgot all about it entirely. So,

Starting point is 01:00:39 you know, don't feel bad. Yeah. A Friday existed. Well, here's the part that gets all gnarly. What if you're querying for a condition that checks to see if a record exists? Like the precondition is the thing that, hey, if there's records in this table, then that's what indicates that something's on.

Starting point is 01:01:00 So instead of like what I was talking about earlier, like you're updating a doctor's table. What if instead you had a doctor's on call table that had these records in it right um if you're checking for records in a table and they don't exist there's nothing to lock right so so there's no way for this like for update could actually even help you out there so it's it creates a bit of a monkey wrench in this whole locking paradigm. Yeah, this is where the scheduling table can help. And the author kind of implied that that's a common answer to these types of problems, whether it be a scheduling on-call doctors or appointment or conference room booking, you know, like to make sure that the room isn't already taken that you might like

Starting point is 01:01:48 pre-allocate in some table, like, Hey, for the next six months, here's all the time slots that are available for that conference room. And then you would never update that table for it, but it would give you in your application something to lock on. So we're going to go to that right now. Last thing on this other one is snapshot isolation

Starting point is 01:02:13 avoids phantoms in read-only queries. So what we were talking about were these read-write transactions where you read to see if something's there and then you write if it's not. So the snapshot isolation doesn't help with that. And that's why it's such a tricky problem. I feel like as far as this episode is concerned, like there's like some latency issues or, you know, like some out of order packets happening. Cause like every one of my comments have been like, well, that's not, that's for the next section. Right? Like, yeah, he's wanting to get to the, to the meaty stuff. So now for the resources we like, oh wait, right, right. We'll come back to that in a second. Oh, so what he just mentioned, there's actually a name for it.
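The pre-allocated time-slot table mentioned a moment ago can be sketched like this in Python. Everything here is an assumption for illustration: the room names, the 15-minute increment, the 09:00 to 17:00 day, and the `book` helper are hypothetical, and plain `threading.Lock` objects stand in for lockable rows. The point is only that one lockable thing exists up front for every room and slot combination, so two bookings for the same slot have something to collide on even when no booking record exists yet.

```python
import threading
from itertools import product

ROOMS = ["room_a", "room_b"]                  # hypothetical rooms
SLOT_MINUTES = 15                             # assumed booking increment
SLOTS = range(9 * 60, 17 * 60, SLOT_MINUTES)  # slot starts, minutes since midnight

# One lock per (room, slot): every combination is created ahead of time.
slot_locks = {key: threading.Lock() for key in product(ROOMS, SLOTS)}
bookings = {}  # (room, slot_start) -> who booked it

def book(room, start, end, who):
    """Try to book [start, end); times are assumed to sit on slot boundaries."""
    wanted = [(room, s) for s in SLOTS if start <= s < end]
    if not wanted:
        return False
    # Lock every slot the booking touches, always in ascending order
    # so that two overlapping requests cannot deadlock each other.
    for key in wanted:
        slot_locks[key].acquire()
    try:
        if any(key in bookings for key in wanted):
            return False                      # somebody already holds a slot
        for key in wanted:
            bookings[key] = who
        return True
    finally:
        for key in wanted:
            slot_locks[key].release()

first = book("room_a", 10 * 60, 10 * 60 + 30, "joe")          # 10:00 to 10:30
second = book("room_a", 10 * 60 + 15, 10 * 60 + 45, "outlaw")  # overlaps 10:15
third = book("room_b", 10 * 60 + 15, 10 * 60 + 45, "outlaw")   # different room
print(first, second, third, len(slot_locks))
```

Even this tiny setup needs 64 locks for two rooms and one working day, which hints at why the number of combinations becomes a concern as the granularity gets finer.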

Starting point is 01:02:49 It's called materializing conflicts. So this whole thing where you create a table of all combinations, this is what kind of sucks about it, honestly, is you create this Cartesian product of, of every possible combination. So he was talking about scheduling for a room, right? You have a meeting room that people can go and lock up for a time. So you're going to have to figure out what increments that people are allowed to book this room,

Starting point is 01:03:18 right? So is it every five minutes? Is it every 15? Is it every 30? So now you have to figure out, do my time slots look like 10:00, 10:15,

Starting point is 01:03:27 10:30, or do they look like 10:00, 10:01, 10:02? So there's already a problem you have to solve up front, which is a nasty one, right? Then on top of that, you have to figure out, okay, well, how am I going to set this thing up to where there's no overlaps and people don't hit this? Because now you're going to have to lock these objects as you're creating these bookings and all that. It gets really nasty. And that's one thing that they call out: when you're doing this, it's truly a Cartesian product, right? Every single possible combination of time slots that can happen. You're materializing conflicts because what you're

Starting point is 01:04:06 saying is you're taking these phantom writes that could have occurred and you're turning them into lock records. Yeah, lockable records. The name materializing conflicts still doesn't map well in my mind, but whatever. And, you know, we mentioned that game thing where we both grabbed the same object. Okay, fine, we could lock on that. But now if we're trying to place them in the same spot, you get a Cartesian product of all the pixels on screen. You know, I try to place an object that's 10 by 10 and I want to put it starting at pixel... oh geez, no, you can't do it. It's too much data and it would bog down your thing. But this is where they say, hey, this is effective, this is a solution, and

Starting point is 01:04:45 it can be done, but it should be a last resort because it's really hard to do. And, and the errors that you end up creating out of this are really hard to figure out. Like it's just, it's a difficult thing to do if you think about it. And the other thing that they call out, if you want to be a purist about this is it is a nasty leakage of your storage layer into your application tier, right? Because you're only doing this because you need your database to effectively be able to make sure that there's not a conflict that comes up.
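The room-booking idea above can be sketched in a few lines. This is a minimal illustration, not the book's implementation: the table name `room_slots` and the helpers `create_slots` and `book` are hypothetical, and because SQLite has no SELECT FOR UPDATE, a conditional UPDATE stands in for locking the pre-created slot rows. In a server database you would more likely lock the slot rows explicitly before writing.

```python
import sqlite3
from datetime import datetime, timedelta
from itertools import product


def create_slots(conn, rooms, day_start, slot_count, slot_minutes=15):
    """Pre-create one row per (room, time slot): the materialized conflicts."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS room_slots ("
        " room TEXT NOT NULL,"
        " slot_start TEXT NOT NULL,"
        " booked_by TEXT,"
        " PRIMARY KEY (room, slot_start))"
    )
    starts = [day_start + timedelta(minutes=slot_minutes * i) for i in range(slot_count)]
    # Cartesian product of rooms and slot start times.
    rows = [(room, start.isoformat()) for room, start in product(rooms, starts)]
    conn.executemany("INSERT INTO room_slots (room, slot_start) VALUES (?, ?)", rows)
    conn.commit()
    return len(rows)


def book(conn, room, slot_starts, who):
    """Book every requested slot or none of them; returns True on success."""
    wanted = [start.isoformat() for start in slot_starts]
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            for slot in wanted:
                cur = conn.execute(
                    "UPDATE room_slots SET booked_by = ?"
                    " WHERE room = ? AND slot_start = ? AND booked_by IS NULL",
                    (who, room, slot),
                )
                if cur.rowcount != 1:
                    # The row is missing or already booked: a real, visible conflict.
                    raise LookupError(f"{room} at {slot} is taken or not bookable")
        return True
    except LookupError:
        return False
```

The slot size has to be chosen up front (15 minutes here), which is exactly the awkward design decision described above, and the number of rows grows with rooms times slots.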

Starting point is 01:05:21 So you're now exposing that to other areas. Yeah, last resort. You know, now that we're done with the episode, I came up with the perfect example of the shared object stuff: Minecraft. That's the game where you have shared objects in the world that people can manipulate and stuff. Yeah, totally. Figures. Yeah. All right, well, we'll have links to this book and other things in the resources we like section. But now it's time to head into Alan's favorite portion of the show. It's the tip of the week. Yeah.

Starting point is 01:05:55 First, so we've talked a little bit about BuildKit before, which is kind of the next evolution of Docker builds. It's really efficient and has a lot of cool tricks that are almost hidden. You know, it's tough to try and evolve what they've got now and add new functionality that changes old functionality, so you kind of have to opt in to some of these things in different ways, like either by passing a flag to say use BuildKit or by putting a syntax directive at the top of the file to say, hey, we're using newer features. So it's kind of tough, but it's so worth it. There's so much cool stuff in there. And so I wanted to mention a couple things that I just found out about. I think we've even talked about the mount flag before, which is a way that you

Starting point is 01:06:35 can pass, basically, dash dash mount inside of a RUN statement in Docker and mount files at build time. So this is kind of similar to the kinds of things you can do when you do a docker run and mount a volume that isn't actually part of the container; it's a way to share files. Real quick, backing up. So you said it's when you do the run. He's talking about the actual RUN command inside a Dockerfile.

Starting point is 01:07:02 Yeah. Right. The Dockerfile. So not a docker run. Yeah. This is the RUN statement inside the Dockerfile. Yep. And so it lets you mount things in while that statement runs. And what's interesting about the different kinds of mounts I'm going to mention now is that they only apply in the scope of that statement. So I'm mounting the directory, and that applies only for the things that happen later in this RUN statement.

Starting point is 01:07:27 So one of the flags you can pass, for example, is you can mount and set the type equal to cache. And then say you do a build, for example, like a .NET build or something. It's going to bring in files and cache them locally, and so the next build that runs can potentially use those files in the same cache. This is a cache that Docker kind of hides away in its own little location, but it's a way of persisting files between different builds, which is super cool. What's tricky, though, is that it only applies to the actual statement that does the mount. So it's kind of this weird thing where these files are here and then they're gone in future

Starting point is 01:08:09 phases. They don't exist in future layers of the Dockerfile unless you explicitly copy them. So you can imagine where you do a RUN statement, say mount type equals cache, download the files, and copy them, or some of them, to my layer. That's something you could do, and we'll have a link in the show notes. So cache is one of them. I wanted to mention bind, which is a really cool one too. So we say mount bind, and I can bind files that are within my Docker context, which means inside the folder that I'm doing the build in. It's kind of like mounting them in, but they only exist for that layer. Why would you want to do that?

Starting point is 01:08:51 Well, it's a cool way of bringing in files that you need for builds, for example, without actually bringing them into the layer. So even if you were to delete those... well, let's not go there yet. Sorry, I'm getting ahead of myself. So you can bind these files in at build time, and they're not persisted to any layer. So if you've got some large binaries or something that you need for a build, or maybe even your source files that you're going to do a build on, then this is a good example where you can say, I'm going to bind these files and

Starting point is 01:09:25 I'm going to do my build, and I'm going to take the artifacts of that build, the DLLs or the JARs or whatever. I need to keep those in my image, but I don't need to keep the source files in my image. And this effectively lets you replace the ADD or COPY statements with this build-time mount. And the advantage here is that those files aren't ever stored in the layer. And we've talked about this a little bit before, where a lot of times people don't realize that even if you do a delete of files in a Dockerfile,

Starting point is 01:09:56 they're still in the history because that's part of how that Docker image came to be. So even though you may bring in a bunch of files, do some stuff with them, and then delete them, the layers are still going to be big. So if you're doing, you know,

Starting point is 01:10:10 builds and stuff and moving stuff around, that's part of the history and you're still shipping that data around. So even though files are maybe deleted from your final image, they can still kind of end up gumming up the works. So this is an interesting way around it. That's assuming that you let the files persist from one statement to another

Starting point is 01:10:28 statement. So they would be in one layer going into the next statement. If you did something with the files and then deleted them as part of the same statement, so you ANDed, double-ampersanded, a bunch of things together, including the delete, then they're not there. The problem here, though, is that you're not going to do an ADD or COPY and-and something else.

Starting point is 01:10:53 Yep. So what you're describing is, how can I bring those files in as part of a RUN statement so that I can use them in that RUN statement without them needing to persist in the layer? Yeah, and that's pretty cool, right? Like you said, it's not something I could do before. I think the thing, though, that's missing for context is, what's a situation where I want files available in the RUN statement that I don't want to persist? Oh, compile. So in compile, I only care about the artifacts

Starting point is 01:11:27 of my build. I don't need the source files after they've been compiled. So then, if you were to use Docker as part of your build system, you might have previously done something like a FROM with whatever your compiler might need, then an ADD or COPY to bring those files into the Docker image, and then in a RUN statement you might do a Maven package or a dotnet build or whatever. But now, because you separated the ADD and the RUN statement, the source files persisted in that previous layer. But you're saying the only thing you really cared about was the final JAR or DLL that came out of it at the end. But... So I can mount my source directory.

Starting point is 01:12:17 But then do you run into context problems? Because you want to detect that the context changed in order to know to rebuild. But I'm assuming that will still be okay because those files are not part of a .dockerignore, so it'll still detect that, oh, one of the source files changed, I need to rebuild this image. Yeah, you nailed it. So exactly, the trick with bind is it only works on things in your context. And, you know, at first I thought it seemed like kind of an unnecessary limitation, like why can't I just mount some stuff from another directory? But then I realized, oh, it's because

Starting point is 01:12:57 if it was in another directory, then the context check to figure out if the layer was still valid isn't going to work. So you can only bind things in your context. So I think you've got another one listed there, secret, that I've actually used in the past. And this is a perfect example of one that is really good. So in the past, if you had a Dockerfile that needed to do something that required some sort of credentials, right, the only way to kind of do it was to have environment variables that were passed into it. And the

Starting point is 01:13:30 problem with that is you can always inspect that stuff on the layers later and see what those credentials were. Yeah, even if you delete them, you can go look back in a previous layer and there they are. Yeah, totally. With the secret, and this is the important takeaway of what JZ is talking about here with all of these, when you use the bind or the secret or any of these, you do it in that RUN statement, and it never gets persisted to any layer, but you get to use all of the things that you did. So you can bind a secret on your layer, and let's say that you're pulling something from Artifactory or from Docker or whatever, and you have to pass

Starting point is 01:14:09 credentials. That secret's used on that RUN command, but it's never persisted anywhere in the layer. So it's ephemeral. It gets thrown away during the BuildKit build of this thing. So that's the important part: nobody could go back through and scrape the stuff from those layers that you didn't want prying eyes to see. Yeah, it's fantastic. And the next one, ssh, is similar, except instead of environment variables, it's for SSH keys. So you can give access to your SSH keys in your Docker build without persisting them. So if you need to do some sort of stuff with certificates or whatever that are sitting on a build machine, then you can go ahead and just borrow those SSH keys from the build server safely, because they're never saved

Starting point is 01:14:54 to any sort of layer, and so no one can go and steal your keys later. And the final one is tmpfs. I'm not familiar with tmpfs. It sounds like it's almost a kind of temporary directory that's all in memory. So I don't know, it sounds like something that might be kind of fast and interesting that I should learn more about, but I just didn't get a chance to explore that one. Yeah, if you do Docker builds, definitely look at this stuff because it will open your eyes to some possibilities

Starting point is 01:15:24 and probably solve some problems for you, especially security-related ones, that you would want to know about. Yeah, and some of these are really hard problems to solve without this. It's kind of crazy to think that BuildKit still feels experimental, but it's not. You know, it's the way forward, it's the future. It's just that in order to maintain backwards compatibility, you almost have to bolt the stuff on, which is unfortunate. Yeah. I don't know. I'll have a link in the show notes.
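To make the mount types from this tip concrete, here is a small sketch that keeps a Dockerfile as a Python string and builds the matching `docker build` command line. Everything specific in it is an assumption for illustration and not from the episode: the Go project, the image tags, the secret id `netrc`, and the helper names `write_dockerfile` and `build_command`. The `--mount` options themselves (`type=cache`, `type=bind`, `type=secret`) and the `--secret id=...,src=...` build flag are standard BuildKit features.

```python
from pathlib import Path

# Hypothetical Dockerfile for a Go project; tags, paths, and the secret id are
# illustrative assumptions. Nothing mounted with --mount is stored in a layer.
DOCKERFILE = r"""# syntax=docker/dockerfile:1
FROM golang:1.20 AS build
WORKDIR /src

# secret: credentials are readable at /root/.netrc only while this RUN executes.
# bind: go.mod and go.sum come straight from the build context, with no COPY layer.
# cache: downloaded modules are kept by BuildKit and reused by later builds.
RUN --mount=type=secret,id=netrc,target=/root/.netrc \
    --mount=type=bind,source=go.mod,target=go.mod \
    --mount=type=bind,source=go.sum,target=go.sum \
    --mount=type=cache,target=/go/pkg/mod \
    go mod download

# The source tree is bound in for the compile; only /out/app lands in this layer.
RUN --mount=type=bind,target=. \
    --mount=type=cache,target=/go/pkg/mod \
    --mount=type=cache,target=/root/.cache/go-build \
    CGO_ENABLED=0 go build -o /out/app .

FROM scratch
COPY --from=build /out/app /app
ENTRYPOINT ["/app"]
"""


def write_dockerfile(directory):
    """Write the sketch Dockerfile into `directory` and return its path."""
    path = Path(directory) / "Dockerfile"
    path.write_text(DOCKERFILE)
    return path


def build_command(tag, netrc_path):
    """Return the argv for a build that passes the secret without persisting it."""
    return [
        "docker", "build",
        "--secret", f"id=netrc,src={netrc_path}",
        "-t", tag,
        ".",
    ]
```

Running the returned command (for example with `subprocess.run`) needs a Docker installation with BuildKit enabled; an SSH-based variant would use `RUN --mount=type=ssh` in the Dockerfile together with `--ssh default` on the command line.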

Starting point is 01:15:50 All right. Well then, uh, my boring tip of the week by comparison, uh, well, I'm going to now, I guess what I'm dreading is that now I'm imagining my next pull request.

Starting point is 01:16:03 I got to go and update a bunch of Dockerfiles. Thanks, JZ, for more work to do. I'm just kidding. So I attended a conference recently, and a whole bunch of things came out of it, so I'll probably trickle things in over time. But one of the first I thought I would share was a Google presentation where they showed a Google Cloud architecture diagramming tool. One of the things I liked about it was, we've talked about ubiquitous languages, right? But what if you had that for drawing, to where everybody's using the same things to draw whatever you're doing?

Starting point is 01:16:49 But also what you could do is draw out what you want for an environment in the Google Cloud architecture diagramming tool, and then you could say, hey, go deploy this thing, and Google would go deploy it from your drawing. I thought that was super cool. That is sweet. Yeah. So I'll have a link to that. And then the next one I am super, super excited about. This is coming to a Slack near you. It hasn't been released yet, to the best of my knowledge, as of this recording. It's still only been announced. But Salesforce, who owns Slack, has introduced a ChatGPT integration for Slack.

Starting point is 01:17:36 And there's a bunch of capabilities where the bot could automatically respond to messages for you and things like that, which, okay, now we're going to be, now this is going to be interesting where we can't even be bothered to respond to our own messages. Messaging our friends has gotten to be too complex. We need a bot to do it for us. But my favorite feature that they showed was a summarize capability. So we've all been, it doesn't even have to be Slack. I don't care if it was an old school forum or if it was a Google Hangouts or a Google Chat, if that's what your company uses, Teams, whatever. We've all been in those situations

Starting point is 01:18:18 where there's some common thread where a bunch of people are writing and you, like an insane person, take the evening off to go spend time with your family and get some sleep, or, you know, you go out for the weekend or whatever. But meanwhile, in your absence, there was a whole bunch of things going on. And one of the things they showed was this summarize capability. I don't know the exact syntax of it, so I described it in my show notes as slash ChatGPT space summarize. But the point is that the bot

Starting point is 01:18:52 would summarize everything that you've missed in that thread. And like all the key points of like, Hey, you need to be here at this date or turn blah, blah, blah in. I cannot wait for this feature to become more commonplace. So an automated TLDR. Yeah, yeah. Yeah. Pretty much. Yeah. Except they decided to use more letters with summarize, but yeah.

Starting point is 01:19:22 Right, yeah, yeah. So mine, this, this was born out of some things that I needed to do this week that ended up being pretty helpful. So, um, if you're ever trying to debug anything in Kubernetes, it can be a bit of a pain, uh, depending on what you're trying to do. Like, let's say that you have a pod where a container just crashes over and over and over. If you've ever tried to shell into it, it'll kick you out like three seconds after you're in it. Like there's a lot of things that can happen. Well, I found it's actually a Kubernetes thing. There's these ephemeral debug containers.

Starting point is 01:19:56 Excuse me. And what you can do... hold on one sec. Figured I'd clear my throat without killing you guys. So there's actually a command that you can run that is kubectl debug dash it, for interactive terminal. Then name it whatever you want to call it. In this case, I did ephemeral-demo. And then you pass in an image like BusyBox. So if you're not familiar, BusyBox is one that people use a lot of times.

Starting point is 01:20:23 It's basically just sitting in a wait loop, but it's so you can sit on a terminal and shell and do other things. And then you tell it the target that you're trying to use, and this will tell it which pod to basically attach to. And so it'll launch a debug container in the pod that you want it to go into. And so it's really nice if you need to get in there and take a look around. Like maybe you're having networking problems and you need to see what's going on there. Maybe it can't talk to something else or whatever. So this is a really good way to be able to launch a debug container the Kubernetes way and take a look at things. I have a link in the show notes for that. And then I couldn't find a link for this one,

Starting point is 01:21:05 but this is something that somebody pointed me to the other day. So this is in Google Cloud. I'm sure that Azure and AWS probably have something similar, but this has to do with logs. So if you're taking a look at logs, a lot of times, you know, there'll be a time constraint on what logs you're looking at, right? So you go up there and if you hit the log explorer, it might just be the most current hour, right? Well, let's say that you know that there's an error that happened four hours ago that you need to see. Well, a lot of times when you're looking at logs, you kind of need to see a little bit

Starting point is 01:21:40 before and a little bit after that one error that you saw. And so you might need to see, I don't know, five minutes worth of stuff. There is a feature in Google logs to where you can say, Hey, show me entries around this time. And that around this time feature is amazing. So if you click the edit time thing down at the bottom, there'll be a selection for around this time. You can put in the time of the thing that you're looking for. Then you can say plus or minus 30 seconds, plus or minus a minute, plus or minus five minutes, that kind of thing. And so instead of you having to go in and like manually hand type in your time ranges, you can put in the one time thing and then tell it hey give me everything within a minute on either side of this and it's really nice for being able to go in and quickly

Starting point is 01:22:29 see what you're looking for. And if I can find a link to it, I will. Otherwise I'll go screenshot it and show you where it is in the UI. Or maybe just an example of the query. Yeah, that'd work too. Huh, if only he had a way. Yeah, right. All right. So, yeah, okay, well, we'll include the log lyrics. I'm just kidding. That's all I could think of, though, when you described it. You know, it's like, oh yeah, what rolls down stairs, alone or in pairs. Okay. So subscribe to us if you haven't already on iTunes or Spotify. I don't know. Maybe somebody said, hey, check out these crazy guys, and they sent you a link or something or whatever.
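As one possible example of the query behind the "around this time" log view described above, the helper below builds a timestamp range in the Cloud Logging query language. The function name `around_filter` is hypothetical, and this is only a sketch of an equivalent filter, not necessarily what the Logs Explorer UI generates.

```python
from datetime import datetime, timedelta, timezone


def around_filter(center, window_seconds=60):
    """Build a Cloud Logging timestamp filter for center +/- window_seconds."""
    if center.tzinfo is None:
        raise ValueError("center must be timezone-aware")
    center = center.astimezone(timezone.utc)
    delta = timedelta(seconds=window_seconds)
    fmt = "%Y-%m-%dT%H:%M:%SZ"  # RFC 3339 timestamps in UTC
    start = (center - delta).strftime(fmt)
    end = (center + delta).strftime(fmt)
    return f'timestamp >= "{start}" AND timestamp <= "{end}"'


if __name__ == "__main__":
    # One minute on either side of an error seen at 14:00:30 UTC.
    print(around_filter(datetime(2023, 3, 20, 14, 0, 30, tzinfo=timezone.utc)))
```

The resulting string can be pasted into the query box or combined with other clauses, such as a severity filter, using AND.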

Starting point is 01:23:15 Or you heard it on their car while they were driving. You can find us on iTunes, Spotify, all the major podcasting apps or platforms. And if you haven't already left us a review, like I said before, we would greatly appreciate it. You can find some helpful links at www.codingblocks.net slash review. While Alan looks puzzled at something I said. Didn't you say Spotter Stitchify? Did he say that? Yeah.

Starting point is 01:23:39 I thought he did. We're going to check the record, but the gentleman from Georgia does not believe he said that. We gotta roll that beautiful bean footage back. That's interesting. All right, so, yeah. Hey, while you're up there at Spotter and Stitchify... I did not say that, I swear. You did, I swear. Okay, I might be having a stroke. I might need to call for medical assistance. Either you did or I did. That's what I'm really concerned about. Like, I don't know.

Starting point is 01:24:13 Hey, so while you're up there, make sure you check out our show notes, examples, discussions, and more. And make sure you send your feedback, questions, and rants to our Slack channel. Yeah, and we've got a Twitter, at CodingBlocks. You can go to codingblocks.net and find all our sausage links there at the top of the page. You always say sausage links, too, and I'm like, I think you said sausage links. That's not what I heard.

Starting point is 01:24:34 Man, I swear. All right, I'm having a stroke. I think so. I think it's Alan, yeah. It is. Awesome. Thanks, guys. I'm going to the doctor.
