Coding Blocks - Designing Data-Intensive Applications – Weak Isolation and Snapshotting

Episode Date: February 20, 2023

Ever wonder how database backups work if new data is coming in while the backup is running? Hang with us while we talk about that, while Allen doesn’t stand a chance, Outlaw is in love, and Joe forgets his radio voice. The full show notes for this episode are available at https://www.codingblocks.net/episode204. News Thanks for […]

Transcript
Starting point is 00:00:10 Coding Blocks. It's episode 204. Boom! In your face. 204. Got it. 204. Okay, so subscribe. You're listening to Coding Blocks. Did I say that part already? I'm trying to change it up. Keep it fresh. Keep it new. Keep it new. Yeah.
Starting point is 00:00:38 Subscribe to us if you haven't already. iTunes, Spotify, all those major platforms. Visit us at codingblocks.net. You can find the show notes, examples, discussion, blah, blah, blah. And this is episode 204Z. You can send your feedback, questions, and comments to
Starting point is 00:00:54 our comments at codingblocks.net email. And you can follow us on Twitter at CodingBlocks. All right. And we also got a website, www.codingblocks.net, and you can find all other social sausage links at the top of the page. The sausage links. I think you forgot that. Don't you have to do, like, something with, like, a slash slash, uh, I forget how it goes, at the front of the... No,
Starting point is 00:01:20 you don't, to get to the website. Not as of, like, 30 years ago. It's just keyword, keyword Coding Blocks. That'll work too. Sorry. I had some mustard on my monitor there. All right. But yeah, anyway, I'm Joe Zack. Mustard on the monitor. What?
Starting point is 00:01:36 That doesn't even make sense. I miss it. Oh, cause. Okay. Now it doesn't make sense. Dang it. I was wrong. I got to rethink all my life choices over here.
Starting point is 00:01:46 Nailed it. Nailed the ending. You did. Yeah. It was a perfect setup, and you hooked me, and I missed it. Boom. And who are you, sir? Which one?
Starting point is 00:01:57 You. Oh, me? I'm Michael Outlaw. But he didn't say who he was. I don't know who he is. He did. He said, I'm Joe Zack, and I have mustard on my monitor. That's right.
Starting point is 00:02:06 He did. Dang it. I'm so bad tonight. Today. This morning. Yeah. I'm Allen Underwood. Yeah.
Starting point is 00:02:14 And at least we're consistent. Yes. We're going to be talking a lot about consistency, I guess. That's what went wrong. Oh, I guess I forgot to change that one, though. We're continuing talking about, uh, transactions, um, getting into some isolation levels. And this was a really cool chapter, or a little section of the chapter, uh, I liked because, um, these are some things I've seen, like, around in kind of queries and, like,
Starting point is 00:02:41 little literal constructs in, like, languages and inputs to functions and configurations. And it was kind of cool to see, like, oh, these things have, like, kind of formal accepted definitions, or at least some of them have formal accepted definitions. We'll get into that. So first, a little bit of news. You want to read some reviews there, Ella?
Starting point is 00:02:57 Yep. So from Audible, we have a new review from Alison Williams. Thank you very much. And then also I wanted to call this one out. I don't know if you guys saw this, um,
Starting point is 00:03:07 from Twitter. Uh, we had, I'm just going to abbreviate it. John G's, uh, message to us from Twitter was very heartfelt message. Uh,
Starting point is 00:03:17 very much appreciated. Um, I don't know if he wanted, like, his full name mentioned. That's why, like, since it was sent as a DM rather than, you know, put on,
Starting point is 00:03:25 uh, one of the platforms or whatnot. But yeah. So wanted to thank both of them for, uh, you know, taking the time and, uh,
Starting point is 00:03:33 share their stories and, and appreciate that, uh, you know, they left good news. Good. Thanks. Definitely.
Starting point is 00:03:40 Yes. Thank you. Um, so yeah, we already mentioned we're continuing on with Designing Data-Intensive Applications. So as is tradition, leave a comment and, you know,
Starting point is 00:03:54 it'll enter you in for a chance to win a copy of the book. Could be physical, could be digital, could be audio. I forgot this book was available as audio. That's right. Take your pick. We will hook you up, maybe, if you have a chance to be hooked up. And by the way, uh, I looked it up. There is no new version planned, so I just totally made it up. So, okay, good. I wasn't really thinking that we'd be able to go through another version of this. It'd be rough. Yeah, from the top. So before we get into the, um, the meat of the episode there, I thought I would do, like, a little bit of callback, uh, you know, hey, throwback Thursday, uh, call back to a previous episode or a series of episodes. So if you haven't already listened, um, we did a series on the 12 Factor app, which, if memory serves, it was a standard put out by Heroku.
Starting point is 00:04:54 And if they published it, it was at the 12factorapp.net, I think. Or was it 12factor.net or just 12factorapp? Dang it. Now I can't remember. 12factor.net. Okay. Thank you. One, two, like the number one, two, factor. Very important. Yes. Yes. So, uh, you know, I mean, they kind of, um, I don't know, how would you phrase it? They, they, they impressed upon this world this, this great concept of the 12 factor app. And, you know, it's kind of taken
Starting point is 00:05:23 off. I mean, we, we, there was even, um, I think at one point we talked about somebody else that had like an additional three factors or something like that. That sounds familiar. I don't remember it off the top of my head, but, um, I wanted to share a link from Google with their spin on the 12 factor app and how it applies to the Google cloud and how you can take advantage of Google Cloud to make your 12-factor app. And you go through it, and they've got some little code snippets here and there and talk about how you could implement it in their world, and even going next steps beyond the 12-factor app, what you might be able to do. So I thought it was a pretty good read,
Starting point is 00:06:13 pretty consistent with the previous discussion. Yeah. Sorry, I got a little distracted. I was just picking the winner. So don't forget to comment. We only had one comment on the last post. So congratulations, Super Good Dave. Uh, I just messaged you. That's amazing. Yeah. So, uh, quick thing I want to mention is the Orlando Code Camp is coming up March 25th. It is a free conference. We've spoken about this one before. We've showed up before. We've, uh, had a lot
Starting point is 00:06:40 of fun there past years. This is the first one back after COVID, so it's been a two-year break, and, uh, we're just kind of getting ramped up. There's a registration link. So it's free. If you're in the area, or if you feel like, uh, you know, commuting, you should sign up and come. You get free lunch and a T-shirt, and it's awesome. Uh, we got 50-plus speakers this year. Uh, you know, it's, like, a college campus. Uh, we'll have a link in the show notes. So, uh, you should check it out and you should sign up and come hang out. Yeah. And we said it last time, like they do a stellar job down there.
Starting point is 00:07:10 So if you're in the area, definitely go. It's a great day of learning and meeting people and eating for free and getting a T-shirt. Like, that's amazing. Yeah. And I think I saw another conference, it was advertising for DrupalCon, uh, down in front of their, one of the things in their hero image. Maybe it wasn't DrupalCon, it was some Drupal conference. They had a hero image and, like, they're like: so many attendees, so many speakers, so many sessions, average temperature 75 degrees Fahrenheit.
Starting point is 00:07:39 That's a good selling point for down there. It's pretty funny. Is Drupal still a thing? Really? Yeah. Even with WordPress dominating the world like it has, that's interesting. All right. Cool.
Starting point is 00:07:52 Look up the actual, it's called Florida DrupalCamp. I'm, I'm, I'm shocked. It was hot years ago. I'm surprised. Anyways. All right,
Starting point is 00:08:03 well, let's kick off this thing. We're picking back up in the transaction chapter. I think it was chapter seven, right? And we are talking about, right now, weak isolation levels. So, um, this chapter is so good. Uh, if two transactions don't touch the same data, they can be run in parallel. Right. And that makes sense. Right. Like if you're updating record number one and somebody else is updating record number two, then they're not crossing paths there.
Starting point is 00:08:31 So you're good. Race conditions occur when two different processes are trying to modify or access the same data at the same time. So if you have, you know, all three of us trying to update record number one, that's where you start running into these problems. Again, put yourself back into the mindset of like, I'm going to, I'm going to implement this database on my own. Here's just like, go back to the shell scripts that he had, uh, that the author had where, you know, you would write to a file or update a portion of the file as your database, right? And now when you talk about these concurrency issues or the race conditions where you're trying to read or modify the same thing at the same time, you could kind of understand like, okay, right now when we talk about this,
Starting point is 00:09:18 we're setting the scene for what's to come, right? Like why, why, why did we need to get to where we eventually got, that we're, we'll discuss eventually, but, you know, yeah. Nope. So I don't know if that helped. It was easy for you to say, too. Apparently, yeah. It was so, gosh. Why are you going to call me out like that? Sorry about that. I'm sure I will have my stumbles here. So concurrency bugs are very hard to find and to test for.
Starting point is 00:09:57 If you've ever done it in, like, an IDE, oh, man, it can be really frustrating. Yeah, just debugging is terrible. Oh, man, it's truly awful. And they say usually when you have these problems, it comes down to just unlucky timing. You were about to say something. Well, I was going to say, like, the debugging of concurrency problems, usually because you're going through it so slow,
Starting point is 00:10:18 you can't reproduce. It's so difficult to reproduce because, like, wait, you're going through it too slow. So you're naturally, like, already bottlenecking things to where it's no longer happening concurrently. It's happening, like, one step at a time. It's going now, right? You know, you know, calling or talking about that in particular, I haven't done a ton of concurrency debugging in IntelliJ, but Visual Studio had some really good tools for working with that. So if you're working in C# or whatever and you're doing, like, parallel methods or concurrent-type things, if you're not aware of it, look up the concurrent debugging in Visual Studio Code, because they are, not Code, in Visual Studio.
Starting point is 00:11:03 They did a really nice job of it there. It was actually your tip of the week a long time back where you had talked about how to debug in Visual Studio threaded apps and concurrency issues. Man, your memory just astonishes me nonstop. I'm going to find the episode that's going to be more astounding. He's going to be like, it was episode 77.
Starting point is 00:11:34 Back in 2011. You know, one thing I've seen in some tools for testing and some frameworks, like I forget which one it was, I think it may have been Beam, Google's Beam project. Some of the streaming toolkits. They have really good test support for testing multi-threaded concurrent applications. And the way they do it is by having basically a logical clock that you can stub in instead of using whatever clock they normally use. And so you can actually kind of step forward in time and cause things to happen explicitly,
Starting point is 00:12:07 which lets you write like, you know, effectively unit tests around things like that that are normally super hard to make happen. Oh, that's very cool. I don't think I'd seen that. Yeah, I think you really have to design your system like knowing that you want to do that from the get-go in order to be able to kind of do that.
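The stub-in logical clock idea described above can be sketched in a few lines. This is a minimal illustration, not the API of Beam or any other streaming toolkit: the `LogicalClock` class and its `call_at` and `advance` methods are hypothetical names invented here. The point is only that time moves when the test says so, so the order of "concurrent" events becomes deterministic.

```python
import heapq
from typing import Callable, List, Tuple


class LogicalClock:
    """Hypothetical test clock: time only moves when the test advances it."""

    def __init__(self) -> None:
        self.now = 0
        self._seq = 0  # tie-breaker so callbacks are never compared directly
        self._pending: List[Tuple[int, int, Callable[[], None]]] = []

    def call_at(self, when: int, callback: Callable[[], None]) -> None:
        """Schedule a callback to run once logical time reaches `when`."""
        heapq.heappush(self._pending, (when, self._seq, callback))
        self._seq += 1

    def advance(self, ticks: int) -> None:
        """Step forward and run every callback that has become due, in time order."""
        target = self.now + ticks
        while self._pending and self._pending[0][0] <= target:
            when, _, callback = heapq.heappop(self._pending)
            self.now = when
            callback()
        self.now = target


events: List[str] = []
clock = LogicalClock()
clock.call_at(5, lambda: events.append("writer commits"))
clock.call_at(3, lambda: events.append("reader starts"))

clock.advance(4)            # only the reader is due at this point
first_step = list(events)
clock.advance(10)           # now the writer fires too
```

Because the code under test never asks the operating system for the time, a test can line up the exact interleaving it wants to check instead of hoping for unlucky timing.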
Starting point is 00:12:21 But those kind of things, you know, we talk about stubbing out system calls and things like that in your test and not having your system rely on anything, including the clock. Yeah, which is very hard when you're doing multi-threaded type stuff. Yeah, plus I just don't want to.
Starting point is 00:12:35 That's not how I want to spend my time. Right. So concurrency bugs can also be very difficult to understand because multiple parts of an application can be interacting with your database at the same time in ways that you didn't expect. Right. And that happens quite a bit. Single interaction or single user interactions with the database are hard enough when you have tons of different things interacting with your database. It makes it a lot more difficult. Right. So, you know, you know that you're writing something, but now you have an inventory system talking to your database
Starting point is 00:13:08 and you have other like accounting and customer service and whatever else, right? Like all these things competing for the same things. And it can really, you think you know your system until you see this other thing happen. And then you're like, how, right. You know, I've heard stories of,
Starting point is 00:13:29 um, what was, what's the simple DB or, God, geez, I can't remember. SQLite. Um, I've heard stories of, like, the, like, how in depth and how thorough their tests are.
Starting point is 00:13:37 I've never looked at Postgres, or I've never heard anything about it, but I assume they have to be really tough. Like, implementing any new feature, all the things you have to kind of think about, and all the various combinations and features and stuff they've kind of, like, added on over the years, like, to make sure you support everything that you could possibly configure it for. It's just crazy. Could you imagine being the person that has to review those pull requests? Like, no, did you think about these 9,000 other things? Just, you know, I know you
Starting point is 00:14:02 just want to do that one thing right there but but this impacts every other part of the system. That's why you should have pull request gates, unit tests, linting, whatever, yada, yada, yada, so that you don't have to scrutinize it visually with your own eyeballs. Let the machines do the work for you. That's tough. Yeah. Anyway, databases try to make it look like interactions happen one at a
Starting point is 00:14:27 time for that very reason. Basically, it's kind of, um, if they can simplify the problem so that processes can kind of be isolated and work in an isolated way, then they simplify the things that the person working on the system has to, to think about. So basically it's kind of like taking care of the problem or helping with the problem at a very high level. So by the time you get down to the nitty gritty details of whatever you're working on, that stuff is kind of, you know, abstracted away from you, which is really nice. Uh, they mentioned serial, uh, serializable isolation, uh, as a database guarantee that makes transactions look as if they happen, uh, serially, meaning basically one after another. So if we send two at the same time,
Starting point is 00:15:07 if we arrange them such that one goes first and the other happens afterwards, then suddenly we don't have to worry about two things happening at once, right? Because we've artificially forced them to happen one at a time, which is, you know, great in theory. But all of this portion, like, I so can't wait to get
Starting point is 00:15:27 ahead, because this, this portion, like I said a moment ago, like, it's literally, like, just laying the groundwork, like, setting the scene. The next part, the next, the sections that's coming up, for me, I don't know about for you guys, it was, like, mind-blowing. Like, I had never even considered the craziness that they were discussing. Like, oh, here's how it works. And you're like, well, yeah, now that you tell me, I can't unsee it. Like, that's amazing. Like, somebody, yeah, it's awesome. So I do too. So, like, when you talk about this idea of, like, trying to make it look like, the, the, the forethought, the high level forethought that the database engineers over time, over, you know, years of iteration put into this thing to make it look like one transaction was happening after the next. But then when you start getting into, like, well, but how did you do that? That's the part that's coming. Hey, and, and to be clear, like what you just said, like, this wasn't something that was
Starting point is 00:16:33 just created. Like, I guarantee the first database that was set up, they ran into these problems. They're like, oh man, how do we, how do we fix this? Right? And so to even get to where we are right now with your SQL Servers, your Postgres and all those, like, this has truly been years of, all right, how do we, how do we fix this trade-off? Because the very next thing we're going to say is isolation is not easy, right? There's a trade-off. Your serialization comes at a performance cost, right? Like when you, when you try to make things look like they're running back to back, you're trading off this simultaneous action for, um, uh, a data state guarantee. And so you lose something there. And so where they've done all this stuff, all this magic that Outlaw's excited about, and, and probably both me and Jay-Z as well, it's because they take their storage systems and they've figured out how to do things in ways that just make a lot of sense.
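To make the trade-off just described concrete, here is a minimal sketch of the most naive way to get serializable behavior: literally run one transaction at a time behind a single lock. It assumes a toy in-memory store; `SerialStore`, `run`, and `deposit` are hypothetical names made up for this example, and real databases use far more sophisticated techniques than this.

```python
import threading


class SerialStore:
    """Toy key-value store that runs whole transactions strictly one at a time."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def run(self, transaction):
        # The lock is held for the entire transaction, so no two transactions
        # ever overlap. That is the guarantee, and also the performance cost.
        with self._lock:
            return transaction(self._data)


def deposit(amount):
    """Build a read-modify-write transaction that adds `amount` to the balance."""
    def txn(data):
        balance = data.get("balance", 0)
        data["balance"] = balance + amount
    return txn


store = SerialStore()
threads = [threading.Thread(target=store.run, args=(deposit(10),)) for _ in range(50)]
for t in threads:
    t.start()
for t in threads:
    t.join()

final_balance = store.run(lambda data: data["balance"])
```

Every deposit is applied because the transactions are forced to run back to back. The price is that 50 callers queue up behind one lock even when they would never have touched the same data, which is the simultaneous action being traded away for the data guarantee.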
Starting point is 00:17:31 And we'll talk about it a little bit. But this form, though, the serializable isolation in this form that they're talking about, they're saying because of that performance cost that most databases choose to not use this form. And we'll talk about the reasons why not. And, um, there's some pretty cool solutions they've got for kind of getting around it. They're exciting. I know we keep, like, kind of hinting at what's coming up, but it's just really exciting. Um, so like we said, most databases use weaker isolation levels, things that are not as good as serializable, but they protect against some, maybe most concurrency issues,
Starting point is 00:18:06 but definitely not all of them. And these things aren't just theoretical bugs that somebody can kind of prove on a whiteboard or whatever. These are, you know, things that have happened throughout history that have resulted in huge financial losses, which sometimes, you imagine, like, banks, governments, you know,
Starting point is 00:18:17 you get auditors involved in, like, that zone level of terribleness, you know, corrupted data, data loss, things like that, you know, all really bad stuff. And these are things that these databases, like, know, you know, happens very rarely, you know, but it can happen. And that's things. Could you imagine, I'm just trying to put myself into the mindset of like, imagine you work at some company, doesn't matter what the company, you work at company Acme, but Acme has been very profitable at whatever it is that they're producing, the widgets they're
Starting point is 00:18:49 producing. Um, obviously their number one seller are the rocket propelled skates. Um, but the, uh, uh, and anvils. Um, but imagine like, you know, one day your boss comes in and he's like, Hey, there's this bug in your software. You got to fix it. Right. And you're like, well, it's not in my cell. Like what? And you know, when you're, you're beating your head up against the wall, you know, you
Starting point is 00:19:12 can't figure out like, well, how did that, how did we lose all that money from the customer when I swear, like I did everything right. And then you find out that the problem isn't in your code at all. It's in the software that you're using, i.e. the database that you're using has a bug and it costs you, you know, some untold amount of money that they're like, oh yeah.
Starting point is 00:19:35 But you know, the good news is we fixed that for generations to come in this new version. And since you are on our support plan, we will give you a free upgrade to it. And you get one month free. Yeah. But that's if you do it this year, though, because your subscription runs out at the end of the month. Right.
Starting point is 00:19:54 You can imagine, like, you know, you have some sort of big data loss or corruption problem. Something really bad happens and you, like, call in, like, maybe, like, the MySQL family. And they, you know, fly Mrs. and Mr. MySQL in to kind of look at your transaction log. And they kind of go through it. And they're like, yep, we see what happened here. That stinks. All right, here it is. We highlighted it.
Starting point is 00:20:12 Have a good day. There's nothing we can do about it. This is one of the things we just can't do anything about. I just think it's so sweet that Joe thinks that MySQL was written by a husband and wife team and they work together and they go traveling to fix the bugs together. Like everything's together. Like,
Starting point is 00:20:26 you know, you've pictured like just such a sweet, happy couple. Right. It's totally how it worked. Yeah. It's totally how it works. Yeah.
Starting point is 00:20:35 Hey, so I don't know about you guys. When I read this in the book, I was like, Oh man, I'm sure I was this person at one point. Um, they said that there's like this common theme that if you're doing like financial transactions
Starting point is 00:20:47 and things that you absolutely need to make sure are perfect, you have to be using a relational database because they have the transactions. But what was so mind-blowing for me here is most databases, you could use weak isolation. And for that reason alone, there are things that can go wrong. So you are not guaranteed 100% that that transaction to where you move money from Jay-Z's account to Outlaw's happened exactly the way it should, if you don't recognize exactly which isolation level you're using. And that, to me, was like, oh, man. I'm pretty sure, Allen, that during this series, maybe within recent episodes,
Starting point is 00:21:35 I think you've said something very similar to that. I probably have. I'm pretty sure I recall as we were talking about document databases versus there was an example that was given about document databases and I think you had mentioned something like no, for financial stuff you probably want transactions. Oh, I probably did say it.
Starting point is 00:21:58 Yeah, that's what I'm saying. When I read this I was like, oh man. This is bad. I used to imagine myself a couple years ago sitting in a big fancy boardroom, and someone's like, what database are we going to use for our new banking system that we're going to write? We're going to disrupt finances. And I'd be like, I got this. Relational. Look, one that's got transactions.
Starting point is 00:22:21 Boom, nailed it. ACID compliance. I'll send you the invoice. That's so funny. Not even, like, a specific vendor. Just relational. That's where we're going.
Starting point is 00:22:38 dbengine.com. Go pick one. Oh man. Make sure it's got that transaction checkbox. Now you know why you should listen to us. Well, honestly, to be fair, the amount of bugs I write, like just the amount of bugs my application code is dealing with the database, I mean, you're going to deal with, like, a thousand bugs of mine for every one of these that happens. Man, that's so insane.
Starting point is 00:23:02 All right, so let's get into the first one here, read committed. So this is an isolation level, right? And it gives you two guarantees. And it's, man, it's so important. So if you've never listened to another thing that we've ever said and actually paid attention, you should probably listen to this stuff right here because this matters a whole lot. So, again, read committed has two guarantees. When you're reading from the database, you will only see data that has been committed. That means there's what's called no dirty reads. You can't read data that is currently in the
Starting point is 00:23:37 process of being written, but hasn't been committed. All right. And then the second is when you're writing to the database, you will only overwrite data that has been committed, meaning there'll be no dirty writes. So what that means, again, is if somebody is in the process of writing to a record, you cannot actually write to that same record until that first one was committed. Right. And that's what it means. No dirty writes. You can't write something that actually hadn't finished writing already. So very important to understand those two things. Okay. Again, I'm gonna say it again, put your, put your hat back on, your thinking cap, right? You are trying to write this thing from scratch. How might you implement this, right? And you haven't gotten to the next section, right? Yeah, I did this. I totally did this. Yeah. So you haven't gotten to that section. So you're like, okay, well,
Starting point is 00:24:28 how could I implement this? Like if I'm writing it to the file and like, if you're only talking about like updating a single value, that doesn't sound so rough, right? But if you have multiple indexes, that means there's multiple files. So you've written data to one and you've saved it, right? So does the mere act of saving it count as committing and then something else can now read that thing while you're busy writing to the other file? But then, oh no, that write failed. So you got to go back and undo the previous one, right? Those are the types of thought processes that are running through my head as I'm reading this. Yep. As I'm reading it, I'm thinking like, oh, wow, this is a really good solution.
Starting point is 00:25:13 You know, so when you're writing the database, you only write data that's been committed. If you're reading and it's not committed, it's just kind of like you read it a half second earlier. Like, who cares? It's fine. This is the solution. This solution is perfect. You solved it. Good job, database people.
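As a thinking-cap exercise, the two read committed guarantees discussed above (no dirty reads, no dirty writes) can be sketched with a toy in-memory store. Everything here is an assumption made for illustration: `ReadCommittedStore`, `WriteConflict`, and the transaction ids are hypothetical, and where this sketch raises an error on a conflicting write, a real database would more typically make the second writer wait.

```python
class WriteConflict(Exception):
    """Raised when a transaction tries to overwrite another's uncommitted write."""


class ReadCommittedStore:
    """Toy store: one committed value per key, plus at most one pending write."""

    def __init__(self):
        self._committed = {}
        self._pending = {}  # key -> (txn_id, uncommitted value)

    def read(self, txn_id, key):
        pending = self._pending.get(key)
        if pending is not None and pending[0] == txn_id:
            return pending[1]            # a transaction sees its own writes
        return self._committed.get(key)  # everyone else sees committed data only

    def write(self, txn_id, key, value):
        pending = self._pending.get(key)
        if pending is not None and pending[0] != txn_id:
            # No dirty writes: the key already has someone else's uncommitted value.
            raise WriteConflict(f"{key!r} has an uncommitted write from {pending[0]}")
        self._pending[key] = (txn_id, value)

    def commit(self, txn_id):
        mine = [k for k, (owner, _) in self._pending.items() if owner == txn_id]
        for key in mine:
            _, value = self._pending.pop(key)
            self._committed[key] = value


store = ReadCommittedStore()
store.write("t0", "x", 1)
store.commit("t0")

store.write("t1", "x", 2)              # t1 has written but not yet committed
seen_by_t2 = store.read("t2", "x")     # t2 still sees the old committed value
seen_by_t1 = store.read("t1", "x")     # t1 sees its own pending write

try:
    store.write("t2", "x", 3)
    blocked = False
except WriteConflict:
    blocked = True

store.commit("t1")
after_commit = store.read("t2", "x")   # now t1's value is visible to t2
```

Keeping the old committed value around while a write is pending is what lets readers carry on without waiting; they simply see the value as it was a moment earlier.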
Starting point is 00:25:25 Then you read to the "but," where they go on and talk about how it doesn't protect against, like, uh, incrementing race conditions. Like, two processes try to read a value and write at the same time. There's a nice example there in the book where basically it says, like, you know, someone kind of grabs the value, but by the time they get around to trying to write it, you know, still got an old snapshot. And so the, the, the math there doesn't work out. Not my specialty,
Starting point is 00:25:49 but yeah, it just, well, I mean, and there was also the, like the mention in the last episode of the transaction manager, right? It was in one of the footnotes that,
Starting point is 00:26:01 you know, isn't in the digital version. And I was thinking like, okay, well, you know, maybe the transaction manager can help do some magic in this scenario to help, right? And I was totally going at it from, like, maybe a different route than what's coming. But the way that, where, like, the industry landed was, like, it's one of those things. This goes back to your point earlier at the start, Allen, where you're talking about, like, there's things that you see as, like, configuration or whatever and you're like, you just kind of gloss over. You're like, yeah, okay, whatever. Like, I'm not,
Starting point is 00:26:34 I'm not a DBA. I don't, whatever, I don't care. Like, you know, like, you know your thing, and I, and I, that's what I love about you. You, you know that thing. And I got to go move the logo three pixels to the left. I'll be back. Right? Yeah. You know? So, you know, there's things that you just take. There's so much technology.
Starting point is 00:26:55 You can't help but take for granted, like, some of the things, you know? Totally. That's one of the things, like, going back to, like, our New Year's thing, I think it was. Like, the things that we love about the show, like the forced learning, because we, you know, we forced ourselves to go back and no longer take for granted some of these concepts, and this being one of them, right? So yeah, I've seen this several times. I remember things saying, like, SQL configurations, you know, you can kind of, like, set the isolation level, uh, which at the time I, you know, I basically went to Google: what is the good isolation level? Like, what's the fast one or
Starting point is 00:27:25 what's the best one? That's how I would give it up before. I would see read committed, great. Or if I was trying to get faster, I would look for another one. Read uncommitted, fine. I don't care in this case. And so that's the laser focus I was on at the time and just moved on quickly. So to kind of read
Starting point is 00:27:42 about it here, be like, oh, it actually meant something. It wasn't just some crappily named, you know, uh, config that Microsoft set, you know, and it's just been kicking around. It's like, this actually, like, means something to databases in general. Dude, that's totally it. What you just said is, it's like you see all these configurations. It's like, man, why did they name it this? Like, what, what does that mean? But hey, real quick before we move on from this, because Joe, he mentioned it, but he kind of glossed over it. And this is important, this incrementing problem thing. Just imagine the three of us read the same record at once, and we all get ID one back. I go to update that thing. And now it should be ID two, right? Like
Starting point is 00:28:21 we are, I added something and now it's ID two. Well, Jay-Z and Outlaw read that same record at the same time. So they both have ID one as well. And so when they go to increment it, they're each trying to set it to two as well, which is not right because it conflicts with what my new value is. And that's the problem with read committed right there, is everything could have happened just fine, right? That record was written. It had ID one. But then multiple people got that record, and when they went to write again, they didn't do anything wrong, but now you have a problem with this incrementing thing. So that's, that's why it's important to understand the two guarantees that read committed has. And they actually talk about this later. I think we talk about it in this episode.
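The incrementing problem just described can be replayed step by step. This is a single-threaded sketch that spells out the unlucky interleaving by hand rather than relying on real concurrency; the dictionaries and helper names are hypothetical stand-ins for a database record.

```python
def read(record):
    return record["value"]


def write(record, value):
    record["value"] = value


# Read-modify-write: all three hosts read before anyone writes.
record = {"value": 1}
allen_saw = read(record)
joe_saw = read(record)
outlaw_saw = read(record)

write(record, allen_saw + 1)   # 1 -> 2
write(record, joe_saw + 1)     # still writes 2, computed from the stale read
write(record, outlaw_saw + 1)  # still writes 2

lost_update_result = read(record)


# Contrast: the increment happens as one indivisible step at the store,
# so each caller builds on the previous caller's result.
def atomic_increment(record):
    record["value"] += 1


record = {"value": 1}
for _ in range(3):
    atomic_increment(record)

atomic_result = read(record)
```

Three increments starting from 1 should land on 4. The read-modify-write version ends at 2 even though every individual read and write obeyed the read committed rules, which is the sense in which two of the updates are lost.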
Starting point is 00:29:08 Yeah. They call them lost updates. And the reason is because when we all read the value, let's say, like, Alan, you're actually updating it. Outlaw and I read it. We read it while it was still being committed; it hadn't been committed yet.
Starting point is 00:29:20 So we saw the old value of one. Then I go and set it to two, you know, one greater. Meanwhile, Alan, yours finished, and so now it's two. I just reset it to two. Outlaw, right at the same time as me, he goes and sets it to two. The real value should be whatever two plus two is, I don't know. But yeah, so we consider those two to be lost updates, and nothing wrong happened. Like, everything happened as it was designed. It didn't take the value that was being written because it hadn't been committed, it added to it,
Starting point is 00:29:48 everything was right, and the ultimate value is wrong. Right. And now, going back to the financial, you know, use case for databases and financial transactions, now imagine that that's not just an identifier but a balance, right? Yeah. Like, maybe I was supposed to get more money added to my account and now suddenly I'm not. I'm going to be a little bit upset. Yeah. Hey, and how do you figure out what went wrong there, you know? Like, who... Oh, yeah, man. Hey, and this is actually, that particular example is coming up in this next section, which is snapshot isolation and repeatable reads. This is where we're getting into the fun parts. Yeah, the meat of this.
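The lost update described above can be reproduced in a few lines of Python. This is only an illustrative sketch, not anything from the book or the show: the `Counter` class and both helper functions are hypothetical names, and the "database" is a single in-memory integer. It forces the interleaving the hosts describe (everyone reads the old value before anyone writes) and compares it with an increment done as one indivisible step.

```python
class Counter:
    """A toy one-row 'database' holding a single integer (hypothetical)."""

    def __init__(self, value):
        self.value = value

    def read(self):
        return self.value

    def write(self, value):
        self.value = value

    def atomic_increment(self):
        # Read-modify-write performed as one indivisible step.
        self.value += 1


def racy_increments(counter, clients):
    # Every client reads the committed value before any client writes,
    # which is the interleaving described in the discussion.
    seen = [counter.read() for _ in range(clients)]
    for old_value in seen:
        counter.write(old_value + 1)  # each client writes "what I read, plus one"
    return counter.read()


def safe_increments(counter, clients):
    # Each client asks the store to do the increment itself.
    for _ in range(clients):
        counter.atomic_increment()
    return counter.read()


lost = racy_increments(Counter(1), 3)
safe = safe_increments(Counter(1), 3)
print(lost, safe)  # 2 4
```

Three increments starting from one should land on four. The racy version ends at two because two of the three writes silently overwrite each other, even though no single client did anything wrong.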
Starting point is 00:30:27 Like, it's funny. Is it not sort of ridiculous that we get excited about this kind of stuff? Because, I mean, I don't know. We've been using these technologies for two decades? A long time? Yeah. And now you're like, oh, man's there's the magic like it actually means something it's like that in all of life though like like because this is the beauty of the
Starting point is 00:30:52 internet now right like it's so easy to just get lost into like how something works they could be totally ridiculous it doesn't have to be something as complex as like how does a plane stay in the air right like how does that work right even though the wings are wobbling don't ever if you have a if you have a window seat and you're on a wing don't just go ahead and close that because you don't that is creepy where you see those wings flapping wait why is that bending this isn't a bird oh no this is like that episode of twilight zone um but yeah like there's so many crazy things like how how does the heat pump work you were i think talked about that one time recently you know and you
Starting point is 00:31:29 start getting into thermodynamics and you're like oh my gosh i never did think about that yeah so like heat does track cold and cold does attract heat yeah you know like there's just weird little things and so that's why like there's things that we've, we, you know, as anyone's going through life, there's things that you take for granted. But then when you take the moment to like appreciate it and like study it, you're like, Oh, that's how that works. That's what, that's awesome. And so this is, this was one of those moments for me. Yeah. So cool. All right. So snapshot isolation and repeatable reads, these address read skew. And so an example of a non-repeatable read here is sort of what Outlaw was talking about. So an example is given where a customer has two bank accounts, gets her balance of account one. And then some moments
Starting point is 00:32:21 later, after a transfer of $100 from account two to account one, the customer gets the balance for account two. Well, the problem is the customer has an old value from account one and a new value from account two. So it looks like the customer is missing $100, right? And this is because there were multiple different things reading two different places while operations were happening, right? So if you didn't follow that completely clearly, just know that money's moving from one place to another. And somewhere in that, you've read an old value of that place where money was transferred out of or whatever. And so now it looks like you're missing some money, right? Whereas if you just refreshed it after that,
Starting point is 00:33:08 they'd both show up and everything would be good, right? You'd see that your money had moved. So yeah, the trick there is that the client, the person on the webpage, ended up issuing two queries. Query one hit account A, query two hit account B. And by the time we got to B, the transaction had gone through, but it hadn't gone through the first time. So yeah, the numbers were off, just like I said. And what's crazy is, going back to the point that you two were making earlier, this is not
Starting point is 00:33:35 a bug. In fact, this is acceptable. I know, you're like, wait, how can that be? But for read committed isolation, if that's the isolation level you're going to use for your database, this is accepted. Isn't that crazy? So this is why, like, you've got to know what your data patterns are going to be, what you're trying to accomplish, to know which one of these isolation levels you need to use. And you have to log in and... Say again? I was going to make a dumb joke. Go ahead. Oh,
Starting point is 00:34:10 please do. Yeah. I want to hear a dumb joke. That's why whenever I log into my bank, I always give a couple of refreshes. I'm like, that can't be right. Come on.
Starting point is 00:34:18 No way. That's pretty funny. Oh, I skewed. But note, too, that what we just mentioned was a very temporary state, so the chances of you seeing it are very slim, but you can, and things look really bad, right? So this can happen. Yeah, and on Reddit it happens all the time. Happened today to me, actually. Oh, Reddit's awful about it. We've talked about that before.
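The two-account read skew from the bank example can be sketched like this. It is a hedged illustration only: the starting balances of 500, the dictionary standing in for the database, and the function names are all assumptions made up for the example. The first reader takes each balance from whatever is committed at that moment; the second reads both balances from one frozen copy taken when it started.

```python
def transfer(db, src, dst, amount):
    """Move money between two accounts as one committed transaction."""
    db[src] -= amount
    db[dst] += amount


def read_with_skew(db):
    # Each read sees the latest committed data at the moment it runs.
    first = db["account_1"]                       # read before the transfer
    transfer(db, "account_2", "account_1", 100)   # commits between the reads
    second = db["account_2"]                      # read after the transfer
    return first + second


def read_from_snapshot(db):
    # Both reads come from one frozen copy taken when the reader started.
    snapshot = dict(db)
    transfer(db, "account_2", "account_1", 100)   # invisible to the snapshot
    return snapshot["account_1"] + snapshot["account_2"]


skewed_total = read_with_skew({"account_1": 500, "account_2": 500})
snapshot_total = read_from_snapshot({"account_1": 500, "account_2": 500})
print(skewed_total, snapshot_total)  # 900 1000
```

The skewed reader sees the old balance of account one and the new balance of account two, so 100 dollars appears to be missing. The snapshot reader sees a total that actually existed at one point in time.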
Starting point is 00:34:48 You can hit refresh 100 times and you'll get 20 different counts on the pages. Yeah, but I think that would have to do with the distributed nature of that particular database. So I don't know if that would be an example of this. But we'll say that with Joe talking about the bad jokes, it's got me thinking about, you know, dad jokes. And I, I want to just say is like,
Starting point is 00:35:08 I don't know if you two like to go camping, but it is in tents. I love that one. Uh, hey, you never go on a date with, uh,
Starting point is 00:35:18 with a pastry chef. You know why? Oh man, am I going to say this wrong? Can't figure it. You always get desserted. I don't think I said it exactly right. Dang it. How do you organize a space party? Space party? You planet. That's very good. Yes. Wow. Yes. Mine was the best, though. Yours was pretty good. I like it.
Starting point is 00:35:45 You notice that's a tough crowd because he only gave you a pretty good. You know, whatever, man. I'll just say, like, whenever I go around town, I always keep my guitar in the car now. Yeah. It's good for traffic jams. Nice. That one's from Mike RG. Thank you, Mike RG.
Starting point is 00:36:03 Very nice. Yeah. So we mentioned the person querying twice. And we said it doesn't really happen that often, but it stinks. Here's an example where it's really important that that doesn't happen. Imagine you're taking a backup of the database. And by the time it gets to row one, the transaction is not committed.
Starting point is 00:36:23 It gets to row two or table two, whatever, if you're doing a multi-object transaction. By the time it gets there, the value has changed. And so now the backup that you've taken has kind of happened inconsistently. So different parts of the database tables are in different states based on where the backup was at that time. That's a terrible, terrible problem. It doesn't reflect the database at any point in time, right? It's this like weird kind of mix of things that never actually exist in reality. That's not a backup, right?
Starting point is 00:36:54 We expect our backups to be a snapshot. Perfect. Perfect, yeah. I don't even want to call it a snapshot. I expect them to be perfect. Yeah. Yeah. You can just imagine, like, a naive person doing a backup,
Starting point is 00:37:06 be like, well, okay, I'll just query every table and write it to a file and we'll call it backup done, right? No way. If you've got any real traffic on that thing, it's going to be all sorts of out of whack. I mean, this is why like it's totally a tangent here. Oop, tangent alert. But if you're going to have any kind of disaster recovery type of plan,
Starting point is 00:37:27 right. And backups would be a part of that. You can't just like have the plan and you can't even like just execute it. You have to exercise it. You got to go and make sure that, okay, we have to restore. Did we restore what we thought we restored? Did we get what we thought we want? Is it in the state that we wanted it? Okay, yeah, okay, fine. Then we're good. Otherwise, you can think like, no, we got a disaster recovery plan and then we back up the database every night. And then you go to open up that database
Starting point is 00:37:51 and you're like, hey, why is the database zero bytes? Why is the backup zero bytes? Oh, yeah, there was a bug in the backup. Oh, also, I meant to mention OLAP queries. So like we've talked about OLAP a lot, several times. Basically, analytical queries that do, like, big counts. So, you've got some data warehouse, and you want to see, you know, all your sales and a bunch of different fancy math things. That's going to take a while to process, right?
Starting point is 00:38:17 You imagine if, like, the results were wildly skewed by whoever's, you know, shopping now or whatever. Like, that would be crazy. That would be very bad. So we've got techniques for mitigating that, which we're going to get into. I don't want to jump ahead yet. But I really want to, though. So here's the spoiler alert: snapshot isolation is a typical solution to these problems.
Starting point is 00:38:48 And this is where it was like, okay, I've definitely seen snapshot isolation as like a term in a configuration thing. You're like, well, yeah, of course you're going to,
Starting point is 00:38:58 whatever that is, you're going to give me two of those. I want two of those. Do you have it in red? I want two of them in red. Yeah. And if you could put it on that server, that'd be great.
Starting point is 00:39:10 But you don't really think about it, right? But now we're going to get into thinking about it. Yeah, and think about how many, like, we talked about replicas a lot, right? And what it meant to kind of take a, you know, add a new replica or take one down, whatever. We talked about it, but I didn't really think about the mechanisms beneath. You know, of course, it just makes sense. Like, well, first you take a snapshot of the data and then you start tailing the log but what if the snapshot takes 30 minutes what happens to all the data that has come in since you started like how do you know when to tail like there's all sorts of questions there that we didn't really
Starting point is 00:39:37 dive into at the time. But, you know, in principle, that's absolutely it. From the thousand-foot view, like, that's it, that's what you do. But in practice, like, if you think about how you would implement that in the file, that's a bit tougher. So this is a very popular feature. I would have to venture it has to be on all the major database platforms, but at least in the book they called out, like, Postgres, Oracle, SQL Server, MySQL, uh, this other one, I don't know how you're supposed to pronounce it, InnoDB? Yeah, so it's actually MySQL using the InnoDB engine behind the scenes, right? Okay. Oh, I'm sorry. I misread that. Yes, you're right. Because MySQL has multiple different engines that you can use. So in a nutshell, each transaction, when it goes to do its reads and writes, it's going to read
Starting point is 00:40:22 from what they're calling a consistent snapshot. It's getting a snapshot of the things that it's reading. So in the example that was given before about the three of us each reading from the table, we would each get our own snapshot that we're reading and writing to, and then the database will figure it out later. Right? And this is where the magic happens. That's like a high level kind of like, don't get into the details yet.
Starting point is 00:40:54 Don't worry about the details yet. Kind of like explanation as to, like, what's happening here. But then that way we don't overwrite each other. We don't overstep each other in the things that we're reading and writing. Right. So now how do they make that happen? If we were to take that to an example, right? If Joe and Outlaw are writing to the same area and I'm trying to read, it has to know that for Joe, he's isolated from Outlaw. It has to know that for me, I'm isolated from Joe and Outlaw's writes. So we sort of have three different states that we're working with. And that's why it's important, what he said: transactions read
Starting point is 00:41:39 from a consistent snapshot. So each one of us will have our own consistent snapshot that we're working with, which is absolutely insane. Now, I don't know, but in my mind of imagining that the transaction manager plays a role in some of this decision, you know, like, you know, of deciding that like, hey, these two things, you know, need this particular state, this one needs that particular state, this other one needs this state. I'm imagining that it's maybe at a high level part of that orchestration. I could be wrong, but that's why I brought it back up earlier. It was just to think that there are things happening at the connection level to make
Starting point is 00:42:19 decisions of how to manage those interactions. Looking through the show notes uh it looks like we don't get there i didn't do the show notes today but the book goes there the book gets into how that works and it's super cool so that might be because this is a longer chapter yeah there's some meat here there's some meat here but i'm so excited excited for you to hear about the standard techniques for doing it because it blew my mind. I started thinking about how I would implement it and I was like, okay, I guess I can kind of. Am I in the ballpark? Right.
Starting point is 00:42:57 That sounds good. I would say no. Dang it. So nothing to do with the transaction manager. The transaction manager doesn't play a part in it at all. It's way dumber. Yeah, really? It's not very... Yeah, I just mean it's, like, it's really elegant. You can tell who's read ahead. Yeah, yeah. You're going to see, I'm a write-ahead kind of guy. You're a read-ahead kind of guy. So, and that's a database joke. Hey, nice.
Starting point is 00:43:27 So, high level, we can talk about how snapshot isolation is accomplished. And basically the deal is that you usually use write locks to stop dirty writes. So you kind of say, like, um, I don't know how you say it. Lock the rows. Sure. Yeah. And, like, so the book mentioned that locking the rows is usually the kind of normal strategy for kind of handling that. I couldn't really think of any other way to do it.
Starting point is 00:43:56 I don't think, I don't remember if they mentioned another one though. They didn't go deep into it here anyways. Yeah. But yeah, they just say that you lock it, right? You lock something for a reader, right? Yeah.
Starting point is 00:44:08 So basically saying like, hey, someone is writing to this. You just got to wait a minute or you got to give up or, you know, whatever you need to do. I don't care. This is not in the state where you can do any writing to it. So this is kind of like the opposite approach that we mentioned of read committed. We're basically saying you can't write unless it's in a good spot. And what's nice about this is that by only having the locks on the writes, it means that reads never block writes and writes never block reads. They're totally independent. All right. Um, so because there may be multiple transactions going on at once, there may need to be multiple copies of the database objects in play at once.
Starting point is 00:44:50 This is referred to as a multi-version concurrency control. Yes. Did not see that coming. Yeah. And so Outlaw just did the mind exploding thing. And this is where things get really cool. So what they just said, right? Like this record that you're, that you're trying to update here, we each have our own copy of it more or less. Right. And that is where the magic starts.
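Here is a minimal Python sketch of the multi-version idea just described. It is a toy built on stated assumptions, not how any real engine stores data: `VersionedStore`, its methods, and the "commit id" counter are hypothetical names, every write commits immediately, and nothing is ever locked. The point it illustrates is that a write adds a new version instead of overwriting, so a reader holding an older snapshot keeps seeing the data as it was when that reader started.

```python
class VersionedStore:
    """Toy multi-version key-value store (illustrative only)."""

    def __init__(self):
        self.versions = {}    # key -> list of (commit_id, value), oldest first
        self.last_commit = 0  # id of the most recently committed write

    def begin(self):
        # A reader's snapshot is "everything committed up to now".
        return self.last_commit

    def commit_write(self, key, value):
        # Append a new version; older versions stay readable.
        self.last_commit += 1
        self.versions.setdefault(key, []).append((self.last_commit, value))
        return self.last_commit

    def read(self, key, snapshot):
        # Newest version that was committed at or before the snapshot.
        visible = [value for commit_id, value in self.versions.get(key, [])
                   if commit_id <= snapshot]
        return visible[-1] if visible else None


store = VersionedStore()
store.commit_write("balance", 500)           # commit 1
reader_snapshot = store.begin()              # reader starts here
store.commit_write("balance", 600)           # commit 2, after the reader began

old_view = store.read("balance", reader_snapshot)
new_view = store.read("balance", store.begin())
print(old_view, new_view)  # 500 600
```

Because the writer never touches the version the reader is using, the read does not wait on the write and the write does not wait on the read.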
Starting point is 00:45:13 Yeah. I never imagined... Like, we've talked about B-tree databases. We've talked about LSM. So when we talked about the LSM, we said it normally kind of marks records as zombies and kind of keeps them pending
Starting point is 00:45:21 until you can compact the logs. And I was fine with that. And then we talked about B-trees and how you can kind of go in and you could do your writes in place, but, you know, often it made more sense to just kind of create a new row or new page or whatever and kind of repoint the indices. And I was fine with that. But I never thought about it keeping track of multiple versions and being able to tell you what the data was at any particular kind of point in time. Yeah. So said another way, like, go back to that B-tree example, and every leaf of that tree is a separate file.
Starting point is 00:45:50 And the three of us happen to be doing queries against the same particular leaf and it's maintaining, Oh, there's well, because of this state, there's these three each have their own copy of it. And maybe a fourth one coming later. Oh, well, Joe has committed his.
Starting point is 00:46:08 So his is out. But Alan and Michael's are still in play. So there's the there's the updated version. And then there's Michael and Alan's. They're still in play. Like the management of now the the leaf nodes of the tree has, has gotten extreme. Yeah. And actually it is in the notes is coming up this section.
Starting point is 00:46:29 I don't know what I was thinking. Okay. We're going to, we're getting there very soon. So, so this next part, they say they actually call out the difference between the read committed and the snapshot isolation is for the read committed.
Starting point is 00:46:41 Each read has its own snapshot that it's reading from. Whereas with snapshot isolation, all the reads in a transaction are coming from that same snapshot that was made. So instead of each read that we're doing having its own snapshot, if you're in snapshot isolation, you're always looking back at that same version that you looked at the first time. Yeah, which is nuts. And how do you know when it's okay to discard older versions, right? Like, when does that get cleaned up? How's that process work? Well, this is where I was thinking that the transaction manager would come into play. Like, okay, he's done with that set of tables. Like, what I was thinking of, like, if you had the versions of each snapshot,
Starting point is 00:47:28 of like, say, one leaf of that tree, right? If each one was versioned, then the lower number, once the connection to that lowest number is no longer in play, then that's the one you know you can get rid of. Okay, yeah, yeah. You're on the right track then.
Starting point is 00:47:44 I misunderstood what you're saying. Yeah, so basically that's exactly what it is. There's an incrementing number that's the one you know you can get rid of. Okay, yeah, yeah, you're on the right track then. I misunderstood what you're saying. Yeah, so basically that's exactly what it is. There's an incrementing number that's kind of global for each transaction. It's got its own ID. And when it says like, okay, let me look at all my open ones and the lowest number there is 14. That means I can discard everything 13 and below.
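The cleanup rule described here can be sketched in Python as follows. This is a simplified illustration under loud assumptions: the `Version` class and `collect_garbage` function are hypothetical, transaction ids are plain increasing integers, and every transaction older than the lowest open one is assumed to have committed. A version is treated as garbage once the transaction that replaced or deleted it is older than every transaction still open.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Version:
    value: str
    created_by: int                   # id of the transaction that wrote it
    deleted_by: Optional[int] = None  # id of the transaction that replaced it


def collect_garbage(versions, open_txids):
    """Keep only versions that some open transaction might still read."""
    # Lowest transaction id still open; with none open, nothing old is needed.
    horizon = min(open_txids) if open_txids else float("inf")
    return [v for v in versions
            if v.deleted_by is None or v.deleted_by >= horizon]


versions = [
    Version("v1", created_by=5, deleted_by=9),
    Version("v2", created_by=9, deleted_by=13),
    Version("v3", created_by=13, deleted_by=15),
    Version("v4", created_by=15),
]

# The lowest open transaction is 14, as in the example from the discussion.
kept = collect_garbage(versions, open_txids={14, 16, 17})
kept_values = [v.value for v in kept]
print(kept_values)  # ['v3', 'v4']
```

Versions replaced by transactions 13 and below are dropped. `v3` survives because transaction 14 started before transaction 15 replaced it and may still need to read it.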
Starting point is 00:47:58 Yeah, okay. Yeah. Hey, and so while I was putting together some of the notes for this, I was curious. They showed a little example of Postgres' implementation, and I'm not going to try and explain it because it would just confuse the heck out of you on the show. However, I did go and I was curious, like, Postgres is an open source database. They have this stuff out there. They have a readme for how they do this stuff.
Starting point is 00:48:26 Man, if you go open that thing, it is ginormous. The read me it it's talking about all the different ways that they handle some of this stuff, man. If you, if you want a mind bending exercise, go read this link. Um, so anyway, put that out of the way. You ever feel like if we had all of this, all of these resources at our fingertips, you know, when we, when we were in college, for example, or university, like would we have taken advantage of it? No, no, totally not. No, totally not. That's the reality,
Starting point is 00:49:05 right? Like you want to say like, Oh man, if only I knew then now what I, if I only knew now phone you wait, whatever, you know, this thing I'm trying to say.
Starting point is 00:49:14 And if I knew it, I would have said it. And then I would have knew it. And then it'd have been amazing. And I would have known it. And I would have said it. Just, just what he said right there.
Starting point is 00:49:25 Clear as mud. So on this Postgres thing, their implementation basically uses some metadata fields on a row. There are two specifically, created by and deleted by, which contain the transaction IDs. So here's the cool part, right? Like, you query this Postgres row, you never see those fields. These are internal to the database system tracking these things, right? If you were to delete a row, that deleted by field is updated,
Starting point is 00:49:58 the row is not deleted at that point in time. But garbage collection will pick it up later and remove it physically from the table at a time when it's deemed that it's no longer accessed. So it's doing a logical delete on a row. So your table is growing. You don't see it because if you ever query it, you're only getting back the one record that exists in the table. But there might be 10 other versions of that record that are just waiting to be garbage collected. And they have metadata behind the scenes that allows Postgres to do this
Starting point is 00:50:31 type of thing. I just imagine this going kind of hand in hand with, you know, earlier discussions from this book where it was talking about like you had tombstone, the record was the, was the terminology that was given. You would tombstone the record, but then something like in a write-ahead log would have the more up-to-date version of it. And eventually, the compaction, as you call it, it might represent a whole new file. But now that we're talking about the snapshot isolation, in my mind, I was kind of thinking like, well, I suppose what you could do is, uh, you know, just create a whole new file altogether. And then you're only writing to it
Starting point is 00:51:14 because writes are much faster than, uh, edits to the file. Right. So you could just rewrite all the, you know, like, let's say that that particular leaf of the tree had a total of 10 rows of, you know, 10 records of data in it, then you could just write, here's the 10 latest versions of it. Once you get to, like, um, I think the example you gave was, like, transactions 13 and lower can be removed, right? Then you know that, like, okay, whatever those transaction versions are, that can be written out and preserved. Right? And now the versions of the leaf that still had the tombstone versions can be deleted because those transactions are gone, right? Does that make sense? And it may very well happen like that, right? Behind the scenes with the garbage collection
Starting point is 00:52:01 and all that kind of thing. Well, yeah. I mean, really more to the point is, like, we're building on concepts, right? We started with the idea of tombstone. We started with the idea of like, hey, what if we have this write-ahead log? And that actually built upon the idea of like, hey, what if we, like, manually did this thing in a text file with some shell scripts, right? So, like, the whole book, the author, he did such a fabulous job of just, like, introduce a concept, let's build on it, introduce a concept, let's build on it, build on it, build on it. And we might shelve that for a minute, we're going to introduce another topic before, and then,
Starting point is 00:52:31 you know, you're 200 pages into the book and all of a sudden, boom, we're going to come back and we're going to build upon this other concept. Yeah, it's really good. Hey, and for any of you that are sitting there going, okay, well, you have a created by and a deleted by, what about an updated? If you've ever spent any time doing triggers in, like, SQL Server or anything, there's typically not an updated; it's always inserted and deleted, because the way that they handle it is if you update a record, you basically deleted that old data, right? And you inserted new data. It treats it as, like, a delete plus an insert instead of an update. So that's why you'll see the inserted and deleted and you don't get an update, right? And again, go back to what I just said.
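The created by and deleted by fields, and the idea that an update is really a delete plus an insert, can be sketched in Python like this. It borrows only the two field names mentioned on the show; the `Table` class, its methods, and the visibility rule are hypothetical simplifications, and this is not Postgres' actual implementation. In particular the sketch assumes every transaction commits and that a lower transaction id always means "happened earlier", which a real database cannot assume.

```python
class Table:
    """Toy table whose rows carry hidden created_by / deleted_by fields."""

    def __init__(self):
        self.rows = []  # physical rows, including old versions

    def insert(self, txid, key, value):
        self.rows.append({"key": key, "value": value,
                          "created_by": txid, "deleted_by": None})

    def delete(self, txid, key):
        # Logical delete: mark the live version, leave it in the table.
        for row in self.rows:
            if row["key"] == key and row["deleted_by"] is None:
                row["deleted_by"] = txid

    def update(self, txid, key, value):
        # An update is a delete of the old version plus an insert of a new one.
        self.delete(txid, key)
        self.insert(txid, key, value)

    def select(self, reader_txid, key):
        # Simplified visibility: created at or before the reader's id and
        # not deleted by a transaction the reader can see.
        for row in self.rows:
            created = row["created_by"] <= reader_txid
            alive = row["deleted_by"] is None or row["deleted_by"] > reader_txid
            if row["key"] == key and created and alive:
                return row["value"]
        return None


table = Table()
table.insert(10, "acct", 500)
table.update(13, "acct", 600)

old_read = table.select(12, "acct")   # reader older than the update
new_read = table.select(14, "acct")   # reader newer than the update
physical_rows = len(table.rows)
print(old_read, new_read, physical_rows)  # 500 600 2
```

Each reader gets exactly one value back, yet two physical rows exist. The older one stays in the table until some later cleanup pass decides no reader can need it.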
Starting point is 00:53:17 Writes are faster than the updates. So that's why, from a performance point of view, you're going to prefer just, you know, it's easier to just append a new record to the end of some write-ahead log, which might even be at, like, you know, an index level or,
Starting point is 00:53:32 you know, that snapshot level. And then, you know, persist it later,
Starting point is 00:53:40 but, you know, go ahead and tombstone the original. Yep. So, do we let Joe do it? Do we let him out of the cage? No. How are you feeling tonight, Joe? Not good? I don't want to do it.
Starting point is 00:53:56 He's grumpy. Grumpy Joe. Yeah, I don't want to. Didn't you have some great NPR voice the last time that... I think I did. What was the... I tried to go for shock jock, I think.
Starting point is 00:54:10 I don't know. He did the double shock jock thing and that was confusing. So this time, if you haven't already, Outlaw mentioned at the top of the show that we got a couple of reviews and one of them that we got on Twitter was like truly amazing.
Starting point is 00:54:27 Right. Like he basically said that we helped him change his career. Right. Like while he's playing video games, which is amazing. Like if you can play video games, listen to us and change your career and potentially change your life. That's amazing. And it was truly killer to to read that and see that we have helped somebody. So if you find that we've helped you out and you find yourself with a minute or two, please do, if you get a chance, leave us a review. We have some helpful links at codingblocks.net slash review. And we really do. We say it, but we absolutely mean it. We love it when we get the feedback and we see what you guys are doing or learning or if there's certain things that you like and love and all that kind of stuff.
Starting point is 00:55:11 You know, please do leave it. We read all of them. So thank you very much. All right. Well, with that, it's time to head into my favorite portion of the show. Survey says. All right. So this one, let's see.
Starting point is 00:55:30 This is what? Episode 204. So according to Tucker's trademark rules of engagement, JZ, you are up first. I feel like I'm the Atlanta Falcons of this particular challenge. I didn't know Georgia had Falcons. They do. Yeah.
Starting point is 00:55:52 I thought it was like California birds. I don't know. Yeah. I don't know. I was going to say you were like the 1980 Braves. That'd be pretty good too. Yeah. Okay. So, Joe, you are up first.
Starting point is 00:56:16 And yours, to start us off with, is going to be name something appealing about working from home. Hmm. I mean, pants, but it's the lack of pants. I don't know how to say that executive loungewear that's my answer executive loungewear that's definitely what the people said okay okay i i mean basically you're just talking about like a dress code kind of what you wear yeah yeah yeah yeah all right or lack of yeah okay uh no commute commute okay all right well uh let's see here let me break out my scoreboard here jay-z went first no clothes slash dress code number one answer on the board ding ding. 28 points. You gotta be freaking kidding me,
Starting point is 00:57:06 man. I can't win. Alan. So hard. Alan. I can't win for the life of me. Underwood, uh, says no commute,
Starting point is 00:57:17 says no commute, which I'm going to, I'm going to give you this one as great commute came in as number three on the board. So there's 17 points on the board for Alan. Flexible hours was the number two answer for 26. See family, number four, 10 points. Avoid coworkers was number five for four points. You know what?
Starting point is 00:57:42 If that's what you're looking for, maybe it's not even the working from home part you should consider. Just a whole other job. Save money. Number six with three points. Avoid boss. Again, that one's right up there
Starting point is 00:58:01 with avoid coworkers. What number are we on? Seven for three points. Eight is bathroom. Anytime to no babysitter. No babysitter is last for two. Yeah. Childcare can get expensive, man.
Starting point is 00:58:17 Like that one. You can appreciate that. All right. So let's see. How about Alan? Hey, wait, wait wait i have to point out the irony here as i said i feel like the atlanta falcons and we went to the super bowl and played against the new england patriots they beat us 28 to 3 and i'll point out that jay-z got 28 points first round of the sign it is a sign i'm going down i'm going to lose hard
Starting point is 00:58:46 okay uh make sure we're okay name a job alan you're going first name a job that requires a lot of education uh doctor doctor okay dang. Dang is not on the list. I'll give you another go. Okay. I mean, I want to say lawyer, but it's not going to be doctor. But I also have nothing else. So let's go with lawyer. Lawyer.
Starting point is 00:59:15 That's going to be number two or number one. So first of all, it wouldn't be proper if I didn't introduce myself as doctor. Doctor. Doctor. Doctor. Doctor. I love that scene. Do you know the movie? I do not.
Starting point is 00:59:31 Joe, do you know the movie? Ah, come on. Spies Like Us. There's the scene where like. That's been years ago. Is that the kids movie? Dan Aykroyd and Chevy Chase are meeting the other. I think it was like Russian soldiers that they were meeting. And they were all. They had lied and said that they were were doctors and the other people that they were meeting were doctors.
Starting point is 00:59:49 And so there was like doctor, doctor, doctor, doctor, doctor. You'll watch it now. Whatever. OK, Alan, this is gonna make you feel a lot better. All right. Doctor number one answer on the board. Look at me. 31 points.
Starting point is 01:00:08 That's not enough points. Alan taking a lead. Very nice. A moment. All right. Now, Alan, this is not going to make you happy.
Starting point is 01:00:18 Yeah, no, no way. Lawyer is 30 points tied for number one at 31 points. No way. Come on, man. Wow. Not fair.
Starting point is 01:00:30 It's like a net no gain for you, Alan. So ridiculous. At the end of the day, we might as well not even have that question. Number three answer was teacher. Number four, college professor. Oh, teacher was 21. College professor, number four, for 11, and nurse is the number five answer with four points. And this is why, like, you know, they didn't talk to anyone who is into software development, because nowhere in there did software developer come up. Yeah. We're not educated.
Starting point is 01:01:05 Yeah. Apparently, by the way, here's some education for you. Uh, there are five types of Falcons in the U S and, uh, a lot of them can be found in Georgia actually.
Starting point is 01:01:14 Oh, very nice. Yeah. Very nice. All right. Um, Joe, I will do this like we've done for the past few since,
Starting point is 01:01:24 uh, this is your episode to go first on. You get to pick the final. So your choices for the final question are something about, let's just say space, online shopping, or household items. Online shopping. Online shopping is your choice. eBay. Got this. eBay?
Starting point is 01:01:51 Yeah. I mean, yeah. 1990 called. All right. Name something you might prefer to buy in person rather than online. That's not fair. I can't think of anything. Wow.
Starting point is 01:02:18 Oh, no. In person. Something you might prefer to buy in person rather than online. All right. I got to get out of my own head and realize, you know, I got to think like someone else. I'm going to go with a car. Car. Okay.
Starting point is 01:02:37 Clothes. Clothes. Oh, dang it. That's so good. They got to fit. I'm going to win. That's so good. I'm going to win. I'm going to win. I'm so excited for you.
Starting point is 01:02:49 Oh, gosh. Here we go. Here we go. Here we go. Here we go. I got to contain my excitement. I got to be impartial to this, okay? All right.
Starting point is 01:03:01 Show me mommy. No. Mommy. So Joe says car. Joe car was the number two answer. Okay. On the board. I need to beat him.
Starting point is 01:03:19 So you have a chance. It was 28 points. Wow. That's a lot of points. Out of 100, right? But Joe? Dang, no. But Joe, clothes. No.
Starting point is 01:03:33 Man. Clothes was the number one answer, Joe. Of course it was. Okay, can't be much more than... It's got to be 12 more. I need 12 points, man. I need 40 is what I need. You've said so many numbers now.
Starting point is 01:03:48 I need 11 points. I need 12. If I can get 40 points, I got this in the bag. If only. If I get 100 points with this next question, I might win. I got it. I just need one more than him. If I get one more, one.
Starting point is 01:04:02 That's right. Any other numbers? You haven't said any fractional numbers. I will stick with 40. Okay. If I can get pi to the thousandth digit. That's right. Okay.
Starting point is 01:04:15 Clothes was the number one answer. And I'm going to go ahead and write these points in here. With 29 points on the board. That's ridiculous. Joe wins again. I've got good answers tonight.
Starting point is 01:04:36 That's the crazy part. Some of these Joe couldn't even come up with an answer. He stumbled his way into a victory. Yes. I am of the people. I'm connected. Oh, gosh. The questions... Humanity. The questions that you gave up, Joe, were name a planet, or, other than jewelry, what is the most expensive single item in your house or in your home? That one,
Starting point is 01:05:05 especially I was like, okay, they didn't ask anybody that's into computers at all. Yeah. Eric just all right. No, TV was the most expensive. I was like, really?
Starting point is 01:05:16 Yeah. This is my computer. How are you? Like what kind of TV have you got? All right. That's a pretty big TV. Was pet anywhere on that list for things you wouldn't buy online? Ooh,
Starting point is 01:05:29 I bought my head. I totally did. Alan will buy everything online. No. Oh, I guess I, I guess I did forget in the spirit of going through all of the, Oh,
Starting point is 01:05:42 you know what? Next time I should, I should give all the other ones first. So clothes was number one, 29 points. Car, 28 points. Food was the number three item at 18 points. Something people would rather buy in person. Mattress is 18.
Starting point is 01:06:00 I'm sorry. Mattress is 14 points. It was the fourth answer. Fifth was jewelry for four. Perfume or cologne was the sixth answer for four points. And last but not least is shoes for two. So no mail order bride or anything like that. Wow, Alan.
Starting point is 01:06:17 Sorry. I mean, you know, these guys. Can we get a censor right in here? The lawyers are questioning. They're looking at some... they're knocking dust off of books. Hold on. They're like, we got education, but we ain't got that much. That's right. All right. Well, let's get back into, uh, now that we have solidified Joe's victory. What is that? Is that like 18 in a row? Like, how? Dude, it's so stupid, man. Like, I had good answers. I had the number one answer and he somehow had the tying number one. But you see those answers, right? Like, you don't really want to
Starting point is 01:07:01 be connected to that audience, you know. You know what you and German sausage have in common, Alan? Uh, don't. You're the worst. He got my joke. He even beat me to the punchline. I'm on fire. He's on a roll. He is. He's got it tonight. He's got like an achievement unlocked. There we go. Okay, man. All right. So getting back into this thing. So the next portion is visibility for seeing a consistent snapshot. So consistent snapshots work by following a certain set of rules.
Starting point is 01:07:40 Um, at the start of a transaction, a list of all the transactions in progress is identified, and writes from those transactions are ignored by any reads. Any writes made by transactions that were aborted are ignored. Any writes made by a newer transaction ID are ignored. All other writes are available to read. So yeah,
Starting point is 01:08:03 that's an amazing list. If you're making a database, there you go, there's your list. At the start, get all the transactions, filter out the ones that are aborted or still in progress, and you can get rid of the ones that are newer. Everything else is fair game. So isn't it just like, if your transaction ID is given a certain number, basically, am I thinking this wrong, that the simplified way to read this is that anything that is your transaction ID or lower is safe to read? But, like, you know, you want the maximum of whatever that set might be. Because if it's written but higher than your ID, which would be newer, then you ignore it. If it was deleted, you're going to ignore it. If it's still in progress, you're going to
Starting point is 01:08:52 ignore it. So you only want the things that are your transaction ID or lower that have been written. That makes sense. And if it is an older transaction that is writing but it hasn't been committed yet, then it doesn't show up on the table yet. So you don't have to think about it. So I had that thought in my mind as I was reading this part. And then as I continued reading, I was like, okay, does it break the thing, the simplified version that I just said? And I wonder why they didn't just say the simplified version. Because the reason –
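The visibility rules read out above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any particular database implements it: the names `RowVersion`, `Snapshot`, `sees`, and `visible` are hypothetical, transaction IDs are assumed to be plain increasing integers (so ID rollover is ignored), and letting a transaction see its own writes is an added assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RowVersion:
    value: str
    created_by: int                   # transaction ID that wrote this version
    deleted_by: Optional[int] = None  # transaction ID that deleted or replaced it


@dataclass(frozen=True)
class Snapshot:
    txid: int               # the reading transaction's own ID
    in_progress: frozenset  # IDs still running when the snapshot was taken
    aborted: frozenset      # IDs known to have aborted

    def sees(self, writer: int) -> bool:
        """True if writes made by `writer` count as visible in this snapshot."""
        if writer == self.txid:
            return True   # assumption: a transaction sees its own writes
        if writer in self.in_progress:
            return False  # rule 1: still running at snapshot time
        if writer in self.aborted:
            return False  # rule 2: aborted
        if writer > self.txid:
            return False  # rule 3: started after this transaction
        return True       # rule 4: everything else


def visible(snapshot: Snapshot, version: RowVersion) -> bool:
    """A version is readable if its creation is visible and its deletion is not."""
    if not snapshot.sees(version.created_by):
        return False
    if version.deleted_by is not None and snapshot.sees(version.deleted_by):
        return False
    return True


snap = Snapshot(txid=12, in_progress=frozenset({10}), aborted=frozenset({7}))
rows = [
    RowVersion("old balance", created_by=5, deleted_by=13),
    RowVersion("uncommitted", created_by=10),
    RowVersion("rolled back", created_by=7),
    RowVersion("from the future", created_by=13),
]
print([r.value for r in rows if visible(snap, r)])  # ['old balance']
```

The "your ID or lower" shortcut corresponds to rule 3 alone. Rules 1 and 2 are why the sketch also needs the in-progress and aborted sets.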
Starting point is 01:09:25 That's an implementation though, right? That's an implementation detail. But there has to be a reason why the author went to the pains of writing that detail. For that very reason, right? Like what you said would be an implementation detail, right? Like my transaction ID is one, yours is two and Joe's is three. Well, what if they're not using numbers? What if they're not using date time type things?
Starting point is 01:09:51 What if they're using some other thing, right? Like I would imagine that's an implementation type detail. But if I recall though, in the book, the author did mention that they are int values. In fact, Postgres, I think, was one called out where there was a max value, and it would eventually roll back around. So if you do that, then you can't do what you just said, right? If you roll it back around, and you're number one, you can no longer use that, right? You know what, Joe?
Starting point is 01:10:25 I take away your win. Alan just won, survey says. You got it. I won something. That's why. That's why then. Because the transaction numbers can roll. And in the case of it rolling,
Starting point is 01:10:38 you can't just simply look at the numbers that are lower than, than your ID number. Okay. Good, good call. Hey, so they did call out here and this is pretty, this is pretty good. And it makes sense when you hear it because the database is never truly
Starting point is 01:10:57 updating or deleting values in place, to your point earlier, writes are way faster. So you're going to err on that side of doing things. Because you're not updating or deleting things in place, a number of running transactions can continue to function from snapshots of those objects with very little overhead. Okay, the very little overhead part of this.
Starting point is 01:11:21 I was like, a database can already be large enough, right? And let's say your database is, like, you know, you're doing something at scale. Like Stack Overflow, famously, you know, has shown their architecture. They're still running on SQL Server, right? So you have a lot of things happening, and the database itself is already large, and now you're going to have multiple copies of parts of it in memory, or maybe it's written, and they didn't really get into that part, I don't believe. But, you know, you have these different snapshots that are happening, and this is considered very little, very small overhead. Well, keep in mind, though, they're only talking about the transactions.
Starting point is 01:12:09 And if you remember, right, if we go back to and Stack Overflow is a great example of how they can make something like this work. And I think they even called it out in their in their architecture pages is the reason they can still use SQL Server is because a vast majority of their traffic are reads, right? Most of the things that are happening are people reading Stack Overflow pages. And so... With that, and I think they actually also called out that those reads go to a Redis cache, if I remember correctly.
Starting point is 01:12:40 Well, so they have Elasticsearch clusters and then they have Redis caches on top of SQL Server. So all that said, unless there is some sort of write that's happening, they probably don't even need to engage a transaction. Well, I mean, yeah. So the only reason why I called out Stack Overflow, though, wasn't necessarily to say that they have like 1,000 transactions happening against the database at any one point, but just to call out that there are real uses of a relational database and not a distributed database of like, oh well, we'll just, you know, Bigtable all the things, or, you know, whatever, Mongos or whatever. It could just be a real relational database, and a large one. Right, right. So yeah, it's weird because I think about it, it's hard to get every scenario
Starting point is 01:13:36 crammed into your brain, right? But even if you had a thousand transactions happening at once, and let's say they're only writing a single record, that's really not a lot of memory requirements that you need there. Now I have seen transactions where you do things like you delete an entire table and there were a million records in the table. So what do you do there? Because you're locking all those rows. That's very space intensive, whether it's in RAM or on disk or whatever. So I think different scenarios can play into that. But I think in general, they're saying, well, I guess this is okay. Go ahead. I'm sorry. Well, I think what they're trying to say is there's very little overhead in what you're doing because going back to what you said earlier, you're not updating these records in place.
Starting point is 01:14:27 You're writing them out somewhere. And so because you're doing that, you don't have a ton of overhead because these writes are fairly cheap. As long as you have the space and the IO on the drive to do it. But the thing that's being isolated, though, it's not like it's just a copy of the one record. Like in the query example of the three of us before, each of us have just that single record in our own versions of that single record.
Starting point is 01:14:58 I was thinking, and maybe I'm wrong, maybe I misread it or overlooked it when reading it. But I imagined it was the leaf node of the tree, like whatever that page is, right? Because we can't both write to the file at the same time. So that's the thing that is being locked at the end of the day is the page file. Well, we're about to get into some implementations of that. So we should probably continue on, and then I think we'll clear this up.
Starting point is 01:15:34 Yeah, and so basically after reading that kind of last section, I was like, oh, my gosh, this is so genius. We've basically got a number with our transaction. We've got a number with the data. We can kind of, you know, dumb it down a little bit to what I was just saying, to ignore some of those details. We can basically take the number that's lower, except we know it's a little bit more complicated than that, but that's kind of the gist of it, right? Super elegant solution, except how do you do that efficiently, right? We're talking about keeping multiple copies of data around for potentially a long time, right? Um, yeah, we start
Starting point is 01:16:05 with worrying about the garbage collection, right? We've got these other things to think about. So what are the implications of that? You know, this is another example where, a couple times in the same episode, you're like, well, that's covered later, and I'm like, I did read it, but I just read it so long ago that I've forgotten it already. Right. Yeah. Yeah. So snapshot. So this is where we get into a particular implementation that can be done. And this is talking about indexes specifically because one of the questions that comes up is, all right, so you update a record in the database. Well, you have indexes pointing at that particular table. Well, what do you do there, right? Because those also have to be updated because the whole point of an index is
Starting point is 01:16:48 to make a fast search. So considering what we said about database storing multiple snapshots of state, how does it work with an index? Well, they come up with a couple different ways. And one is they have the index point to all the transaction IDs and then they filter them out behind the scenes. Right. Like in the query engine itself, not anything you do. You make a query. The query engine determines that, hey, I'm going to filter out these IDs. And then when garbage collection happens, they remove those entries from the index as well.
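The first approach described here, an index that points at every version and an engine that filters and later garbage-collects them, might look roughly like the sketch below. Everything in it is hypothetical (`VersionedIndex`, `IndexEntry`, and the `committed` set are illustrative names), and visibility is deliberately simplified to "committed, and not newer than the reader".

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass
class IndexEntry:
    row: dict
    created_by: int                   # transaction ID that wrote this version
    deleted_by: Optional[int] = None  # transaction ID that replaced or deleted it


class VersionedIndex:
    """Toy index that keeps an entry for every version of a key."""

    def __init__(self) -> None:
        self._entries: Dict[str, List[IndexEntry]] = {}

    def write(self, key: str, row: dict, txid: int) -> None:
        # Never overwrite: mark the live version as deleted and append a new one.
        versions = self._entries.setdefault(key, [])
        for entry in versions:
            if entry.deleted_by is None:
                entry.deleted_by = txid
        versions.append(IndexEntry(row, created_by=txid))

    def lookup(self, key: str, reader_txid: int, committed: Set[int]) -> List[dict]:
        # The filtering happens here, inside the "engine", not in the caller's query.
        def sees(writer: int) -> bool:
            return writer in committed and writer <= reader_txid

        return [
            e.row
            for e in self._entries.get(key, [])
            if sees(e.created_by)
            and not (e.deleted_by is not None and sees(e.deleted_by))
        ]

    def garbage_collect(self, oldest_reader_txid: int, committed: Set[int]) -> int:
        # Drop versions whose deletion is visible to every remaining reader.
        removed = 0
        for key, versions in self._entries.items():
            kept = [
                e for e in versions
                if not (e.deleted_by is not None
                        and e.deleted_by in committed
                        and e.deleted_by <= oldest_reader_txid)
            ]
            removed += len(versions) - len(kept)
            self._entries[key] = kept
        return removed
```

A reader with ID 12 and a reader with ID 14 can call `lookup` on the same key and get different versions back, and `garbage_collect` only removes an old version once no remaining reader could still need it.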
Starting point is 01:17:21 So that's one way of doing it. And to me, that doesn't sound hyper efficient. Like an index of indexes. Yeah, but you can imagine it's almost like having just a hidden field that you can't normally see, and we've got the indexes set up just like we would have any other time. Yeah, but again, when you say hidden field, then the natural, in 2023, it's going to put this kind of spreadsheet kind of view into somebody's mind. Like, oh, I'm just going to update that one thing. But that's not the thing here. Like, this is a whole file that has to be written. That's why I was trying to refer to the page file of whatever your tree is for that storage,
Starting point is 01:18:05 which I guess more technically to Alan's point is the index that I'm referring to, right? Then that leaf node is the bottom part of that. That's the page file for that part of the index. Well, here's what's cool, right? And this is where I'm sure every database is going to handle things a little bit differently. Right. SQL Server, Oracle, Postgres, whatever.
Starting point is 01:18:30 So this all comes down to implementation details. And this is where it's hard to say exactly what everybody's doing. But they did give an example of Postgres. So one of the things that they say is, hey, if you update something on a record and that update can fit on the same page file in the index, they don't have to change anything, right? They just add that updated thing to that same page file and the index doesn't have to change. Okay. So like as a simple way of saying it, like if you had, if this page file had room for 10 records, but you only have three written in it due to the way you're, you know,
Starting point is 01:19:05 whatever indexing strategy you've chosen, the tree has shift, you know, shuffled into a particular way. You only have three records in that file at the moment and you want to update record number two. Well, you're not going to update it in place.
Starting point is 01:19:19 You might tombstone it in that same file and write in a fourth record, right? Maybe, or maybe the implementation detail is to overwrite the second one. But it's, but yeah, it shows up in the same page file. So nothing has to change, right? Other pointers don't have to change. Other pointers don't have to change. So you don't take any performance hit because you're just writing to the same
Starting point is 01:19:38 spot where it already was. So that's, that's an implementation detail because they know exactly how they're writing these things out in Postgres, right? And another database might do it different. Yeah, and the reason why I wanted to make the clarification of the pointers is going back to that tree example, like each one of that, the nodes of that tree, or I guess more technically, what would they be called? They wouldn't be called the nodes, right? Vertices?
Starting point is 01:20:03 Yeah, right? I guess that's technically the graph, but yeah, the, the, you know, those nodes would, would technically be pointers to other files like,
Starting point is 01:20:12 Oh, for this, uh, you know, for records one through 50, they're over into this file for 51 through a hundred. They're over here. And it's,
Starting point is 01:20:20 it's, it's pointers to the individual file. And if you don't have to change that, the file that you're writing it to, which would be a new file name, uh, if you're writing a new file, then, you know, if you did do that, you'd have to update the pointers, which could have a cascading effect. But in this case, you know, if you had the space, you don't have to.
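The "if it fits on the same page, nothing else has to change" idea can be illustrated with a toy model. This is a sketch under heavy assumptions, not PostgreSQL's actual storage format: `Page`, `Table`, and the `index_writes` counter are hypothetical names, the index is just a dictionary from key to page number, and old versions left behind on a full page are not tracked.

```python
from typing import Dict, List, Tuple

Version = Tuple[str, str, int]  # (key, value, txid)


class Page:
    def __init__(self, page_id: int, capacity: int) -> None:
        self.page_id = page_id
        self.capacity = capacity
        self.slots: List[Version] = []

    def has_room(self) -> bool:
        return len(self.slots) < self.capacity


class Table:
    def __init__(self, page_capacity: int = 10) -> None:
        self.page_capacity = page_capacity
        self.pages: List[Page] = [Page(0, page_capacity)]
        self.index: Dict[str, int] = {}  # key -> page_id
        self.index_writes = 0            # how often an index pointer changed

    def _page_with_room(self) -> Page:
        if not self.pages[-1].has_room():
            self.pages.append(Page(len(self.pages), self.page_capacity))
        return self.pages[-1]

    def _point_index_at(self, key: str, page: Page) -> None:
        self.index[key] = page.page_id
        self.index_writes += 1

    def insert(self, key: str, value: str, txid: int) -> None:
        page = self._page_with_room()
        page.slots.append((key, value, txid))
        self._point_index_at(key, page)

    def update(self, key: str, value: str, txid: int) -> None:
        page = self.pages[self.index[key]]
        if page.has_room():
            # New version sits next to the old one; the index is untouched.
            page.slots.append((key, value, txid))
            return
        # No room: the new version lands elsewhere and the pointer must move.
        page = self._page_with_room()
        page.slots.append((key, value, txid))
        self._point_index_at(key, page)

    def latest(self, key: str) -> str:
        page = self.pages[self.index[key]]
        return [v for k, v, _ in page.slots if k == key][-1]
```

With a page capacity of 2, one insert followed by one update leaves `index_writes` at 1. A second update no longer fits, so the pointer moves and the counter goes to 2, which is the cascading cost the hosts describe avoiding.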
Starting point is 01:20:39 So when you said earlier that you got excited, this particular one that we're about to talk about is the one that I was like, Oh, that's amazing. Um, I don't know, Jay-Z, you want to take it?
Starting point is 01:20:50 Cause you're, you're not in your head over there. Well, I just, no, I just meant that the part above the, just talking about keeping like a version number and like being able to elegantly kind of filtered by it.
Starting point is 01:20:58 That's the part that I was super, uh, jazzed about. Um, I mean, for me, it was all of the snapshot, everything about snapshotting that we've talked about from the moment that portion was introduced, and like my level
Starting point is 01:21:14 of excitement has only continued as we continue on with this section. So yeah. Now, in my life, it just so happens, in the last couple months there's been a lot of snapshotting going on. It's just not all good times either, you know. I've had some experience with some bad snapshots in my life, you know what I'm saying. But one thing I always, you know, kept thinking about as I was working on it is, like, how the heck does this work? And I would find myself even Googling again and be like, are you sure that this works? That I'm not missing data or missing updates or something? What if it takes long?
Starting point is 01:21:47 What if it gets stalled for hours and hours? And sometimes it would time out and lose its place. I'm like, how does that even happen? Why can't it pick up? There's all these questions I had. And now I've got a better perspective on it. Why couldn't it resume? It probably failed and resumed several times in the process.
Starting point is 01:22:01 And I just didn't know it. But eventually it failed long enough for it to basically lose its record, you know, its transaction timed out, and it wasn't able to resume because it got cleaned up, probably because of some other problem. But yeah, it was just kind of nice to read through this and be like, okay, so I have a different level of appreciation here, and I understand now why I'm not losing
Starting point is 01:22:20 data and how this works, and what it means when you actually lose the snapshot halfway, why you have to start over again, because you've already, you know, kind of lost your, um, you know, the data got cleaned up. Basically the snapshot isn't available anymore. So before Alan gets into this next part that he liked so much, just while you're on the topic of applying some of this to other things, one of the thoughts that came through my mind as I was reading through this section, that I didn't have a chance to go and check to find out if there is any relation to it, but it made me think about
Starting point is 01:22:51 snapshotting at the file system level. The feature that Microsoft introduced years ago and shadow volumes and snapshotting at the file system for backups. I guess there was talk, like, I remember years ago they're talking about like, well, we're going to treat the file system as a database. And now you're like, huh? Maybe I get it now. Yeah. All right. So this next one,
Starting point is 01:23:19 so we talked about having these IDs that can sort of be hidden and filtered by the engine. This one I thought was just beautiful. So they talk about an approach used by CouchDB, and what they do here is append-only copy-on-write. And if you've done any kind of big data stuff, you're familiar with these types of terms. But basically what happens here is CouchDB does not overwrite anything in an existing page in their B-tree. So they have a B-tree index, right? Instead, it creates a copy of the modified page. Then a copy of each parent is made all the way up the tree to the root. And then when you get up there, there's a pointer going to that new parent node that you gave. So any pages not impacted by the write operation don't need to
Starting point is 01:24:16 be touched. So that's so cool to me. Like you have something down here at the leaf node, you make a copy of each tree, and then you just point to the new tree. That is super fast in most cases, like incredibly fast. And you have your chance of losing any data at that point is so small. It's not impossible, right? Like we've talked about hardware failures and all kinds of stuff,
Starting point is 01:24:42 but because you just duplicated that entire node tree and moved it over and repointed to it, your old stuff's there, and it happened super fast. That was, to me, the mind-blowing thing. Like, that is so beautiful and so elegant, and what it allows you to do is have tons of people working on things at the same time and having new copies of those things created all over the place if they need to be. You know, the crazy part about this, I mean, that's kind of, you know, along the lines of what I was describing a moment ago there, right? With the pointers. I have an appreciation for the author, Martin Kleppmann, and all the database technologies that he has pretty in-depth knowledge of. So I'm just going to go ahead and do him a favor. If you're looking for a database guy and Martin Kleppmann's available,
Starting point is 01:25:37 that's probably the guy you want to hire. Just saying because this guy apparently knows a thing or two about one or all database technology. So he might be the Wikipedia of databases. Database knowledge. He's got a cool blog, by the way. Nice blog. Oh, really?
Starting point is 01:25:56 We haven't talked about that yet. Yeah, he doesn't write very often, but he's got a couple of cool articles. And that's good stuff. That's going to make its way into the uh resources resources no doubt hey so so finishing up that last point with that creating the new thing the killer part of this is because you're going all the way up to the root anytime you go look at a new root node you're always looking at a consistent snapshot of the database at that point in time. Like if you move over to another root node, that is a consistent snapshot
Starting point is 01:26:33 of that database at that point in time. And that's, I mean, it's such an elegant solution and you don't have to filter anything. So that was one of the key things here. There's no filtering. You just point to the main root node, and then you have access to all the leaves from there on down. So, yeah. You do need a background process to clean things up, which we kind of mentioned.
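The copy-on-write idea just described, where a change copies the modified node and each parent up to a new root while leaving everything else alone, is often called path copying. The sketch below shows it on a plain binary search tree rather than a real B-tree, purely to keep it short. `Node`, `put`, and `get` are hypothetical names, and nothing here reflects CouchDB's actual on-disk layout.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Node:
    key: int
    value: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def put(node: Optional[Node], key: int, value: str) -> Node:
    """Return a NEW root; only nodes on the path to `key` are copied."""
    if node is None:
        return Node(key, value)
    if key < node.key:
        return Node(node.key, node.value, put(node.left, key, value), node.right)
    if key > node.key:
        return Node(node.key, node.value, node.left, put(node.right, key, value))
    return Node(key, value, node.left, node.right)


def get(node: Optional[Node], key: int) -> Optional[str]:
    while node is not None:
        if key < node.key:
            node = node.left
        elif key > node.key:
            node = node.right
        else:
            return node.value
    return None


# Build version 1, then derive version 2 by updating one key.
v1 = None
for k, v in [(50, "a"), (30, "b"), (70, "c")]:
    v1 = put(v1, k, v)
v2 = put(v1, 30, "B")

print(get(v1, 30), get(v2, 30))  # b B
print(v2.right is v1.right)      # True: the untouched subtree is shared
```

Each root (`v1`, `v2`) is a complete, consistent snapshot, so a reader holding the old root never needs to filter anything. A background process would still have to reclaim nodes that no root can reach anymore.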
Starting point is 01:26:57 So that's kind of a little funny thing I never thought about, like SQL Server, for example, having a garbage collection process. In Postgres, it's called the vacuum. Oh, really? The vacuum process. That's pretty cool. Yeah, that's funny. All right. So, repeatable read and name confusion. So, you know, we mentioned snapshot isolation, and we mentioned also before in past episodes how a lot of the things and kind of principles behind modern databases were figured out basically a long time ago, like in the 70s, and a lot of those have been baked into standards over the
Starting point is 01:27:35 years, like, you know, SQL for example. But fuzzy standards. Yeah, exactly. And yeah, we mentioned how there's some room for interpretation there. Snapshot isolation in particular is a newer idea. I don't think he said when it came around, but it wasn't kind of part of this initial set of research that came out a long, long time ago. And because of that, various vendors had started kind of implementing kind of versions of it and kind of ideas that kind of evolved a little bit more organically rather than having a big kind of idea up front. And so different database systems kind of have different flavors doing this and they have different names sometimes for similar type functionality. And one of those names is called repeatable read, which when you think about it, it makes sense, right?
Starting point is 01:28:19 I guess. If you can read again with the same transaction ID, then you would get the same results, right? Even if you happen to do it at a later time, assuming that the data was still available, Oracle calls it serializable. Of course they would.
Starting point is 01:28:34 Yeah. I mean, I guess I, you know, if you're using it for backups, then, you know, I guess it's serializable.
Starting point is 01:28:40 We implemented the interface. It's serializable. Yeah. So yeah, that's one of those overloaded terms. Postgres and MySQL call it repeatable read. So, interesting. So snapshot isolation, I think I've seen that in SQL Server, definitely in Kafka, definitely in Mongo. By the way, you ever wonder where the name Mongo came from? I have. Yep. Wonder... did you look it up? No. I just looked it up. It's
Starting point is 01:29:04 from the word humongous. They just kind of shortened it a little bit. Oh, okay. But then it's not so huge. Yeah. The oxymoron there, isn't it? Yeah, it's funny. We want something big, but not that big.
Starting point is 01:29:20 Let's make it small. Yeah, so Postgres and MySQL, as I mentioned, call it repeatable read. They have, you know, this functionality that is very similar to snapshot isolation, but it's not quite exactly the same. But, you know, we've seen that happen a bunch of times. Just know that it's kind of, you know, different names for these things. But pretty much any database that you can back up, I think, would have to implement some sort of version of this. Yeah. The biggest problem that they said here is the databases that implement repeatable read kind of do it in their own ways and they don't provide guarantees, right? Like when we started this entire section, that's really what everything boiled down to is what are my guarantees for
Starting point is 01:30:05 certain isolation levels? And because it's not consistent from database to database, that's where they're like, well, we can't use repeatable read here. And that's why snapshot isolation is the term that's being used. It goes back to the exact same problem as ACID compliance. Like, oh, my database is ACID compliant. Okay, great. I have no idea what you're telling me. And that's what they said here. Nobody really knows exactly what repeatable read means. Yeah, because there's no formal definition of it. So, oh, I did want to call out too, this was another example of the footnotes. So I know that Alan really likes the digital version of the book. Joe really likes the Audible version of the book.
Starting point is 01:30:45 And I like the physical version of the book because that, that bit about the Postgres ID rolling over, that was a footnote in the book. Oh yeah. I never saw that. Yeah. Yeah. Yeah.
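The rollover from that footnote is why a plain "lower ID means older" check eventually breaks. One common way to compare counters that wrap is circular arithmetic, sketched below for a 32-bit counter. This illustrates the general technique and is not PostgreSQL's source: `precedes` is a hypothetical helper, and it only gives sensible answers while the two IDs are less than 2^31 apart.

```python
MODULUS = 2 ** 32  # size of a 32-bit ID space
HALF = 2 ** 31


def precedes(a: int, b: int) -> bool:
    """True if ID `a` is logically older than ID `b` on a 32-bit circle."""
    diff = (a - b) % MODULUS
    # A difference in the upper half of the circle means `a` is behind `b`,
    # which is the same as the signed 32-bit difference being negative.
    return diff >= HALF


before_wrap = MODULUS - 2  # an ID issued just before the counter rolled over
after_wrap = 3             # an ID issued just after

print(before_wrap < after_wrap)           # False: the naive comparison gets it wrong
print(precedes(before_wrap, after_wrap))  # True: the circular comparison does not
```

This is also why very old row versions cannot be left alone forever in such a scheme: once an ID falls more than half the circle behind, it would start to look like it came from the future.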
Starting point is 01:30:59 So, uh, yeah. So you do get some additional details from the footnotes.
Starting point is 01:31:06 So read the footnotes. That's really the takeaway from this episode. Is that your tip of the week? You know, it wasn't going to be, but now you're going to call me out on it? Sure. Tip of the week. You know, I got a free tip for you. I wasn't planning on sharing this one.
Starting point is 01:31:20 Are we into the tip of the weeks then? Yeah. Oh, well, before we do that then, let's just say we'll have links in the resources we like section. And now we head into Alan's favorite portion of the show that Joe's trying to rush me into, but I'm going to say it right now, and I had to interrupt him. It's the tip of the week. Yeah. All right. Now I've got an anti-tip coming up first. This is just a bonus. This is just a lifetime anti-tip. Is it actually a bonus? Yeah. I mean, I don't know, maybe it's not niche. I don't know. We'll see. Have you ever
Starting point is 01:31:51 heard of the book Infinite Jest? No. Okay. Well, Infinite Jest is kind of like a modern classic. It was written in 1996 by this guy, um, David Foster Wallace. He's now passed on. The book is kind of infamous. It's 1,451 pages. It's really big. And it is tons of footnotes. And footnotes refer to, you know, kind of like in-world kind of like tangents and just crazy things. If I described this book to you, it would sound terrible. And somehow it's like, you know, really super popular and people love it.
Starting point is 01:32:28 And, you know, despite all the weird things I won't get into now, but there's just a lot of things that sound very weird to you. If I describe the way that this book talks about time and people and the way it kind of hops back and forth between the footnotes, like you would think it just sounds miserable to read. And yet it's wonderful. Um, so you've got it? You've read it? I started. Too big, too big. So I got it on Audible, and you can't do it like that. You need the footnotes. This book in particular, I think, has to be read
Starting point is 01:33:00 not digitally, not audiobook. You have to get the book because you are going to spend a lot of time flipping, almost like a, you know, it's like a choose-your-own-adventure. Yeah, that's what I was thinking of as you were describing it. Yeah, and it's a weird book because there's times when you just don't want to do the footnote and you can skip and you can just keep going. There's times you know to go back. It's a weird book that you don't read in order. Like, who would do that, right? David Foster Wallace would do it. I think we were talking about the why's water wet book, or not book, um, talk. No? Yeah? Okay. Well, anyway, he's got a famous, uh, like, speech he did, uh, something, um, I'll get a link there, um, water wet speech. Anyway, it's, um, uh, hold on, I'll find it. Dang it. Okay, anyway, that doesn't matter.
Starting point is 01:33:48 Oh, we'll have, um, there's a really cool talk he did. It starts off with a joke about two fish running into an older fish and kind of talking about, um, water being wet, which sounds crazy, but it's good. Just trust me, y'all. I'm going to stop transgressing because this isn't even my tip. But we'll have some show notes in there you should check out. I'll put it in here. Foster Wallace. Okay, anyway. So my actual tip then, aside from don't try to do this on,
Starting point is 01:34:19 don't try to do that book on Audible, is there's a YouTube channel called Tamara Makes Games, hosted by someone named Tamara, uh, a Ukrainian game dev on YouTube that puts a bunch of videos out around game dev in particular. She tends to focus on isometric games, or has a lot of videos about isometric games, where you have kind of like a cool, like, 45 degree angle on stuff, which is kind of like Diablo, or like a lot of strategy games like Warcraft 2, stuff like that. Um, a lot like Civilization, I think, would probably be considered isometric. A lot of city builder games, SimCity,
Starting point is 01:34:59 stuff like that uh and it just talks a lot about videos like that in factorio and kind of crafting systems and it's a different kind of game dev video than you would normally find on stuff where you normally if you look at like game dev YouTube it's a lot of like how to make characters jump and animations and like how to kind of do basic tools and this is like just in a different ballpark
Starting point is 01:35:18 altogether, and it's really good. I can't believe I didn't think of League of Legends when I was trying to think of that view. Oh yeah, totally. How did we leave that one out? Yeah, it's kind of funny because when you think about it, it's a really comfortable way to play a game and it feels natural, but the way that you have to interact with the world in 3D space, as a developer, it's kind of tricky because your mouse, for example, is basically 2D, you know, it's in, like, screen position X and Y, and you've got to map this to this, like, kind of weird
Starting point is 01:35:45 angle and so there's just a lot of interesting things and even like the way you do your art and stuff in order to make it so you can kind of like be looking down on things but also still be able to see them like for example faces like if you're looking down from me up from above you're not going to see my face my face is even going to be maybe in shadow you know there's just a lot of things to kind of consider but uh aside from the actual just visual stuff, she's got a really nice mix of like C sharp code. She does go very fast. So I pause a lot to make sure I'm kind of grokking and doing things right. But anyway,
Starting point is 01:36:12 it's just a great YouTube channel. And so we'll have a link to that. Very cool. Okay. So for my tips of the week, that's right. Plural like a boss. We've talked about iTerm2 in multiple episodes.
Starting point is 01:36:29 I'll include links to that in the show notes for the episodes that we've previously referred to it. But no surprise that we love iTerm2 and continue with that love affair. Did you know that iTerm2 has its own status bar where you can show system resources? So I'll give a link to the documentation for it. But underneath preferences, you go to Preferences, Profiles, Session, and you can turn the status bar on there and you can configure it as well. And it'll have like CPU utilization or memory utilization or network throughput, or, uh, what's the Git state or the job name or username, you know, the clock. There's miscellaneous things that you can do in there. They talk about all the different, um, things that you could do in that as well as,
Starting point is 01:37:23 uh, you know, calling, uh, you know, there's Python API hooks that you can use with it. So, um, it's pretty cool. Like, if you spend your day in a full screen terminal, like, yeah, I don't know about you guys, I spend a pretty good chunk of time in a full screen terminal. So it's a nice little handy thing to be able to see there. Um, you know, like, well, is my machine pegged out right now? Oh no, the CPU isn't fully pegged, but my network is. Oh, that's why things aren't working so well. So, uh, I'll have a link to that. Also, uh, this one I thought was super cool. So this is a beta tool from Google called container-diff.
Starting point is 01:38:09 And I'll have a link to it, to the GitHub project for it. And the GitHub project has all the instructions on how to use it and install it. But what this thing can help you do, where I found it, how I found it or why I found it was I was trying to debug some cache related issues. Like, why does this Docker image, why is it not using the layer cache that I think that it should be? And what is actually different between these layers? Right. And so that's why I started hunting around trying to find, like, how can I figure out what's actually different, because they have different hashes. And Docker rebuilt it, so obviously something's different about it. What is it that's different to help me debug this
Starting point is 01:38:54 problem? And there's a bunch of different commands that you can do on it. Um, but you know, showing like the particular image sizes or the packages like, uh like APT or RPM packages, whatever, I'll include a link to it. But maybe you will find it helpful, too. That sounds really awesome. Right? Yeah. I mean, especially if you're dealing with the exact problem you're talking about, right? Like, why is it not using my layer?
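A rough show-notes style sketch of the layer-cache debugging described above. It assumes the tool is the container-diff CLI from the GoogleContainerTools GitHub project and that both image tags already exist in the local Docker daemon. The helper name `compare_images` and any image tags you pass it are made up for illustration.

```shell
# Hypothetical helper: compare two locally built images to see why a
# layer was rebuilt instead of being served from the cache.
compare_images() {
  old_image="$1"
  new_image="$2"
  # daemon:// tells container-diff to read the images from the local Docker daemon.
  # history = Dockerfile history, file = filesystem contents, size = image size.
  container-diff diff "daemon://$old_image" "daemon://$new_image" \
    --type=history --type=file --type=size
}
```

Calling something like `compare_images myapp:before myapp:after` would then report the differences between the two builds. Adding `--type=apt` or `--type=rpm` compares installed packages, which matches the package comparison mentioned in the episode.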
Starting point is 01:39:24 Yeah. Yeah. That's killer. All right. So real quick, just want to say, correction: I got the title totally wrong and even the joke wrong. The name of the speech is This Is Water, but apparently everyone gets it wrong. So if you, like, Google David Foster Wallace and water, you'll see like a bunch of people referring to the speech. And also it's a weird name. It's just like me because I can never remember the name, but the name is This Is Water. Okay. Excellent. I think Outlaw's adding that to our notes here. Um, all right. So I had a super awesome tip, um, earlier in the week and I didn't write it down or email myself or do whatever I needed to do to remember it. So the super awesome tip is gone. So instead you get a meh tip.
Starting point is 01:40:08 No, I won't say meh because this one comes from Jamie over in our Slack channel. And he's usually pretty good. Usually? Come on, Al. Not everybody can be great all the time, right? By the way, of the .NET Core Podcast, and, uh, which is, is that the name? Anyway, and, uh, also spaces. Yeah. So, yeah, um, Jamie's awesome. He's always sharing and leaving excellent stuff. So he actually has a link here to a GitHub, I'm not even going to try and pronounce
Starting point is 01:40:42 it, but the thing is ZL at the end. I'm not trying to pronounce the thing before it. But what this thing does is it will actually scan your container images for end of life packages. So if you have something and if you're not familiar with end of life or end of support, basically if it's not being developed or supported anymore, there's probably potentially security flaws and things that can creep in. So at any rate, this particular tool will go out and scan your containers to find packages that are being used that are end of life.
Starting point is 01:41:20 So really cool tool there. And then I actually came up with another one while we were sitting here going through the tips that I think is helpful. So Outlaw shared a tip that is actually pretty cool, and I've used it quite a bit. He had a good one this time, too. He and Jamie both on a roll. Wow. The Docker builder prune thing that he mentioned previously, right? Which is like a super easy way to get rid of any hanging containers that aren't, or not containers, images that aren't being used and clean up space on your disk, right? So a lot of people may not realize this.
Starting point is 01:41:58 Minikube's a VM. And images that get built, like we use Skaffold a lot. We've talked about it. If you're building images, they get added to a container registry. Typically, if you're doing Docker Desktop or Minikube or something like that, it's getting added to a local container registry, right? You don't think about it. You don't even realize what's going on because if you had set it up remotely, you'd have to do it differently. Well, a lot of times you'll have things show up in Minikube and you're like, well, I want to get rid of that. And sometimes it's not super easy to do.
Starting point is 01:42:32 A lot of times, if you just minikube ssh to get into your Minikube VM, there's a Docker daemon in the background and you can do a docker images command in there and list it. Well, because that is the container registry that's being used for Minikube, you can do that docker builder prune inside your Minikube VM and it'll get rid of some stuff that's just hanging around taking up space, not being used. So, um, I use that quite a bit whenever, you know, I'm doing heavy things in Minikube and disk space is at a premium. I'll go in and clean that stuff up. So just know that I guess the tip here is a lot of times Minikube is running Docker inside it. You can go in there and run all the Docker commands that you were used to running on your desktop.
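A minimal sketch of that cleanup, assuming the Minikube cluster uses Docker as its container runtime (with another runtime there is no Docker daemon inside the VM to talk to). The helper name `prune_minikube_cache` and the default profile name are illustrative, not something from the episode.

```shell
# Hypothetical helper: reclaim Docker disk space inside a Minikube VM.
prune_minikube_cache() {
  profile="${1:-minikube}"
  # Show what the daemon inside the VM currently holds.
  minikube -p "$profile" ssh -- docker images
  # Remove unused build cache; -f skips the confirmation prompt.
  minikube -p "$profile" ssh -- docker builder prune -f
}
```

Note that `docker builder prune` targets the build cache specifically. If the goal is to drop dangling images as well, `docker image prune` run the same way is the usual companion command.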
Starting point is 01:43:22 Wait a minute. You don't already just use Minikube's Docker daemon? No, I typically don't. I mean, we've talked about that before. Um,
Starting point is 01:43:31 it depends. So we've actually talked about this in the past. If you are trying to run a Docker image as a container and you need to attach a volume to it, it can be tricky if you're trying to do it through um a minikube type thing so the docker daemon that runs in minikube won't attach the volume the way that you expect it to like you remember we did a docker run back in the past you do a dash v and you pass in something you go in there and it's empty and you're like well i know it's not empty because i can see the stuff on my drive.
Starting point is 01:44:06 It doesn't attach it. So sometimes I'll run both of them side by side so that I can actually do a Docker run and also do my Kubernetes stuff. So at any rate. Okay, because what I was going to say is the, so, okay, first, if in your situation, as I i understand you're saying you would typically run like a docker desktop just to be your docker daemon and maintain your your build cache or your let me say your your
Starting point is 01:44:38 your registry of images, right? But then you would have a Minikube VM that has its own Docker daemon and you're pushing images into it to run for, like, you know, its Kubernetes instance, right? Skaffold would do that for me, right? Like it would move those images in there. So what I do, and I'll put the command in the show notes here, but, um, this is why I was so shocked, because Minikube has a command called docker-env. I use that so that my native or my host operating system is linked into the Minikube VM to use its Docker daemon, so that both my host operating system and the Minikube VM are using the same Docker daemon, and the Minikube VM contains all of the storage for everything Docker, be it Docker builder cache or the images or whatever. And so you can,
Starting point is 01:45:46 I'll give an example in the show notes, but there's an eval command that you could do. The syntax would be something like eval, space, dollar sign, and then in parentheses, minikube docker-env.
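The eval syntax spelled out above, written down as a small sketch. The wrapper function `use_minikube_docker` and its default profile name are assumptions added for illustration; the underlying command is `minikube docker-env`, optionally with the global `-p` profile flag.

```shell
# Hypothetical helper: point this shell's docker CLI at the Docker daemon
# running inside a Minikube VM. The profile defaults to "minikube".
use_minikube_docker() {
  profile="${1:-minikube}"
  # docker-env prints export lines (DOCKER_HOST and related variables);
  # eval applies them to the current shell session only.
  eval "$(minikube -p "$profile" docker-env)"
}
```

After calling `use_minikube_docker`, commands such as `docker images` or `docker build` in that shell talk to the daemon in the VM. Running `eval "$(minikube docker-env --unset)"` should switch the shell back to the host daemon.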
Starting point is 01:46:00 And in my case, I typically use a profile. And so, you know, like in your case, you're using Docker Desktop. So you could say, like, if your Minikube profile was called docker-desktop, which is a pretty common kube context, then you could say minikube docker-env -p docker-desktop, and it'll configure your docker commands to use the daemon from inside the Minikube VM. Yeah, and that works. And you could probably even make it work with docker run if you did, um, an SCP of your files into your Minikube image, into the right directory. But if you're trying to do a docker run and attach a local volume, that's where it falls apart when you
Starting point is 01:46:52 try and link up to the Minikube Docker. And that's where, you know, it's use cases that they're running into. If you're not trying to do a docker run and mount volumes, then you don't have that problem, and what Outlaw's saying makes things a lot easier. Well, here's just a little Docker tip, you know, bonus. Um, unlike, this is not going to be like Jay-Z's bonus tip, by the way. Not trying to throw shade. This is the anti-bonus. I'm not trying to throw shade on his tip. Anti-tip. But so when you're trying to mount a volume into your container, the important thing to note there is that the Docker daemon is going to mount a thing that exists wherever the daemon exists.
Starting point is 01:47:44 In its host, right? So if you have, like me, configured your host OS to use the Minikube Docker daemon, then any volumes that you're going to mount better exist in that Minikube VM, right? Otherwise it's not going to pass through. And even though I'm telling you this and I know this, it still bites me so many times, especially when you get into situations where you have, like, Docker in Docker inside of Docker inside of Docker, you know, turtles all the way down. Because especially in, you know, cloud environments, where you might be like, oh, let me just spin off a Docker container to go and do something, you might be multiple Docker levels deep and you're like, wait, why isn't my mount working? And it's like, oh yeah, because it's, like, way off the chain. Yeah, so what he was saying is exactly why I said you could probably SCP. So if you have files on your machine that you want to be able to mount, but your Docker
Starting point is 01:48:42 daemon's running actually inside Minikube, you could SCP those files from your system into the Minikube instance, into a directory. And then when you mount it, they'd be there. What would be even better is when you start up Minikube, if you could auto-mount some directories, which you might be able to do. I've never even looked into it. That's what I was totally thinking.
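A hedged sketch of the two approaches floated above, copying files into the VM versus sharing a host directory with it. The hosts describe SCP and say they have not looked into auto-mounting; `minikube cp` and `minikube mount` are swapped in here as the built-in equivalents, and the function names and any paths you pass are purely illustrative.

```shell
# Hypothetical helpers for putting host files where the in-VM daemon can see them.

# Copy a single file into the VM. The target must be an absolute path in the VM.
copy_into_minikube() {
  minikube cp "$1" "$2"
}

# Share a host directory with the VM. This runs in the foreground, and the
# share only lasts while the process keeps running.
mount_into_minikube() {
  minikube mount "$1:$2"
}
```

With a share in place, a `docker run -v` that points at the VM-side path (rather than the host-side path) should see the files, because the daemon resolves volume paths on the machine where it runs.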
Starting point is 01:49:00 If you were to mount those things as an NFS mount, even though it's coming back to localhost, but you did it over the network, then you'd probably be better. The problem with that, though, in doing it that particular way, is I'm not crazy about the idea of technically exposing that file share. Your host system, yeah. Maybe there's something you could do firewall related
Starting point is 01:49:20 to, like, lock it down to where you're sharing it, but it's only available on, like, the localhost IP address or something. But, um, yeah, I mean, you could totally do it that way. And if I was in a situation where I needed to continue to use Minikube's VM, um, maybe because I'm just stubborn like that, I would do the mounting of Minikube. I wouldn't bother like, oh, okay, now I've made a change. Let me recopy the file in. That would be super annoying.
Starting point is 01:49:56 Cool. Yeah. All right. Well, that was a short discussion. So we'll have some notes, some links, some letters and words. Things will be on pages and on your phone. By the way, we've talked about this before, but if you're new to the show and you don't know, now you will.
Starting point is 01:50:21 The show notes that we provide are available. Usually in most podcast players, you can see the full show notes and follow along there. So if we mentioned like, hey, there's going to be this link, uh, to Martin Kleppmann's blog, for example, or, uh, This Is Water or whatever we're calling it now, I don't know, why is water wet, uh, whatever that silly story is, uh, and we mention there's a link in there, you can usually find it in your podcast player. So you don't really have to go far out of your way to even get to that. So, yeah, subscribe to us on any of the major platforms.
Starting point is 01:50:54 If there's a particular platform that you like to find your podcast and we aren't there, we would like to know that. So you can hit us up on one of the many different ways, including, as Alan said, if you haven't left us a review, you can find some helpful links at www.codingblocks.net slash review. Hey, and while you go up to the Coding Blocks website, obviously check out our show notes,
Starting point is 01:51:15 our examples, discussions, and more. And if you're not a part of our Slack group, you should be. Go to codingblocks.net slash slack and you can join up there. And we have a website, codingblocks.net, and we have social links at the top of the page here,
Starting point is 01:51:30 including one for Twitter at codingblocks and links to Slack. Other things too. You should go hit it up. Wait, I thought they're social links now. I thought they were sausage links. They were sausage links. I cleared the mustard.
Starting point is 01:51:41 Oh, I see what happened. Yes. Hurry, hurry, hurry. All right.
