Programming Throwdown - 125 - Object Caching Systems

Episode Date: January 21, 2022

We are sponsored by Audible! http://www.audibletrial.com/programmingthrowdown
We are on Patreon! https://www.patreon.com/programmingthrowdown
T-Shirts! http://www.cafepress.com/programm...ingthrowdown/13590693
Join us on Discord! https://discord.gg/r4V2zpC

Object Caching Systems
Many people have heard the names "redis" or "memcached" but fewer people know what these tools are good for or why we need them so badly. In this show, Patrick and I explain why caching is so important and how these systems work under the hood.

Intro topic: Public database & cache services (PlanetScale & Upstash)

News/Links:
Log4J Vulnerability
https://jfrog.com/blog/log4shell-0-day-vulnerability-all-you-need-to-know/
https://www.microsoft.com/security/blog/2021/12/11/guidance-for-preventing-detecting-and-hunting-for-cve-2021-44228-log4j-2-exploitation/
Scan of the Month: Gameboys
https://scanofthemonth.com/game-boy-original/
Hyrum's Law
https://www.hyrumslaw.com/
Make the Internet Yours Again With an Instant Mesh Network
https://changelog.complete.org/archives/10319-make-the-internet-yours-again-with-an-instant-mesh-network

Book of the Show
Jason: AI 2041 - https://amzn.to/3fOqnWQ
Patrick: Dawnshard - Brandon Sanderson - https://amzn.to/3tFmuMi
Audible Plug: http://www.audibletrial.com/programmingthrowdown
Patreon Plug: https://www.patreon.com/programmingthrowdown?ty=h

Tool of the Show
Jason: Swagger - https://swagger.io/
Patrick: ripgrep - https://github.com/BurntSushi/ripgrep

Topic: Object Caching Systems
The need
  Latency
  In memory
  Caching
Disadvantages compared to DB
  Size limits (memory)
  Limited query support
  Limited persistence options
  Stale caches
How it works
  Key-value stores
  Special operations for multi-get / multi-step
  Expiry timers on each key
  Hashing
Examples
  Redis
  Memcached
  DynamoDB
  Google Datastore
  Firebase database

00:00:15 Introduction
00:00:54 New Year's Resolutions
00:03:59 Saving money on cloud servers
00:17:20 Scan of the Month
00:20:14 Hyrum's Law
00:25:30 Make the Internet Yours Again with an Instant Mesh Network
00:31:45 Book of the Show
00:31:56 AI 2041
00:35:25 Dawnshard
00:37:35 Tool of the Show
00:38:17 Swagger
00:59:10 ripgrep
00:45:31 Object Caching Systems
01:10:22 High Frequency Trading
01:14:07 Farewells

If you've enjoyed this episode, you can listen to more on Programming Throwdown's website: https://www.programmingthrowdown.com/
Reach out to us via email: programmingthrowdown@gmail.com
You can also follow Programming Throwdown on Facebook | Apple Podcasts | Spotify | Player.FM
Join the discussion on our Discord
You can also help support Programming Throwdown through our Patreon
★ Support this podcast on Patreon ★

Transcript
Starting point is 00:00:00 Programming Throwdown, Episode 125: Object Caching Systems. Take it away, Jason. Hey, everybody. Happy New Year. Happy New Year, Patrick. Thank you. Do you have any New Year's either resolutions or, I feel like that word is abused, but do you have anything you want to do this year that's special? I mean, I have tons of stuff I want to do, but I don't make them resolutions because, I mean, we've seen in the last two years, it's very hard to predict what's going to happen. But no, I'm not a fan of, like, start exercising just because the date rolled over. Like, yeah, exactly. Should always be doing it. So is there anything really
Starting point is 00:00:55 particular, like a skill you're trying to build or anything like that? Oh, man, you put me on the spot. I should have made it before. I'm sure I could have thought of something. I'll put myself on the spot. I think, you know, one of the things that I want to do is I want to get back into reading. I feel like, oh, so one thing is that I'm going to probably go back into the office, into an office this month. And so it'll be opportunity for me to get back into reading books, which it was kind of hard to do. I was so used to doing that on my commute. And when I lost the commute, I still read books, but just, you know, not at the same pace. And so I want to I want to kind of use the commute to bring that back. That's actually funny because my book of the show, I won't spoil it, is the first time I actually got a Kindle during Black
Starting point is 00:01:39 Friday sales and put a book on it. And it was the first time I bought a Kindle book, an e kindle book ebook yeah the first time i had read something on a device like that like not a physical book and actually like read it just like an entertainment book not just like flip through or reference something but actually downloaded the book like read the book cover to cover and you know did it during the break off between i guess christmas and new year's so yeah that'd be a good one for me too wait so this happened this year or are're saying this was recently yeah like a week ago or two weeks oh yeah so what's your take on e-ink versus a display so what I will say is so I just bought the cheapest kindle because I didn't know if I would commit to doing this and so it was a bit of an experiment but I'll say is has a light on it that's a game changer
Starting point is 00:02:25 because you can sort of like i would wake up in the morning and without like rolling out of bed again be like look i don't want to get my phone and all that like i'm going to see my text messages in my yeah right instead i would just like pull it out and like i'm gonna just read a chapter read read a few pages you know or in the evening the same thing And I'll say that it sounds so goofy and everyone says it and everyone's right, which is that you have less distractions on an e-reader. And the e-ink looks really good. So you can read it if you're in the sun or in the dark. And there's just not other things popping up and asking for your attention or distracting you. So I'll let you know at the end of the year.
Starting point is 00:03:05 But yeah, I too am striving to try to do a bit more reading. I am not returning into the office imminently. So yeah, I too need to fill that hole. Very cool. Yeah, I actually, so I was, I guess, thinking about just buying an e-paper like thing I could runux on or something like that um but uh but yeah maybe i should just go the road much much more traveled and buy kindle and not
Starting point is 00:03:32 cause so much heartache i was thinking that i would uh have an e-ink display that would be an e-reader but i could also use it as kind of a desk calendar and just leave it on my desk and it could show me my daily like schedule and maybe kindle can just do that so maybe someone's written that program for kindle i'm specifically trying not to do that so i will stay in the not distraction proper use of the thing for this one but but that does sound cool cool yeah um one thing that i did over the break, which I wanted to share with everybody is I, so I personally was spending like over $200 a month on public cloud stuff because I have, I have a whole bunch of sites.
Starting point is 00:04:14 I have, you know, programming throwdown. I also have a site for Mamehub. I have a server for Mamehub. I have a whole bunch of different projects all running on the cloud. And the thing that was really costing me a lot was databases, which is actually going to be one of the things that we're going to talk about, a part of what we're going to talk about today. But I was spending over $100 a month on databases. And some of these projects were either, you know, either dead or, you know, they were projects that we were running. Just at this point, just me and my family are using it.
Starting point is 00:04:51 And so I haven't really taken it mainstream or anything yet. And so I was able to really massively reduce the cost. So there's a whole like we could do a whole show on trying to reduce your cloud costs. But basically there's you know getting your own server so you have a server it's running you know ubuntu or debian or whatever and it's just there 24 7 and uh it's just sitting in you know google's data center amazon's data center what have you that's like the most expensive right right? And then there's, you know, getting a dedicated service. So like a database, you know, having a MySQL database kind of in the
Starting point is 00:05:32 cloud. And that, you know, they cut you a little bit of a deal if you go through like Amazon or Google, but still pretty expensive. I mean, for one server, you know, you're looking at, you know you're looking at you know at least like 50 a month and then there's the sort of uh like kind of pay as you go approach where you use things like aws lambda where you can say you know here's like a single python function and whenever the trigger happens i run this function but if there's no triggers then i'm not anything. And for a lot of the small projects I had, that would be way, way cheaper, right? Because a lot of these machines were sitting idle. But I didn't really know what there was in terms of databases. You know, is there a database where, you know, sure, if it got super popular, I could get
Starting point is 00:06:17 hit with a big bill, but, you know, I could at least be there until something got popular. And I found a couple which I wanted to share with everyone. So for SQL, for things like MySQL and Postgres and all of that, I'm using this thing called PlanetScale. And they have a very generous free tier. And then I think there's a flat fee. I want to say it's like $10 or $20 a month and you can get more databases and all of that.
Starting point is 00:06:43 But it's also a pretty generous flat fee. The thing is, if you start, they count the number of rows you access per second or per month, I guess. So if you start getting a lot of rows per month, I think it's like hundreds of thousands at that point, this is read or write at that point, they start charging you and then for object caching systems i'm using upstash which has like redis as a service and for the moment i'm still on the free version of upstash because i you know i'm not doing the caching too heavy yet but yeah i found both of these to be really nice and the way they've done it actually planet scale was clunky when I first tried it about six months ago, but I went back to it a week ago and it's now become extremely nice. It's very similar to just having a MySQL database.
Starting point is 00:07:32 There's really nothing you have to do special. But yeah, check these out. I think these are amazing projects. That's really cool. So that may or may not lead into stuff we talk about later, but database caching. But yeah, that's pretty cool. Have you ever thought about for some of that stuff, like the rage, well, at least my stuff I pay attention to would be like self-hosting some of that stuff, like having a computer you could just run Docker on and like letting people do that.
Starting point is 00:07:59 I mean, I don't know, depending on what usage patterns you have, I've been looking at this like WireGuard VPN as well, which would, there's a variety of people spinning up around that VPN, basically letting people appear as if they're in your home network, but easily for like relatives and stuff, even if you didn't want to expose it to the public web. Is that something you considered or just a bit too far off of what you normally do?
Starting point is 00:08:23 Yeah, I mean, the thing about it is it's actually kind of difficult to administer a MySQL server. You have to do a lot of setup. And then every time you update the OS, there's like a chance you break the server. So I ran Mamehub off just a regular VPS, like a regular machine in the cloud running Ubuntu. And so Mamehub had all of these things. It had MySQL, it had Redis, it had had everything running and it wasn't using docker probably should have been but yeah definitely without docker like you know it was breaking you know maybe like once a year or so i would have some big break or like the hard drive would fill up or something like that
Starting point is 00:08:58 there's just like all these small reasons where you could get kind of burned and then you have to kind of spin up on oh what did i do here and oh like you know the files i forgot what they are and everything so this this is nice because uh the worst thing that could happen is you can get hit with a high bill i do recommend folks turn on the uh there's a thing you can turn on where if the bill goes above a certain amount it shuts off i mean definitely take the five seconds to like go to the cost explorer and like set some ceiling on all of these things because you just never know i mean it could be like a dos attack or something and now all of a sudden you have some big bill but provided you do that
Starting point is 00:09:35 i mean these things will well should you know be up forever it's like upstash i set it up six months ago and to be honest i even forgot the name of the company or anything because I just, it just been working for six months. That's funny. Yeah. You remind me of the story of the guy when GPT-2 came out, the like natural language, deep learning trained model, and you could give it queries and he was doing like a sort of crude RPG over it where you could kind of just do anything you wanted and it would respond. And he was running it, I don't remember, I think on AWS or something. And it got super, super, super popular.
Starting point is 00:10:14 And I forget how obscene his server charges were initially. And same thing like you said. So I guess like that's always possible. So definitely try to make sure you set some sort of upper limit, even if it's pretty high, just so you don't end up waking up to a seven figure server charge. Yeah, I mean, you can easily end up with $100,000 bill. I mean, if you have some website that goes viral or gets DOSed, you could get hit with a huge bill. So it's definitely good advice to do that. Yeah, that's pretty funny. I think GPT-2 is not light.
Starting point is 00:10:50 I mean, it's a giant like hydra, you know, huge neural net. That's pretty wild. So what did he do? Did he negotiate his price down? Yeah, I mean, I think those ones that get like famous, famous, they sort of, you know, someone behind the scenes makes it
Starting point is 00:11:02 makes it not a big deal when a college student sort of says, oh, you know, I don't understand. it makes it not a big deal when a college student sort of says, Oh, you know, I don't understand. They don't want the bad press. So I'm sure behind the scenes, it just sort of got taken care of. And, you know, someone showed him politely how to set his his bill limits to something more reasonable. Yeah, that makes sense. But yeah, don't let that be you. Because I don't know what would happen if if you weren't famous or maybe they just didn't want to change the bill. Well, we've heard stories like this with cell phone, right? Where I remember a person was operating a crane in Alaska and they were roaming the whole time and they were streaming YouTube videos for like a month. They got some enormous bill like 70K or something. And yeah, that made the news too.
Starting point is 00:11:45 And I think they negotiated it down to 2000 or something. But yeah, be careful for that. That's always an issue. Speaking of reasons why you might also end up running unintentional code, this month has been pretty bad for Log4j. So this is Log4java. I've actually used this framework before. Yeah, same here.
Starting point is 00:12:04 Oh, okay. It's like super popular. And basically, it does a lot of really fancy stuff in Java to allow you to manage your logs, set verbosity levels and, you know, do smart tokenization so that you can insert, you know, data and just have better logging output, which logging is always just one of those things that's very difficult to kind of do really well. So Log4j as a framework was super, super popular. And of course, run in not exposed to the public, but on services that were exposed to the public, because pushing your logs up is one of the ways to debug if something goes wrong. And so making sure that, you know, servers are logging out information they're doing and that you can triage them is super critical. And they ran across three in the last, I think, sort of month vulnerabilities. And I was telling Jason before the show, we were talking about it's actually a bit difficult because it got so mainstream to find actual engineering discussion of what happened. But I have a couple of links in the show notes. And basically the exploit was
Starting point is 00:13:06 if you could insert a certain token into something you sent to the server and that token could get logged, Log4j would try to take action to unparse that token. And so I think Minecraft, for instance, got a lot of news because someday people are going to look back and talk about minecraft as you know rivaling the biggest things and crazes of the 80s and 90s oh yeah definitely for a long
Starting point is 00:13:31 time now it's crazy um how long minecraft has been going on jason and i you you and i played minecraft what 11 12 years ago or something i don't even know when it first came out like super long time ago now yeah it's still popular with kids today. Yeah. I was terrible at Minecraft. I mean, I just didn't have the patience really. Um, because I didn't realize that you had to look on the internet to find the crafting recipes. And so I tried a few different random putting things in the boxes and I couldn't figure out how to make the craft. I see. I think I, yeah, I couldn't even figure out how to make the crafting table. i was like okay you know i think we're done here there's no one's ever going to play this i remember you telling me about it on a work trip that had to
Starting point is 00:14:13 have been over a decade ago but anyway so minecraft super popular okay i was in a side back on topic and i think people were finding out you could send chats to each other in Minecraft. And if you send a chat that has a certain string, just like the classic, you know, SQL, uh, unescaped, um, unsanitized stuff and getting SQL exploits, you could say like, you know, dollar curly brace, and then some URL. And basically what would happen is log4j would go, Oh, I need to resolve this. Oh, this is a URL. I need to reach out to this URL to figure out the value of this variable and then the url could return a you know custom java bytecode class for you to run and that bytecode class we could just do anything and so this is obviously super super bad code injection wow wow it's wild this is like one one of those worst possible nightmare things. So I put in here some explanation from JFrog,
Starting point is 00:15:09 who they actually recently went public with Artifactory and stuff. But then also a link in here from Microsoft, which of course getting some of this heat from Minecraft, which all the Microsoft-hosted ones were patched very, very quickly. But we were just talking about hosting. Lots of other people are running Minecraft servers that are still vulnerable because they were relying on this vulnerable Java version.
Starting point is 00:15:29 But they were also discussing there how much logging they were seeing of people scanning for this vulnerability. So just hitting random endpoints with various things and trying to see if they could get pingbacks from your server, which would mean your server was vulnerable. And then they could decide what to do
Starting point is 00:15:46 once they had a list of all of the GC targets. How would they know if you're running log4j, whatever the service is? So, I mean, I think you could just find anyone, for instance, and it didn't go into this level of details. This is not my background. But I can imagine going to every IP address, going to port 80,
Starting point is 00:16:03 and then many of them are going to respond with some pre-canned server, something that you could recognize. And then sending back like, you know, Jason's whatever, oh, Mamehub. Mamehub.whateverjasonuses slash, you know, question mark equals, and then just putting like, you know, this format is streamed.
Starting point is 00:16:22 And many servers, it's very common to log the endpoint or to log the query params or whatever. And so if you have one of these in the query, I'm not 100% sure that would work. But you can imagine someone going to every port 80 exposed that had a signature that or fingerprint that matched some common open source host that uses log4j, inserting it into the query params, and then having a payload which says ping back my URL. And anyone who pings back, they would sort of insert into a database somewhere for later deciding what to do. Yeah, that totally makes sense. Yeah, I probably need to check Mamehub for this. But actually, Mamehub's running all Python and Flask and stuff, so I should be okay. Oh, probably okay.
Starting point is 00:17:06 Until it turns out, secretly, Flask is running some Java tool. Yeah, that's right. Flask has a module, a Python binding module. Yeah, that is wild, man. Very cool. Yeah, I always wondered that. Thanks for explaining all that. So my first article is scan of the month.
Starting point is 00:17:23 And this month, it's game boy there's some weirdness around the url but the url i posted in the chat should stand the test of time but they actually use ct and i don't know very much about i mean i know ostensibly mri is magnetic resonance i don't know actually what ct is isn't it computed tomography oh my gosh look at this man i put you on the spot and you nailed it i don't actually know under the hood like yeah what those really mean but but anyway they use ct to scan a bunch of computer components and this month they did a whole bunch of game boys like the game boy original the game boy color the ds the dsi Game Boy Color, the DS, the DSi.
Starting point is 00:18:10 And they use the CT scan to extract all different layers of PCB. And then they just go through explaining at a hardware level all the different components. And it's really cool to see over time how they kind of integrated everything. Like the earlier ones, it's it's like okay here's all the buttons and here's the d-pad and there's little traces going to all over the place and then and then later on it's like okay we just have this chip that's doing everything and and like literally all the traces are just going to this one place or they're baked into the board or what have you um but yeah check it out i, I don't know very much about
Starting point is 00:18:45 hardware. I got a really kick out of reading this and the graphics are beautiful. It should work on the phone and on the computer. So I read it on the phone and yeah, give it, give it a read. It'll be worth five minutes. Yeah, that, those, the artwork in there is really cool the way they did it. The presentation of the website is fun, but it definitely, definitely has a pretty pictures. Something that I was surprised about and it turns out it's rather common. So not, not CT scans. I mean, there's a, there's a very involved, but x-ray actually for people who do a lot of like electronics assembly, it's very common to x-ray boards and look for defects.
Starting point is 00:19:22 So when they make the printed circuit boards, like the, you know, where all the traces are, you have multiple layers. So you can only see stuff that's on the sort of like exterior layer, the bottom layer and the top layer, but you can have middle layers. So I think for like your motherboard in your computer, it's common to have five plus layers internally, like over top of each other, routing, ground planes, all this stuff. And you can't see that from the outside. So if there's a break or an issue, you wouldn't be able to detect it. So it's actually very common to do x-rays of electronics and look for, you know, places where a trace is broken or isn't routed correctly or something is happening. And so doing those kind of, I thought that was limited to when you go to the dentist and they do the,
Starting point is 00:20:04 you know, x-ray. But apparently doing sort go to the dentist and they do the x-ray. But apparently doing sort of real-time on-demand x-rays is more common than I would have thought. Yeah, that's wild. I didn't realize x-ray would be that precise. My next news topic is not really a news topic, but something that came up and then it was alluded to by some article I wrote. So I decided to talk about it here, which is Hiram's Law. Have you heard this before, Jason? No, I haven't. So, okay, I'm going to read it directly or else I'll end up misquoting it. So Hiram's Law says that with a sufficient number of users of an API, it does not matter what you promise in the
Starting point is 00:20:40 contract. All observable behaviors of your system will be depended on by somebody. So this comes up from time to time. When you do software engineering, there always is like mythical man month, Hiram's law, the networking issues. You get into all sorts of things where everyone keeps rediscovering it. And this is one of those ones that I had run across, but didn't have a name for it. Now I have a name for it. And it's Hiram's law, which is if you make your system behave in a certain way, if enough, if it gets popular enough, everyone will depend on every quirk intended or unintended of that system. And so the person, you can go to the website credit sort of where they got it from and these kinds of things.
Starting point is 00:21:27 But people were chiming in an article I was reading with all manner of examples. So if you go back to like the old Nintendo, the NES, that like certain quirks of how the graphics rendering became like super important. So in a later revision, if they had fixed the silicon and fixed that quote unquote bug, everything would have stopped working.
Starting point is 00:21:45 There are certain weird op codes or low level device drivers that just do certain things at certain time, because that's the way to always work. But then you can't change it because everyone does it that way. And so if you fixed it later, you'd break other people's stuff. And it doesn't matter whether you wrote that down or not, if it behaves that way, it's going to act that way. We ran across this recently, and I was bringing it up to somebody. It's not exactly the same, but in the same ballpark, which is somebody was iterating a hash map. Of course, like the how
Starting point is 00:22:17 things are inserted to the hash map could vary for many reasons, but it shouldn't be something you ever rely on. It isn't a contractual obligation for like, if you insert in a different order, if you do whatever, if you need a stable visit order, you have to do some sort of sorting or some guaranteed way. And HashMap is not that. Yeah. I think in Python, there's like a special structure called, it's either called stable HashMap or ordered HashMap. I think it's called ordered hash map and so ordered hash map when you iterate over the keys you get them in the order you put them in i still don't understand why anyone would want that it keeps like a linked list like it must yeah it must keep a link list in addition to the map in the background okay i mean sure but but needless to say so so we ran
Starting point is 00:23:02 across a test where someone was putting stuff in a hash map, iterating the hash map, and then basically asserting in a kind of like unit test that like the results were in this order. And I pointed out to them like, this is really bad, like super fragile, like, you know, this is overly simplified. Or sort it afterwards or something. Sort it, do something or do a check where you don't rely on the sorting, just check that it's present, right? Because yeah, that iteration can change at any time. But I think you could easily imagine building two or three layers up
Starting point is 00:23:36 to where when you call an API, you say like, give me a random user, but actually the random user always gives you a sorted list. Somebody somewhere isn't going to realize it was supposed to give you a random user. And they're going to assume that if they keep calling it, they always get them sorted. And they're going to depend on it being sorted because it's never been otherwise. So this is putting a name on one of those things I've always kind of known for years and run across many times.
Starting point is 00:24:01 But it's good to have a common language for it. Yeah, we see this a lot. So a lot of people are curious when they find out that MAME actually gets more and more inefficient as you move up in the versions of MAME. So for example, like the most recent version of MAME can't run on a Raspberry Pi, but you can run MAME from 10 years ago just fine.
Starting point is 00:24:24 And even the same game will run just fine and even you know the same game will run just fine if you're on the current mame it won't work and the reason for that is because of all of these things where you know they kind of use various tricks to take advantage of you know the way the hardware was set up and so you have to emulate kind of all of these tricks perfectly and it becomes really time consuming and there's a one in particular people should check this out in contra the arcade version of contra in the actual arcade the bullets would flicker but the flickering was caused by just the different processors running at different speeds and so they just thought it kind of looked cool
Starting point is 00:25:04 or maybe they shouldn't you know didn't have time to fix it. And so they kept it. But for a long time in MAME, the bullets would not flicker. And so they actually did all this work to make them flicker and it became really computationally expensive to do that.
Starting point is 00:25:18 But yeah, it's similar to this law where it's like people either took advantage or they took artistic liberty with the situation that they had on the hardware side. And then recreating that ends up being really hard. My second story is make the Internet yours again with an instant mesh network. So I haven't tried this full disclaimer, but I love the idea just to give a bit of backstory. So, you know, a long time ago, they invented IPv6.
Starting point is 00:25:48 And the reason why IPv6 is cool is because, you know, well, OK, if you go back even further, every machine had a public IP address for that machine. So I don't know how far back you have to go for this to be true. But, you know, you would have, you you know 27.27.27.27 like that was you and so anytime anyone you know ping that they would get your computer that you're you know on right now or your phone or whatever you're on right now when you're listening to this and so you could just run a web server or an email server on your device and that's it. You'd be done. It's like, okay, I have my web server on 24. Anyone can just go there and see my website. Well, people started
Starting point is 00:26:32 running out of IP addresses because there's a lot, right? There's 256. Well, the number space is 256 to the fourth. A lot of those addresses you can't use for a variety of reasons, but there's still a decent chunk of addresses, but not enough compared to the number of machines that we have on the planet Earth. Right. It's not even close. So and all the machines that have ever existed because you don't really you don't really know when a machine has been decommissioned. Right. So they proposed IPv6. And this just adds a ton of bytes, bits to the address space. And so with IPv6, in theory, every machine could be accessed directly. So you could have a machine that's at your house
Starting point is 00:27:20 and someone out there on the internet, an internet cafe could just like go to that machine and just see what ports are open and everything. And so, yeah, one thing to keep in mind is, you know, because, so actually just doubling back a bit. So to deal with this address space issue, we created NATs, Network Address Translators.
Starting point is 00:27:40 So, you know, a portion of the internet address space is blocked off for internal addresses. And so, for example, in my house, you know, everything is 192.168.something.something. And anything after 192.168 is going to be some internal place. Could be someone's house, could be, you know, a business, right? But there's, there's maybe hundreds of thousands of devices that have the IP address, you know, 192.168.0. I guess that's to be at least one. So 192.168.0.1, there's tons and tons of devices with that address. But what happens underneath is, is machines go to, let's say, your house. So let's say one of these machines with the same address tries to reach google.com. They ping Google and all the machines on the way keep track that that machine wanted to talk to Google. So let's say you have 192.168.0.1 is your phone.
Starting point is 00:28:45 Your phone talks to your router, which is running on, let's say, that address dot two. And the router keeps track of that phone wanting to reach Google. And then that router then goes to your ISP. Your ISP has another router. It's just the same thing. And then when Google replies with their logo and all of that, all these machines know who needed it. And there's a whole bunch of complexity around that.
Starting point is 00:29:11 But that's how that translation works. The problem is there's no way to go the other way. So if Google wants to send a message to your phone, that isn't a reply. That's just an organic first message. They don't know which 192.168.0.1 is your phone and not someone else's. And so there's just no way for them to talk to your phone. Now, that's good for a security perspective, but it's bad from a, you know, running a server or doing something, you know, distributed hash table or anything like that in your house. And so what this
Starting point is 00:29:46 system does, it has a weird name. It's named after one of the HP Lovecraft monsters, like Yggdrasil or something. Isn't Yggdrasil the Nordic Tree of Life? Oh, okay. It is, yeah. Norse mythology. Oh, isn't there
Starting point is 00:30:03 an HP Lovecraft monster that sounds very similar? That's Cthulhu? No, no, there's one that starts, I think, with that. Let me look it up. I think Yggdrasil is a Nordic tree of life. But anyways, continue. With branches and connectivity, I think it probably goes along with the... Oh, it's actually, there's one just called Yt.
Starting point is 00:30:22 Oh, I know. Oh, yeah, so they probably just stole it from that but uh anyways so overload yeah yeah they overloaded yeah they needed ipv6 for their you know monster names but yeah so check this out i mean you know as as i said it's going to basically make your any machine that runs this is going to be exposed to any other machine in the whole world that's running this and so what that means is you probably want like a firewall or just like you know if you're running a server or something you want to be really careful about this you know some people at their house they might run a samba or a network file share and leave it totally
Starting point is 00:31:04 unprotected because of this. They don't have to worry about this address translation or the address translation is kind of protecting them. This will kind of strip off all of those protections. So, you know, use this, kind of understand what you're doing. Maybe run it in a Docker container first. But yeah, I thought this was really cool. And I think the reason why IPv6 has never really taken off is all of the security nightmares that would be downstream from that.
Starting point is 00:31:31 But yeah, this will allow you to, without having to do a lot of complicated port forwarding, just access all the machines in your house from anywhere, which I think is kind of cool. Is it time for Book of the Show? Book of the Show. My Book of the Show is kind of cool. Is it time for book of the show? Book of the show. My book of the show is an amazing book. I've loved this about halfway through it. And I'm a really, really big fan. I love the format. I think they did an amazing job with this.
Starting point is 00:31:56 It's called AI 2041. And the premise behind the book is they're outlining different things that they think could be real in 2041, where AI has just advanced like a massive clip. The book is broken up into chapters. Every odd chapter is a independent story. Because it's a set of short stories, I can't, you know, I don't want to spoil any of them. You know, there isn't a plot that covers the whole book. It's like, think of it, each odd number chapter
Starting point is 00:32:28 as its own small book with its own beginning and end. And then every even chapter is Kai-Fu Lee who ran the Google Asia Research, Google Research Asia. And he goes through and explains the technology in the chapter you had just finished. And he kind through and explains the technology in the chapter you had just finished. And he kind of like explains it as if you're a total lay person. I mean, he goes into what is a neural network and everything. So I found that like really, really cool, like dynamics.
Starting point is 00:32:56 So the stories are all fiction stories, obviously. But not only that, they're dramatic. So without spoiling too much, there's a story about these twins who their parents are killed in a car accident, you know, when they're babies. And so they get taken to an orphanage. And it's about them growing up in this orphanage and how, you know, AI has massively improved sort of education. And especially in like kind of orphanage situations, you know, they, they have sort of these AIs that kind of care for them, like in addition to the people and everything. So yeah, I mean, it's, it's like, I could see kind of something like that happening where, where AI
Starting point is 00:33:34 can be sort of like a teddy bear, like a buddy for somebody as their, as their little, and all of them are plausible. I mean, I definitely think that they really push to the limit how much progress we would have to make. But nothing is totally fantastic. Like everything I think is doable in 20 years. And at least from the chapters I've read so far. One chapter, which I thought was kind of funny, was there was a person who or they're a driver for self-driving cars. So the idea is, you know, self-driving car gets stuck in an intersection and this person sits in their house and they have a steering wheel and a computer and their whole job nine to five is just taking over
Starting point is 00:34:19 from cars that are stuck. And so they're, they were recanting kind of their career. And when their career started, it's like, oh, a car is stuck at a stop sign. It's just confused. And so he just takes over. He just, you know, looks around using all of his monitors. There's no one in the intersection.
Starting point is 00:34:37 He just drives on and then he hands the control back to the AI. But now it's 2041, right? I mean, it's, I don't know, 15 years later or something from when this person got their job. And so now the situations they're in are just crazy. It's like, oh, the car needs you to take over. And he jumps in and the car's like on fire, you know? And so his job has turned into doing just crazy things all day, just like deftifying things all day. And he gets some PTSD from it.
Starting point is 00:35:07 So I don't know. I think the whole book is really interesting. Highly recommend it. Oh, that is not good. I'm going to add that to my list. Yeah, you can read it on your Kindle and let us know how it is. There we go.
Starting point is 00:35:19 So talking about books I read on my Kindle, which I've never read any illustrated books. I don't know how that works yet. But I read Dawnshard by brandon sanderson so um as if if you know anything about brandon sanderson his prolific writer but also his books tend to be kind of long and as if they weren't uh sort of long enough the uh current series that i've been working on the stormlight archives is supposed to be 10 books in total two two five-part books. But then between each set of books he releases, he's releasing a novella between them. And so this is the one between the third and fourth book, The Words of
Starting point is 00:35:57 Radiance and Rhythm of War. So it's a novella, but it's's still i guess i'm looking here on the internet says 270 pages 269 pages so 269 pages is considered like the short book between the like other books on either side of it oh my gosh it's like three point book 3.5 um but book four itself i mean the rhythm of war i forgot it's like 1200 pages i think we were talking about it before um so anyways i read this on my kindle a lower commitment than reading full Rhythm of War, which is what I've started now. But this was an easier thing. And I read it and this book was really good. Obviously not a recommendation for people who haven't read books 1, 1.5, 2, 2.5, 3, and now 3.5. But you know, if you're in that series, definitely worth a read.
Starting point is 00:36:46 Some people don't read the in-between novellas, but I like to because if you don't, I feel like you miss stuff. So I enjoyed this. This is one of those important and good for me and people in the know, but everyone else is like, yeah, this is not a useful recommendation
Starting point is 00:36:59 because I'm not gonna jump in at the middle of this huge, long, many thousand page epic. So. Well, I've definitely, we've gotten emails from people I'm not going to jump in at the middle of this huge, long, many thousand page epic. Well, no, I've definitely, we've gotten emails from people who either got into, you know, your interests, your reading interests. They've like adopted some of your reading interests or they're already there and the show really connects with them. Yeah, I've gotten some recommendations from people as well for good series that I've read. Unfortunately, my reading was, as we talked about at the top of the show has been severely diminished recently. So I'm trying to get back on the train. Got to make it through my backlog.
Starting point is 00:37:33 Cool, man. All right. On to the tool of the show. Let's go. I wonder if we should pay, like, get an announcer or use Amazon Polly or something to get a real announcer voice? Or we should just keep doing that. Let us know. Send us an email.
Starting point is 00:37:48 A horribly bad fiver. Yeah. Do you want us to pay an announcer to do that? Or I wonder if Amazon Polly has an announcer voice where you could just type something into Amazon Polly and it would... I don't know what this is. Oh, Amazon Polly is just a speech synthesis program. Oh, yeah, there it is.
Starting point is 00:38:10 Yeah, so my tool of the show, which is never going to sound as cool for me as it does from an announcer guy, is Swagger, which is now rebranded as OpenAPI, or at least part of it. So I used Swagger a long time ago, and it's really come a long way. I was working with another team at work that had a Swagger, had their stuff running in Swagger. It was really impressive. And so basically, the way it works is, you know, if you want to, you know, have a web service, so something where someone gives you some input, you know, some set of arguments, some JSON output and then on your actual server,
Starting point is 00:39:06 you could be doing whatever. You want something that is pretty standard, right? Because you don't know what languages people are going to be using when they're calling your service. And so you can imagine just like, yeah, taking in a JSON blob and then outputting a JSON blob. But then the problem with that is it gets really generic. Like people could put anything. They might mistype one of the keys, right, of your JSON object. They might just have a typo in there. And so that becomes kind of difficult to debug.
Starting point is 00:39:37 Like you have to write all sorts of logic to handle that. And then also to, you know, make sure you produce the right documentation. So you tell people, you know, when you call, because they probably won't be able to see your source code or it'll be, you know, cumbersome for them to have to go through your source code. So you want to be able to tell them like, look, you know, I expect this, you know, JSON object and I expect this key called foobar and the F needs to be capitalized and etc. And so Swagger is a way for you to sort of write up that contract and then it'll automatically generate, you know, a website where people can go and, you know, read more about how your API works.
Starting point is 00:40:17 And then this is where, you know, I don't want to spend too much time on this, but basically, you know, they created this standard called OpenAI. There's a whole bunch of tooling. There's tools that Swagger, the company has that lets you test your API, you know, from the outside and everything. So definitely check this out. If you're building a service that you either you want to consume from some other language or, you know, you want, you're building something for other people, definitely check these things out. They're super powerful. I've used some stuff that had this.
Starting point is 00:40:49 I've not written a ton of the sort of, what would you call that? The metadata you need to generate them. Although I have a little. Yeah, I guess. But I've used that or something. Okay. But I have used some that had it.
Starting point is 00:41:02 I do really like this thing you're saying where you can sort of go to it and have it guide you through what you're expected to send, then get back and super powerful. And I can imagine like keep going this way and then having tools to like auto-complete your code for like how the expression should come in and like what's expected and not when you start linking to serve, like one day, we're not there, but like one day in my head, it's going to be a beautiful place where everything's going yeah
Starting point is 00:41:27 that's a genius idea i bet someone's written that like a visual studio code extension where you kind of give it the swagger url like the you know the documentation url that you're trying to you know access and then they could do they could do tabbing for you and so that would be amazing if you're out there and you're looking for something to build and then they could do tabbing for you. And so that would be amazing. If you're out there and you're looking for something to build and this doesn't exist yet, that would be phenomenal. And it probably wouldn't be that hard to build. Ooh, famous last words.
Starting point is 00:41:56 We can't guarantee that last part. My tool of the show, I thought I mentioned it before. I couldn't find it. If I did, oh, well, it's worth mentioning again. I haven't heard of it. Oh, really? Okay. This is RipGraph.
Starting point is 00:42:09 Now, you're probably sitting there like Jason saying, I use only the most powerful, awesome Unix tools, and I don't need anything that isn't POSIX compliant and shipped with my BSD install. I'm not such a user. So I have found this tool enormously helpful for searching around my code. Ripgrep promises to do a lot of things,
Starting point is 00:42:32 many of them interesting, including being faster. But the thing I love Ripgrep for is that I can search the directories I have recursively by default, and also honoring my get ignores and ignoring binary by default. Now I'm 100% sure you can do the same thing with grep. I don't know how fast or slow it would be. But that incantation is beyond me. So I can do rip grep, which I have is just RG and then pass it a dash I flag if I wanted to ignore the capitalization of my query,
Starting point is 00:43:07 type in the word I want to search for, and get back in my Git repositories all the places I've used that keyword. Wow, this is so cool. Yeah. Apparently, there's a bunch of other tools that were listed there that I learned about from going to their website in preparation from here, things like Silver Surfer and other things that I guess also do the same thing, but I've never used those. RipGrep says it's faster than all of them. The speed is good. I've never been bothered by that,
Starting point is 00:43:33 but just this ability to automatically ignore binary files, automatically ignore things. For instance, we have build folders that hold outputs. And whenever you search, if you forget, and you just type grep, and then you do it recursively, you end up getting a lot of outputs from the build folder, you know, inputs that were copied or zip file, whatever, all these kinds of things.
Starting point is 00:43:55 And you don't want them. And so rip grep will ignore them because they're ignored and they get ignored. And so that's super powerful. And also it does have ability, which again, probably can do in other things,
Starting point is 00:44:05 but to like go into even compressed stuff. So if you were trying to search dependencies and you're using something which brings in dependencies as compressed, then it'll be able to handle those things as well. Wow, you know, that is amazing. I'm definitely gonna use this. So the other thing that this site had,
Starting point is 00:44:22 which blew my mind is there's grep inside of git So you can type git space grep space and then do grep inside of git. I Also didn't know that but does that work if you're not in a git repo like yeah, I have like something it's only for git repos Okay. So the problem is sometimes I have like a directory with multiple git repos in it and I don't know Which one of my projects I was doing that thing in so but yeah no i also learned that from there yes wow super cool learn something new every day yeah i should check this out rip grip it's probably like you can just app to get it or something yeah i'm on mac os so i just brew installed it oh yeah there you go yeah it looks like it says windows lin, Linux, and macOS support.
Starting point is 00:45:08 Yeah, Windows, Linux, Arch, Gen 2, Fedora. They did everything except Debian. Where are we, Debian? Oh, yeah, there you go. Yeah, you can apt-get install it. Okay, cool. All right, so if Jason's voice gets funny here for a minute, it's because he's downloading... Yeah, that's right, yeah.
Starting point is 00:45:21 If I start doing that weird thing where your voice becomes really robotic, it's because my Internet is totally used up downloading Ripgrip. Cool. All right. We'll jump into the topic. Maybe we should start with why these things exist, which I think is always a good place to start. Object caching systems. I'll give an example to illustrate why we need something like this. I recently built a replacement for Google Photos. So I built, you know, Jason
Starting point is 00:45:55 Gauchi Photos. And I convinced my wife to install it. And so now her phone crashes like every hour on the hour. And she's not super thrilled with that. So I'm definitely not recommending it to the audience yet. But the idea is it would periodically scan your phone for new photos. And if you're on Wi-Fi, it would push them to the cloud. And then I also had an app, have an app where you can go in. And very similar to Google Photos, you can kind of see all your photos. So the way the app works and the way
Starting point is 00:46:25 the website works is very similar where, you know, you need to see the photo, but the photo is in a, you know, protected URL. You know, it's not, otherwise anyone could just see all of your photos, right? So you kind of need permission to see that one photo. And the way this is done under the hood, which is not my creation, this is just how all the companies do it. And it's built into a lot of these cloud providers, is with something called a signed URL. So if you ever go to Facebook or your Google photos or any of these things where you need to authenticate, you can look at your photos, let's say in your Facebook timeline, and you can go view and tab and you'll have a URL for that
Starting point is 00:47:12 photo of, let's say, you and your family. And you can actually give that URL to somebody else and even a stranger, and they will see that photo of you and your family, right? So it seems like a security issue. What's actually happening under the hood is they've generated this signed URL that's kind of temporary. And so that URL works for, you know, maybe five hours or one hour or what have you. So effectively, like, you know, it works for your session. And then you'll have to go back and ask Facebook server for a new signed URL to see even the same photo. Now, the challenge is, you know, signing the URL and doing all of that is really expensive, right? Like, and so if you pull up, let's say your Google photos, you know, they might show
Starting point is 00:47:55 you 20 photos or even 50 photos right off the bat on your screen. All of those are signed URLs. That's 50 signed URLs that Google has to generate. And even when I was testing this by myself on my own Photos app, every time I would hit F5, it would have to generate these 50 signed URLs. And getting the URL signed takes like on the order of maybe tenths of a millisecond or something. So it adds up, right? And so, you know so what you want to do
Starting point is 00:48:26 is you want to sign a URL for somebody and then you want to cache that result. So if they're just hitting F5 or if they're going to different parts of your page where they're seeing the same images, that you're not signing everything again and again and again. You're not signing the same image. And so that's a perfect use case
Starting point is 00:48:44 for an object caching system. So what I ended up doing was when you ask my Photos app for an image, make sure you're authenticated and all of that. And then it checks this object cache and says, has this person asked me for this image before? And if they have, I just return that signed URL. And if I don't see it in my object cache, then I go and I sign the URL. And so that's kind of one example, but it kind of highlights, you know, it saves you on time. And sometimes, you know, you end up in situations where you really need the cash. But most of the time, it's there to really save you on time and resources. Yeah, I think the other thing, like you mentioned, not just the expense of computing something,
Starting point is 00:49:33 but even retrieving something. So the difference between retrieving something off of even a solid state disk, although it is closing the gap. But if you are talking about, what do they call it? Like read amplification. So sometimes you want to read something. And not only like Jason's mentioning, you have to do something like compute this cryptographic hash of something to sign it, like very expensive. But sometimes you need to pull in five or six or seven or eight different other resources. So one read amplifies
Starting point is 00:50:02 out. Now, all of those reads may happen if you're at enough scale in a data center, which are connected by networking switches that cost more than your house. And so they operate all this fiber stuff, like super fancy, low latency, again, but ultimately that data has to live somewhere. And your choices are to live in RAM or to live on disk or tape, I guess, in some case, like a super bad, like long tail case. Yeah. I think the glacier and that stuff still uses some kind of tape, like a robot and tape or something. I don't know. Like that's in one of those horror stories,
Starting point is 00:50:34 the guy who gets a ping up on it. Okay. Okay. Sorry. Sorry. Oh no, can't get sidetracked. So if you talk about like cash next to the CPU, you know, RAM speed, and then, you know, hard drive then you know hard drive you're sort of like talking orders of magnitude each one so when you talk about having something in ram versus networking speed you're probably comparable but when you start talking about hard disk speed versus inter computer inter cluster communication not inter cluster communication then it matters a lot right so if you need to do these eight reads you're reading them all out of ram out of computers cluster communication, not intra-cluster communication, then it matters a lot, right?
Starting point is 00:51:05 So if you need to do these eight reads, you're reading them all out of RAM, out of computers, that can go super fast. Versus if you have to go out to the hard disk to get them, where you get order of magnitude more storage, maybe, or even maybe a lot more, but also order of magnitude slower, you can't keep doing that. And some of those are going to be fetched more often than others. Or if you know, hey, I'm doing this for user, the chance that you get something else from that same user is increased by this first read, then you may want to bring some of that object into a RAM resident storage instead of a disk resident storage. Yeah, that makes sense. Another thing too is, and we talked about this at the very beginning of the show, is your database might be really expensive.
Starting point is 00:51:51 And so you might want to, you know, if someone's just hitting F5, F5, F5, reloading the page all the time, you don't want that to actually, you know, cause your database costs to go up or cause you to have to buy another database machine and add it to your replica. And so it's often way, way cheaper to spin up an object caching node, an extra node to your object caching system than it is to add a node to your database. Cool. So what are the disadvantages? I mean, we talked a bit about,
Starting point is 00:52:19 you can kind of already guess that if you're moving stuff into RAM off of disk, or moving it closer, you obviously have less capacity. So if you talk about, you know — to continue Jason's example — having a database of every picture off of every user's phone ever, that's not going to fit on one computer. That's going to fit on many, many, many computers. And so if you talked about having only one computer caching, then, you know,
Starting point is 00:52:46 obviously, like, you can't fit all of that. So you have to be super selective. We talked about a couple of those strategies, right? Time-sensitive stuff, you know, aggregating data, deduplicating data, that kind of thing to get it smaller and smaller. But ultimately, to get the speed advantage, you really want to limit how much size you have, and normally that's limited by the memory capacity. Now, RAM sizes have been going up dramatically recently, so we have gotten some advantage there, and with SSDs and stuff, but that limit is still there.
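As a toy illustration of that size limit, here's a minimal in-process LRU cache in Python — once it's full, something has to be evicted to make room. (Redis has an equivalent knob: a maxmemory limit plus an eviction policy such as allkeys-lru.)

```python
from collections import OrderedDict

class LRUCache:
    """Tiny in-memory cache capped at max_items; evicts the least recently used entry."""

    def __init__(self, max_items: int = 1000):
        self.max_items = max_items
        self.items = OrderedDict()

    def get(self, key):
        if key not in self.items:
            return None                        # miss: the caller falls back to the database
        self.items.move_to_end(key)            # mark as most recently used
        return self.items[key]

    def set(self, key, value):
        self.items[key] = value
        self.items.move_to_end(key)
        if len(self.items) > self.max_items:
            self.items.popitem(last=False)     # evict the least recently used entry
```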
Starting point is 00:53:25 The other thing object caching tends to do is — I don't want to say garbage, or a dump — but it tends to be just a soup of things. So you are putting all sorts of stuff in the computer to keep it close, but your organization normally isn't as clean as it would be in a full relational database. So, uh, that means that you're not going to have rich queries where you can do filters and scan all rows and that kind of stuff. That's a disadvantage, but it also makes sense — it's the right tool for the right job. You're already pulling in the results of those complex queries, not doing those queries themselves, or else you just end up back in the same boat you were in to start with. The other thing is, if you start storing stuff in memory, you start to have an availability problem, which is: a computer can go down. Even if you have 99.9% reliability, if you have billions of users,
Starting point is 00:54:05 you know, the computer where that user's data is stored could go down. And for persistent transactional databases — things like Jason was mentioning before, like MySQL or Postgres, things that you might hear about — there's very intense thought put into how to make sure that, no matter exactly what microsecond you glitch or lose power, the database state is sort of consistent and maintained. Object caches don't really do the same thing, because that stuff has overhead, it has cost, and no matter how good you do at it, you're sort of risking some amount of loss. And so most of the time, these object cache systems trade that away for simplicity, saying: it might go down, and if you ask for
Starting point is 00:54:45 that key and the key is not present, you just have to handle it. The client has to know what to do if it goes missing. Now, some things we're going to talk about, like Redis and others, do have options for writing out to disk, but they're normally much simpler compared to full, uh... relational database management systems. RDBMS. Okay, I almost glitched it when I said it. Yeah, isn't that weird? It's hard to figure out one from the other.
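To make that concrete, here's a minimal fail-open sketch in Python, assuming a local Redis instance; query_books_from_db() is a hypothetical stand-in for the real database query. If the cache node is down or the key is simply gone, the client treats it as a miss and falls back:

```python
import json
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

def get_books_read(user_id: str):
    key = f"books-read:{user_id}"
    try:
        cached = r.get(key)
        if cached is not None:
            return json.loads(cached)           # cache hit
    except redis.RedisError:
        pass                                    # cache unreachable: treat it like a miss
    books = query_books_from_db(user_id)        # hypothetical: the real, durable source of truth
    try:
        r.set(key, json.dumps(books), ex=300)   # best effort; losing this write is acceptable
    except redis.RedisError:
        pass
    return books
```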
Starting point is 00:55:12 Yeah, I mean, I think that, yeah, you hit it on the head. The persistence in Redis is really for when a machine dies, you want to be able to bring up a replacement machine and move as many of those cache entries as possible so that people don't experience like a hiccup. The purpose of it isn't really for having data that lives forever. And yeah, I mean, you nailed it. I mean, I think another challenge is anytime you have a cache, you run the risk of the cache becoming out of sync with whatever the underlying database is. So imagine you look up what books Patrick's read and you store that in cache. And then while that
Starting point is 00:55:53 cache item is there, Patrick finds another book from Brandon Sanderson that's 2,000 pages and starts reading that one. And then when you go back and say, hey, what books has Patrick read now? I just watched the latest episode. You get the old list, because it's in cache, right? And so now you have this problem where you either have to delete entries from the cache — so the code that adds books to Patrick's queue would also have to delete any cache entries, you know, involving Patrick — or you just have inconsistency, and you just kind of live with that, and you make whatever's using this system be able to handle it. And so you can actually see this in the real world. Like, I don't know if you've seen this, where you end up with weird situations where, like, sometimes you buy an
Starting point is 00:56:44 item, but the item is not gone from your cart in Amazon. This doesn't happen so much anymore, but I saw this happen a couple of times years and years ago. And that's just because, you know, the different machines at Amazon just have different states. They're not in sync all the time.
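A minimal sketch of that first option — the write path deletes the cache entry whenever the underlying data changes — again assuming a local Redis instance; db_insert_book() is a hypothetical stand-in for the real database write:

```python
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

def add_book_to_queue(user_id: str, book: str) -> None:
    db_insert_book(user_id, book)        # hypothetical: the real, durable database write
    r.delete(f"books-read:{user_id}")    # drop the now-stale cache entry for this user
```

The next read misses, repopulates from the database, and the staleness window mostly disappears; the cost is that every write path has to know which keys it is responsible for invalidating.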
Starting point is 00:56:59 Yeah, and so that's one of the — it's sort of an extra complexity you have to think about. You know, talking about that, I did go to some website the other day, and it was like, we've noticed you have two different carts. Would you like us to merge them? And I was like, oh, huh.
Starting point is 00:57:14 Interesting. Okay. And I think it was exactly what you said, which is that somehow I was on one machine and then I was on another machine, and it somehow reconciled later that, oh, you've added the same thing in two different places, you know, I need to resolve this. Yeah, there's, um, there's a whole thing we should talk about at some point called — um, I think it's called CRDTs. Conflict resolution... I think... database table? I don't remember what the
Starting point is 00:57:40 T is. But — oh, interesting — it's a way to have different machines that are writing to the same source, and the conflicts are resolved — I want to say automatically. I mean, a better way of saying it is: the conflicts are resolved through a policy. And that policy sometimes does what you want, sometimes it doesn't, but at least it's, like, automatic. And so, for example, if you use Google Docs and you have a shared doc and you and someone else are typing at the same time, under the hood, it's actually a CRDT. But yeah, if you're writing something in Redis or Memcache, then you have to handle all the conflicts manually. And maybe you have a CRDT on top of it that does it for you.
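(For reference, CRDT stands for conflict-free replicated data type.) Here's a minimal sketch of about the simplest one, a grow-only counter, in plain Python — just to show the flavor of "conflicts resolved through a policy"; real collaborative editing is far more involved:

```python
class GCounter:
    """Grow-only counter CRDT: each replica only ever increments its own slot."""

    def __init__(self, replica_id: str):
        self.replica_id = replica_id
        self.counts = {}

    def increment(self, amount: int = 1) -> None:
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + amount

    def merge(self, other: "GCounter") -> None:
        # The "policy": element-wise max. It's commutative, associative, and idempotent,
        # so replicas converge no matter what order the merges happen in.
        for replica, count in other.counts.items():
            self.counts[replica] = max(self.counts.get(replica, 0), count)

    def value(self) -> int:
        return sum(self.counts.values())

a, b = GCounter("machine-a"), GCounter("machine-b")
a.increment(2)     # two writes land on machine A
b.increment(3)     # three writes land on machine B
a.merge(b)
b.merge(a)
print(a.value(), b.value())   # 5 5 -- both replicas agree without ever coordinating
```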
Starting point is 00:58:20 But someone has to decide what to do when those things are out of sync. I would say there's another topic for us as well: talking about eventual consistency versus transactional guarantees, and the trade-offs that makes. I think that's been a big space of exploration recently. Yeah, totally. Yeah, maybe we should do — so, you know, we actually, when we were looking up this show, we found we did a database show, but it was episode 34, which is, uh, probably like a decade ago. So what we're doing is we're going to probably dive really deep into databases, and yeah, we should definitely add that to the list — you know, like ACID compliance and all the consistency guarantees and the trade-offs there. So under the hood, almost all of these object caching systems are key-value stores, right?
Starting point is 00:59:08 And so, you know, you could use, you know, Berkeley DB or Amazon's Dynamo or LevelDB — any key-value store could double as an object caching system, you know, under the hood. I mean, Redis might be using LevelDB under the hood, I don't even know. But, you know, it's effectively a key-value store; it just gives you a lot of nice things on top of that. So, you know, one of the things it
Starting point is 00:59:36 does for you is it supports multi-get and multi-set. So, you know, in the example I gave, a person goes to Jason Gauci Photos — which doesn't have a name yet — they go to this photos thing and they see, you know, 20 photos at once. And so that means, you know, if it's their first time, I need to set 20 signed URLs in one shot. And if it's not their first time, I want to get 20, or get as many as there are. And so, you know, in most of these cases, you're not just looking at one cache entry at a time. And so they have really nice APIs and also, under the hood, really nice ways of getting multiple things in parallel, and they make that super fast.
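A minimal sketch of that pattern with the Python redis client — user_id, photo_paths, and sign_url() are made up for illustration:

```python
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

user_id = "42"
photo_paths = [f"/photos/{i}.jpg" for i in range(20)]    # the 20 photos on the page

keys = [f"signed-url:{user_id}:{path}" for path in photo_paths]
cached = r.mget(keys)             # one round trip; None for any key not in the cache

missing = {key: sign_url(path, user=user_id)             # hypothetical, expensive signing call
           for key, path, hit in zip(keys, photo_paths, cached) if hit is None}

pipe = r.pipeline()               # batch all the writes into one round trip as well
for key, url in missing.items():
    pipe.set(key, url, ex=3600)   # MSET has no per-key expiry, so use SET with ex instead
pipe.execute()
```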
Starting point is 01:00:33 Another thing that object caching does is it lets you set expiry times. So, as we talked about, we have this cache sync issue, and, you know, one nice way of dealing with that — one sort of hands-free way of dealing with that — is just to set an expiry time on the cache: look, this cache entry is relevant for maybe an hour. And after the hour, you assume the person logged off Facebook — to use the Facebook example — and it doesn't need that picture anymore. Or after an hour you say, well, there's a good chance something's out of sync, and so I'll just throw it out. And if I'm regenerating this every hour, it's so much better than regenerating it every request, right? Maybe you even set it to a minute, and that's still a huge improvement over not having a cache. So there's good support for that, and they will handle destroying that item for you when that minute is up.
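In the Python redis client that's just an argument on the write — a quick sketch against a local Redis instance, with a made-up key:

```python
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

# Cache a (made-up) signed URL for 60 seconds; Redis deletes it for us after that.
r.set("signed-url:42:/photos/beach.jpg", "https://example.com/beach.jpg?sig=abc123", ex=60)

print(r.ttl("signed-url:42:/photos/beach.jpg"))   # seconds remaining, e.g. 60
# A minute later the key is simply gone:
#   r.get("signed-url:42:/photos/beach.jpg")  ->  None, so the caller re-signs and re-caches
```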
Starting point is 01:01:12 The last bit on how this works is, you know, we can't talk about this without understanding hashing. Now, we did a whole show on hashing, so we won't rehash it here. But, you know, as Patrick said, maybe you're making this complicated query to a database to get, you know, all the books that this group of people have read. And now you need to take that result and cache it, which means you need a way to index it by a key. So you have to take whatever logic went into getting this data the first time and convert it into some unique identifier, so that next time, if you make the same query, you get the same identifier. And the best way to do that
Starting point is 01:02:05 is with hashing. So you could imagine you could hash the raw SQL query. You could hash a URL — in my case, the one I want to sign. I think in my case I actually hashed the user ID, you know, concatenated with the URL, because I think it mattered which person you were. But yeah, hashing is a huge part of being able to do object caching.
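A minimal sketch of that key-building step in Python; the field names are made up, and SHA-256 is just one reasonable choice of hash:

```python
import hashlib

def cache_key(user_id: str, url: str) -> str:
    # Same user + same URL always produces the same key, so repeat requests hit the cache.
    digest = hashlib.sha256(f"{user_id}:{url}".encode("utf-8")).hexdigest()
    return f"signed-url:{digest}"

print(cache_key("42", "/photos/beach.jpg"))   # "signed-url:<64 hex characters>"
```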
Starting point is 01:02:39 So I think the most common one — or at least the first one I heard of; I think these first two are probably tied, and I'll probably get flak for saying one over the other — anyways, the first one I heard of was Redis. And when Redis first started out, it was sort of interesting, because when it was described to me, it was like, yeah, this is, you know, like a hash map that you just run on your computer. Just one computer — you just stand it up and it runs a hash map. People can put what they want there and get what they want from it. And it has, like, a few extra features. But like I said, back when we first covered databases, I hadn't had the experience of large-scale systems and all of these complexities we just described. So I was like, that just seems like a crappier version of a relational database — I too could just write a network server that serves up my hash map. Of course, the devil's in the details, with adding other data structures, and any sort of backend
Starting point is 01:03:28 or having it split between computers and segmentations and partitions. Okay, anyways. So Redis for sure enjoys a lot of popularity here. Adding other features like PubSub. So if something is modified, you could subscribe to it and sort of know and handle it.
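As a tiny illustration of that Pub/Sub idea with the Python redis client — a toy, single-process sketch with made-up channel and key names; one common use is broadcasting "this key changed, drop your copy" messages:

```python
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)
p = r.pubsub(ignore_subscribe_messages=True)
p.subscribe("cache-invalidation")                       # listen for "this key changed" events

r.publish("cache-invalidation", "books-read:patrick")   # some writer announces a change

message = p.get_message(timeout=1.0)                    # poll; returns None if nothing arrived
if message is not None:
    r.delete(message["data"])                           # drop the stale copy of that key
```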
Starting point is 01:03:46 Memcached is another big one that's super, super popular. In fact, I don't know. Jason, do you know which one might be if we had to make a call, which one's more popular? Memcached is much older. Oh, okay. Then I just heard about them wrong. Yeah, but I think the velocity of Redis
Starting point is 01:04:02 is way, way higher. The adoption of Redis, the rate of adoption of Redis is way higher. So I don't know which one's more popular. But yeah, you hit the nail on the head. Those are the two big juggernauts. One thing you mentioned, which I forgot to talk about earlier, is this is multi-node. So you can have a Redis cluster of hundreds of machines, and they're all kind of working together. And when you ask for a key or you store a key, you don't have to think about which machine to put it on or anything like that.
Starting point is 01:04:33 And as opposed to, like, a MySQL cluster, where it's mostly done through replication, the Redis cluster is truly distributed, where a lot of the keys only exist in one of the nodes. It's not like a full replication or anything like that, so there's a lot of complexity in getting that right. So, what, like an extra 10 lines of code? Yeah, exactly. It's just — it's, like, one line of code more than writing that extension that scans Swagger. There's a cloud... oh, a full, fully distributed, highly available Swagger scanner. Yeah. Other key-value store systems, you know, can work here. So, you know, Amazon has Dynamo. Google, I'm sure, has something — Google Cloud has some kind of key-value store. Is it, is it called, like, Big... Big...? No, there's BigQuery,
Starting point is 01:05:25 which is a SQL thing. Yeah, I don't know what it would be called here. Like you say, I mean, I think it's very common. It's called Datastore.
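Going back to the multi-node point for a second: here's a toy sketch of how a client might decide which node owns a key, with made-up node names. Real Redis Cluster does something in this spirit but with fixed hash slots — CRC16 of the key modulo 16,384 — so that adding or removing a node only moves some slots, instead of reshuffling almost everything the way a naive modulo like this would:

```python
import zlib

nodes = ["cache-node-0:6379", "cache-node-1:6379", "cache-node-2:6379"]   # made-up names

def node_for(key: str) -> str:
    # Every client hashes the same way, so they all agree on which node owns a key
    # without coordinating with each other.
    return nodes[zlib.crc32(key.encode("utf-8")) % len(nodes)]

print(node_for("signed-url:42:/photos/beach.jpg"))   # e.g. "cache-node-1:6379"
```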
Starting point is 01:05:37 There's also a whole other line of things. What we've been talking about is sort of per-user, but there's also huge amounts of work for, like, content delivery networks — is that the right acronym, CDNs? — and caching, and that's a whole area we've not really talked about any of. But you sort of take a lot of the problems we've just described — if we talk about, like, how do you push large volumes, like YouTube? If you want to watch a YouTube video all over the world, you don't want
Starting point is 01:05:59 that being served from, like, a single computer's Redis database. I mean, that's not what happens, obviously, but, like, that doesn't work, right? So you have CDNs, and, like, entire distribution and peering agreements and caching data — and all of these things are just something outside of it, and I don't even have a ton of experience there. Actually, interesting — guests, I'm sure we... Yeah, I was just thinking that we should have someone who really knows CDNs come on the show. I mean, there have been several companies go public with huge revenues in this CDN space, sort of bringing data closer to the user, closer to the edge — which we've overused before. But anyways, yeah, I mean, we haven't talked about those. These aren't really solutions for that. But in my mind, like, I'm not sure I understand exactly how the two interplay.
Starting point is 01:06:51 Yeah. I mean, the thing about CDNs — I mean, we talked to that gentleman about doing edge computing, Jackson. We talked to Jackson about edge computing, and we talked a little bit about this. But yeah, I mean, CDNs have to deal with this challenge of: you know, I have this image, but it's on data centers and edge servers all over the world — where should I actually get it from for you? Right. And so, yeah, I mean, we should dedicate a whole show to that. It'd be super fascinating. Oh, another one to mention, along similar lines, is when you're on mobile, you know, you often want to cache things on your phone so that you don't have to go to the internet,
Starting point is 01:07:26 especially if you're not on Wi-Fi. And so, you know, that is kind of a whole different level, a whole different caching layer, because there might be times where you literally can't get the underlying key. So, for example, someone has configured their app to not use any Wi-Fi — or, sorry, to only run on Wi-Fi and not fetch anything on metered data. Like, imagine a podcast app, and so they only want the podcasts that they've downloaded. So if you don't want to write all of that from scratch — you know, oh, have you downloaded it? If so, play it, etc. — there's a whole bunch of tooling around that as well, like object caching systems for your phone.
Starting point is 01:08:07 The one I'm most familiar with is Firestore, which is from Firebase. And so I believe that works on Android and iOS. But there's a whole bunch of these systems for dealing with metered connections and for caching things like that. I want to go back — I have not, out of fear — but I want to go back
Starting point is 01:08:26 to, uh, some of our early episodes and listen. And we were talking about this, if you haven't heard it yet, in the holiday episode, or Christmas episode — like, growth as engineers, and decisions we made a long time ago and how they've impacted us. But just thinking through how I used to view databases coming out of college, and then, like, the richness with which you and I are talking about them now. And I'm by no means an expert — I actually do more embedded work, so databases are far from my day-to-day experience — but just, like, how much more there is. It's just one of those things where, looking back, you think
Starting point is 01:09:00 you know a lot about something and it turns out you basically know nothing. Yeah. I mean, that's fascinating. I mean, you know a lot about something and it turns out you basically know nothing. Yeah, I mean, that's fascinating. I mean, the way you described Redis when you first heard about it is exactly what I thought too. And so, you know, when episode 34 went out, our description of Redis probably does not do it justice at all.
Starting point is 01:09:17 It'd be fascinating to listen to that. I bet we'd be — yeah, we would be facepalming the whole time. Maybe we should do that as, like, a charity thing: if people donate enough to charity, we'll suffer through listening to all that. We'll do a Twitch stream of, like, live listening to ourselves. But you take for granted things like: you have all these keys, each one expires at a different time, and you have to handle that; you have 10 different nodes, and each one has a tenth of the data. Yeah. And so it's just — yeah, we both had, like, really superficial knowledge of the stuff. I never took — actually, they offer database classes at university, but I never took them. I was always taking, like, all the
Starting point is 01:09:59 algebra, linear algebra stuff. So, so, yeah, I mean, and now in hindsight, you know, at least if you want to build something on the web, it's, uh, it's so important to know how all these things work at a, like, more deep level. Yeah — you're talking about, like, hidden complexity. I was listening to a podcast — I don't even remember which one it was — and it was talking about someone who did not even do, like, true high-frequency trading, but just, like, not-low-frequency trading on stock markets. And he was talking about, you know, like, yeah, if you want to just describe it simplistically: you get not the price quotes, but the order book — like, who has outstanding orders and what is the order flow — and then you just, like, reconcile it over time and you watch it. He's like: except, like, you get misquotes constantly,
Starting point is 01:10:46 and you have to decide, like, what's true and what's not true. So, like, it's this weird thing that, you know, for me, I, like, go to the internet and, like, say, what is the price of Tesla today? Um, and you get some number, right? But in reality, that number — under the hood, there's all these outstanding bids for selling and buying, and they update, like, many, many, many times a second. And then there's also, like, all this data error, where, you know, like, you occasionally get misquotes or things get broken or not quite right, and you have to, like, also filter all of that out. So, like, maybe the naive thing you could build in a few hundred lines of code, in a few days or whatever, depending on your skill and familiarity. But then there's just this huge long tail of, like, complexity that you have to work your way down just to, like, improve it.
Starting point is 01:11:34 Yeah, I mean, dude, that, oh my gosh, I mean, we could spend hours talking about that. But that blows my mind. Like when you say I want to buy something like a share of Tesla, I guess what happens, I'm going to sound dumb here, but I guess what happens is if you say, I want to buy a share of Tesla, you know, like, um, Charles Schwab or whatever system you're using pre-populates a price. And my guess is that price is just set high enough that there's like a pretty good guarantee that you're going to buy. I mean, I know you can be fancy and say like, I want this price, you know, and if I don't get it, then I'm not interested or whatever. A limit order, I think it's called or whatever. But the interesting thing to me is what number
Starting point is 01:12:14 they put there for you. You know, it's like, it's like, how did they pick that number? They're just showing you the last transaction price — the last price a successful transaction was done at. Oh. So, well, but there has to be some amount of signal processing there, right? Because there could be some anomalous transaction, and that would throw everybody off, right? Well, but that's, that's — oh, okay, well, we can't... this is, this is fire. Yeah, we're gonna stay forever. It might be buried here, so I'm not the right person. We should find a guest.
Starting point is 01:12:51 and wants to talk a bit about the... I don't think we've ever really covered that in depth here, but that's its own thing. I mean, tons of software engineers work this field every day. Get in contact with us, and we'll look for someone on our side as well. But I don't want to get into... I also don't know. That's not my expertise, but I think maybe a bit more than you,
Starting point is 01:13:08 but that'll be a whole rabbit hole, and this will go on for another hour. Yeah. Let's just put the call out there. If you are an engineer or product manager or anybody at a company that does some cool software, solves some really interesting problems, you know, reach out to us. I mean, we had the gentleman Abhay from Anduril who reached out to us, and it was an amazing show. It's one of my favorite episodes. So, you know, if you do something cool out there,
Starting point is 01:13:38 you know, reach out to us and, you know, feel comfortable talking to a lot of people. Definitely reach, I mean, I just want to put it out there. I mean, I wish I could sugarcoat it, but, you know, you are talking to a lot of people, definitely reach out. I mean, I just want to put it out there. I mean, I wish I could sugarcoat it, but you are talking to a lot of people. It does make some folks nervous, but no, I think there's a lot of people out there who really would love to listen
Starting point is 01:13:55 and hear from your experience. And so if you do stuff in financial, that would be fascinating. If you do other stuff, definitely reach out. Programmingthrowdown at gmail.com. Wow, that was a great segue into finishing off the episode. Well, thanks, everyone, for listening. Yeah, whether you want to be on the show or you just enjoy listening,
Starting point is 01:14:16 thank you so much for hearing us out. And have a great 2022. See you all next time. Music by Eric Barndaller. Programming Throwdown is distributed under a Creative Commons Attribution Sharealike 2.0 license. You're free to share, copy, distribute, transmit the work, to remix, adapt the work, but you
Starting point is 01:14:52 must provide attribution to Patrick and I and sharealike in kind.
