Programming Throwdown - 181: Memory Management

Episode Date: May 12, 2025

Intro topic: Video Game Prices

News/Links:
- Step one: Jump in the Lava - Abyssoft
  https://youtu.be/WdadpHLAfdA?si=oXYnhB0EdkR_RaPE
- Scalable world models for continuous control
  https://www.tdmpc2.co...m/
- Clever code is probably the worst code you could write - Engineer's Codex
  https://read.engineerscodex.com/p/clever-code-is-probably-the-worst
- A new, open source text-to-speech model called Dia has arrived to challenge ElevenLabs, OpenAI and more
  https://venturebeat.com/ai/a-new-open-source-text-to-speech-model-called-dia-has-arrived-to-challenge-elevenlabs-openai-and-more/

Book of the Show:
- Patrick: The Muscle Ladder - Jeff Nippard
  https://amzn.to/44Dznsz
- Jason: Metaphysics of War
  https://amzn.to/4jMjvZ5

Patreon Plug: https://www.patreon.com/programmingthrowdown?ty=h

Tool of the Show:
- Patrick: Pokemon Trading Card Game Pocket
- Jason: Phi-4
  https://huggingface.co/spaces/microsoft/phi-4-multimodal

Topic: Memory Management
- Motivation
  - Avoid thrashing / crashes
  - Allocate resources efficiently
  - Keep high uptime
- Where
  - OS Level: Heap management, Virtual Memory
  - Language/Compiler Level: Cpp, Garbage collection, Ownership
- Tools
  - Instrumentation
  - Export to Datadog / Grafana
  - Python: psutil & tracemalloc
  - Valgrind
- What to do when your program uses too much memory?
  - Reduce data sizes: Compression, References, Lazy initializer
  - Generators & Back Pressure
  - Ring buffers
  - Arena allocators
  - Disk based caching

★ Support this podcast on Patreon ★
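As a taste of the instrumentation tools listed above, here is a minimal sketch using Python's built-in tracemalloc (psutil, also mentioned in the outline, is a third-party package and not shown); the allocation sizes and variable names are arbitrary:

```python
import tracemalloc

# Begin tracing Python-level memory allocations.
tracemalloc.start()

blob = [bytes(1024) for _ in range(1000)]  # allocate roughly 1 MiB

current, peak = tracemalloc.get_traced_memory()
print(f"current={current / 1024:.0f} KiB, peak={peak / 1024:.0f} KiB")

# Per-line attribution: which source lines own the live allocations?
snapshot = tracemalloc.take_snapshot()
for stat in snapshot.statistics("lineno")[:3]:
    print(stat)

tracemalloc.stop()
```

tracemalloc answers "who allocated this" from inside the process; Valgrind and psutil, also on the list, observe at the native level or from outside.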

Transcript
Starting point is 00:00:00 Programming Throwdown, episode 181: Memory Management. Take it away, Patrick. Welcome to another episode of Programming Throwdown. My topic, I guess it's the opening rant. Yeah. It's okay. I'm curious, actually, I didn't talk to you about this before we recorded, so this is live on air. Video game prices. Alright, so everyone's like a different kind of video gamer, or they're not a video gamer, casual, whatever. And I get that. But then it's really interesting, specifically when we're recording this is right around the launch time of the Switch 2. So we have to say this now because 10 years from now people will be like, what are they talking about?
Starting point is 00:00:59 But the Switch 2 is coming out and there's been a lot of chatter online, and in people I see at work, about the price of the system, probably reasonable, but the price of the games were considered more expensive. So sort of like 80 to 90 US dollars. So just for my edification, what's the price of the system? Cause I'm not in the loop on this. Oh, I don't have it in front of me.
Starting point is 00:01:24 Roughly, yeah. Yeah, I think $400. Okay. And what was the Switch 1? Was it cheaper than that? Uh, I think like $50 cheaper. Okay.
Starting point is 00:01:33 Got it. Okay. It looks like it might be 450, and I think the original Switch was 400. That's okay. I don't know, because I just opted out, uh, which is part of the rant. So some people are saying, you know, oh, this is inflation, or even, you know, back when there was Super Nintendo, and I remember going to a Kmart, and Super Nintendo games were like, what, they're how expensive?
Starting point is 00:01:55 And it was crazy. Um, yeah, like $70, $80. So people are like, well, you know, we're just, we're just kind of whining. And if you look at the number of hours you get, like compared to, let's say, the price of a movie ticket, then, you know, it's a super, you know, good value or price of going to an amusement park or something, right? Like you can kind of make these value propositions.
Starting point is 00:02:17 And what I realized is like price anchoring, price economics, these things are just so complicated. And even in such like, I don't know, a small chasm of the world, uh, that you can end up with such different logical reasons why the price is good or why the price is not good that are consistent for that individual. And then it just leads like to, to, you know, outcomes all over the place. Like, of course I'm going to buy this and here's my rationalization. Or of course I'm not going to buy this and then here's a made up rationalization.
Starting point is 00:02:50 And then other people who are legitimately trying to, they're on the fence and they're sort of trying to decide. So I have my sort of opinion on which side I feel this comes down on and my logic, but I wanted to hear from you. Like, do you think in general, like launch price of video games being sort of 70, 80, $90? Like, yeah, that's fine. It's worth it or, oh no, that's way too expensive. So yeah, here's my take on it. I think that the difference between a video game and a movie is that the movie has a higher lower bound. In other words, like a movie can only suck so much.
Starting point is 00:03:34 Oh, that's a bold statement. I mean, like, well, okay, I'll put it to you this way. Have you ever walked out of a theater halfway through a movie? Personally, no, but I see very few movies. Yeah, that's true. I see very few movies as well. But, uh, you know, in my lifetime, I've been to a movie theater, let's say, a hundred times, probably more than that, right? My entire life. And I have never walked out of a movie.
Starting point is 00:04:00 I've watched every movie start to finish. Right. But you know, video games, it's actually the opposite. Like very few video games do I really sink the full amount of time into. And there's some video games that, you know, I get, the media kit looks really good, and the trailer looks really good, and I play it, and it's okay. You know, it's not terrible. I don't return it or anything, but it's just like,
Starting point is 00:04:27 it just doesn't hold me and so I let it go. And so, you kind of like invest into the whole experience more than a single game, right? And so, you know, the analogy of like, oh, well, if you take your favorite game, you sunk, like in my case, what's the game I played the most? Maybe, I don't know, Rocket League or Rimworld or something, I don't know.
Starting point is 00:04:58 Let's say I put a thousand hours into Rocket League and I paid 20 bucks or whatever. Now it's free to play, but I paid whatever I paid when it came out. Well yeah, I totally got my money's worth, but that doesn't count all the games that I bought that I wasn't really that interested in.
Starting point is 00:05:15 And so if every game costs $80, now you have this paradox where, I didn't know that I was gonna be really into Rocket League. I wouldn't have spent eighty dollars on it. And so it just it just wouldn't have happened. And so. I think that's the real challenge. So, you know, for example, like I could see Madden football, you know,
Starting point is 00:05:37 it's like Madden football is a known entity, and there's a cohort of people who are going to sink a thousand hours into every Madden football, every single one. Right. And so, yeah, they should probably pay like a hundred bucks. Right. But that's like a small part of the audience, I think. I wanted to challenge you, like, really, a thousand hours when it comes out every year? And then I realized, no, actually, yeah, you're probably right. There are probably people who play 20 hours every week. Yeah, I mean, but even if we knock it down to 200, then you can still say, okay, 200 hours,
Starting point is 00:06:18 you should pay a hundred bucks for that. But the problem is, you just don't know. So I think I'm on board with your reasoning. I mean, I will say, I guess, to stick up for the other side, that not every game will be that expensive. So you can still have games that are probably priced less. But as a prototypical example, I hear you out.
Starting point is 00:06:40 For me, the kind of issue is realizing two things. Specifically, that Nintendo prices seem to stay up on their games. So if you sort of look, like you said, at Madden, if you buy last year's Madden, you wouldn't pay anywhere near $80 or whatever. You'd pay like 30 bucks, 20 bucks, 10 bucks. Yeah, it's perishable.
Starting point is 00:07:01 They need to play it. Yeah, yeah, yeah. But there are other games similarly that will launch and then just discount over time slowly and assuredly go down. Although famously, Factorio does the opposite, I guess, only goes up. Good for them, breaking trends.
Starting point is 00:07:19 But I think they had a reverse sale once. No, maybe that was Cards Against Humanity. Someone did a reverse sale once. Nice. Where it made stuff more expensive for a day. But yeah, for me, it's, Nintendo prices seem to come in high and stay high. And it's exactly the reasons,
Starting point is 00:07:35 when I end up playing them in general, I find them to be very high quality, which mitigates the risk you're saying, where like you end up not putting the number of hours into it. And I don't spend that many hours a week in general. So it's very like a game will either increase the number of hours I spent in the week
Starting point is 00:07:51 or just fit into the budget I already have. But the contrast I had is if you take something like playing on your computer or a Steam Deck or something else, even one of the Android gaming devices now that they have. And you look at the prices there and picking up stuff on deals that are like indie games or whatever and just like games that are again a year or two old, prices seem to be so much lower. So this fact of like the same game released on iOS, Android, Nintendo Switch, and PC,
Starting point is 00:08:25 even though it's exactly the same, has such greatly different prices, sort of kills it to me to be on the most premium platform. Like, why do I need to be on Nintendo? It's not the most powerful hardware. It's the highest price. Like, if you're gonna buy the same library of games,
Starting point is 00:08:41 you're going to, in general, pay the most on the Nintendo platform. It just, it crushes me. I'm like, oh, I love it, I like what they're doing, and it kind of appeals to me as a casual gamer, but that premium, it really builds up if you're going to buy 10, 15, 20 games even. Yeah, that makes sense. It's kind of like Disney World where it's like, what is it, like $50 to park now?
Starting point is 00:09:05 It's like the prices are just getting so out of control and it's like, you know, it is a known quantity. You know that your kids are going to have like a higher lower bound, right? There's a higher minimum amount of fun than if they go to like Six Flags or something. But then it's like so much more expensive. Oh, one game that you recommended as a tool of the show a while back and I
Starting point is 00:09:28 played through, Tectonica, has a very interesting story according to the dev team, and this is, you know, from their side of it. So take it with a grain of salt, but they basically said, you know, they did one of these things, you know how like Epic and other platforms will have a game free for a week? Yeah.
Starting point is 00:09:52 Yeah. So the Microsoft Xbox store has this. And so they did one of these free-game-for-a-week things. And according to the devs, Microsoft paid them a million dollars to do Tectonica for free for some period of time. And, um, they took the million dollars, but what they said was the sales were just way higher, like, so many people got it for free, way more than
Starting point is 00:10:19 they anticipated. And then it gutted... Exactly. And then when they tried to, they couldn't recover. Basically, the free week was so good that they saturated the market. And so they went out of business, and so Tectonica is actually defunct. I didn't know this. Yeah. Yeah.
Starting point is 00:10:38 The, the ending, uh, I'm not going to spoil the ending narrative or anything, but the ending is terrible. Technically, the game just falls apart at the end. It's just sort of unfinished. I actually never made it. I made it pretty far, and I know what the ending was supposed to be around, but I never sort of finished it.
Starting point is 00:10:59 Yeah, I didn't finish it either, but I got close enough. Effectively, the ending just trails off. It's kind of like, uh, if you lose your voice while you're giving a speech or something, it just kind of whimpers. Um, and it's just because they couldn't finish it. And so to your point, it's like so many people sunk so much time into it, but
Starting point is 00:11:19 there was no way to continue to charge, right? Like that's one thing that a lot of these games do is they charge for a hat, which does nothing. It's just cosmetic. But I don't know if it's psychological or how this really works, but even there's single player games where you could buy a hat. Nobody's going to see it but you. But if you can get some way to get people to pay once they're invested in the game, I think that might be the way to unlock the value.
Starting point is 00:11:51 Okay. So we're way off topic, getting onto other stuff, but the opposite example that I've been trying to figure out, and I guess I haven't actually gone and looked, but No Man's Sky. So this was like hyped up beyond anything. It released, people were so frustrated that they had pre-ordered the game and it wasn't as good.
Starting point is 00:12:09 And then now for like, what, six, seven, I don't even know, like a long time, they basically continued to drop massive DLCs, massive updates. Like I tried playing a while, like it's really complicated. I never really got into it, but like, I mean, they've just continued to like completely change the game. And as far as I know, they've never charged for any of those DLCs. They continue to pump out huge
Starting point is 00:12:30 content. They don't have any of the, like, you know, in-app purchase stuff going on. So I don't know if they just made enough money off of the pre-orders to basically coast it in and just do one for humanity. But yeah, it's fascinating. It is really interesting. Okay. Well, now it's whirlwind time for news. So keeping on the game theme, talking about an over-30-year-old game, I guess, uh, in Super Mario Bros., which is, um, a YouTube video actually by
Starting point is 00:13:02 Abyssoft, I think I'm saying it right. Step One: Jump in the Lava. In this video, and I've talked about a few of these, about like speed running, tool-assisted speed runs, that kind of stuff, is a story about something that happened, I think fairly recently, in the community, where someone noticed
Starting point is 00:13:20 on one of the Super Mario levels that they jumped into the lava in one of the Bowser castles and their game crashed. And they sort of like, you know, posted about it. They just kind of moved on. No one really thought too much about it. And then eventually, you know, people picked up that like, oh yeah, there was this really weird condition
Starting point is 00:13:37 where if a certain set of things happen in a certain way, in the details of, like, how enemies are loaded, like the blocks, the kind of sprites that get put in, and in this one level, out of all the levels where it even potentially could happen, there's only this one level that has this setup, and it would be very, very unlikely that you could trigger it. But this person happened to kind of trigger it by accident, they could explain why, and then realized, wait a minute, there's an
Starting point is 00:14:03 opportunity for arbitrary code execution, which means, hey, you can make the game execute a jump command or a branch command that you can stage by certain inputs. So then through, they call it, like, tool-assisted speed running, basically bots that perfectly time their inputs and do things that a human never could from a timing or even input standpoint, they were able to basically put certain values in
Starting point is 00:14:28 the place that would get read and cause the game to just jump to the ending. Um, and sort of like, that counts as completing the game for any percent speed runs. And then the challenge of sort of getting humans to, could they do it? Would it be possible? Is there a setup where physically a human could input the inputs to get it to happen? And this is not the only example
Starting point is 00:14:51 of these arbitrary code executions occurring in these video games, but just crazy to me that somebody just stumbled upon it after all these years, how much detail people went into to like debug, you know, the super old code from assembly to understand why it was happening, and then to realize that you could set it up such that the game technically is crashing,
Starting point is 00:15:14 but it crashes in a way that actually sends you where you want, into a valid ending of the video game, and plays the ending credits. And so it's just one of these. That's awesome. I don't know. Definitely not useful to your life in any way, but just a fascinating watch.
Starting point is 00:15:28 And so for some reason, these keep popping up into my feed every so often, even though I've never attempted a speed run. I don't really watch speed running. I don't keep up on it, but I, you know. Have you seen the Final Fantasy 2 one? No. Is it a Summoning Salt video?
Starting point is 00:15:43 I just like, I watched that one to just go to sleep. It's amazing. It's very meditative. What is that? What is the fun? Oh, go ahead. Yeah, so this one, okay, there's a, there's a spell called exit in most of these Final Fantasy
Starting point is 00:15:55 games. And so if you're in a cave or a dungeon or someplace that's dangerous, or even a town, I think you can cast exit and it will take you to the overland, uh, you know, where you entered. So if you entered the town from the right and press exit, you end up at the overland one square to the right. It makes sense. Um, there's another spell called warp, and warp is a little more confusing,
Starting point is 00:16:18 where it will do the same thing, but just for one level. So imagine you walk down a staircase, you're in level one of the mines. You go down another staircase, you're in level two of the mines. When you cast warp, it puts you at level one of the mines, you know, at the staircase to level two. See what I'm saying? Yep. Okay, so now there's no stack, so you can't, like, warp and then warp again until you're out. So you cast warp, and then if you try and cast it again, it,
Starting point is 00:16:50 I think it just doesn't work or something. But anyways, so for each level, they need to keep track of, like, the level you came from, or they need to put a sentinel value there to tell you you can't warp, right? But there's one room in a town in one of these villages where they forgot to put the sentinel value. And so when you warp, you go to some arbitrary place in memory. And because it's all ROM, it's completely deterministic. So when you cast warp in this one room in this palace,
Starting point is 00:17:30 you end up just in, like, random memory, and it's just, every sprite is randomly generated. So it's total chaos, but it's completely deterministic. And so the speed run is like, get to this castle as fast as you can, cast warp, and then walk in this very specific pattern until you get the end credits. That's so good. Yeah. I feel like in another life, this would be my hobby.
Starting point is 00:17:57 Oh, totally. I could see myself spending like a whole lifetime. Yeah. Just figuring this out. I mean, it's like, it's very satisfying. And that is a good segue to what I actually did spend most of my life on, which is reinforcement learning. This is a really interesting reinforcement learning paper. And I'll give a bit of background. So I remember talking to like Trevor Blackwell and like the founders of
Starting point is 00:18:26 Y Combinator. This is like pre-pandemic, 2018 or something, and saying, like, reinforcement learning really needs, as I said at the time, a BERT moment. I mean, this is pre-GPT, pre all that stuff, but basically, you know, BERT at the time was kind of like GPT, where you could embed different sentences and you could finish sentences and everything. And it just felt like a very universal model, like it was multilingual and you could just kind of finish almost any sentence, it felt like. And I feel like reinforcement learning needs kind of something like that, where you're not training just totally from scratch.
Starting point is 00:19:13 Like when you start training, let's say a robot arm or something, you would start with a model that kind of understands like, oh, if I swing this arm really fast, that's dangerous. You know, like just that concept could just be there on the first epoch. And it would just understand that, like, high acceleration is dangerous. Now, of course, you could hard code that, right? And that's what people do. But, like, that's just one example.
Starting point is 00:19:38 It'd be cool if there was a model that was, like, just, like, at a high level understood, like, smoothness and efficiency and these things. And I think the community is still, you know, here we are like almost 10 years later, the community is still kind of like marching towards that. And this is kind of the latest paper. This one is a really big advancement in this direction. So it's a very simple to read paper. They paid for a domain name for the research paper, which I think is a really clever way to spend $10 and really boost yourself if you're a PhD student.
Starting point is 00:20:16 I mean, this is like a nice hack, but you could see kind of some cool examples. They have nice videos. The paper is like relatively easy to read. You know, if you get stuck, there's citations, obviously, so you can kind of go back. But without like spending a whole bunch of time on this, you know, they basically are able to build world models.
Starting point is 00:20:42 So I'll explain, I think we talked about this in the RL podcast, but, you know, world models are really tough, because most of the time you're just predicting, like, the next word, or you're predicting, like, should I show this ad to somebody? There's always a relatively small space, like a yes/no question. Um, but in this case, you have to predict what the future of the world looks like. And so there's like a
Starting point is 00:21:11 lot of things you have to predict all at the same time, and that makes it extremely difficult. A really interesting paper called Dreamer, and then Dreamer v2 and v3, have come out, and this is just an incremental improvement on Dreamer, but these increments are getting big. And this one in particular, I think, really is inspiring. And so give it a read, if you're into this area or if you want to be into this area, it's a good way to get started.
Starting point is 00:21:43 And I'm really excited to see where it all goes. So the world model, I guess I hadn't really thought that much about it, but the world model isn't something that a programmer writes. It's something that is, like, learned. So the system itself is trying to, I guess that's the part that, like, "model" in this always is a bit confusing to me. So it's not a set of physics equations,
Starting point is 00:22:14 it's sort of, in the same way as everything else, a way of updating the values in the tensors or whatever from frame to frame of the planning, in a way that reflects the progression of the world. Yeah, exactly. So take chess, for example. So chess, you could argue, doesn't need a world model because the game is so easy
Starting point is 00:22:37 to simulate, right? But you still have this problem where, if you don't have a world model, then how do you rewind time? Now, in the case of chess, you know, maybe you store the state of the board, or you store the moves you made and they're all reversible, right? But for a lot of games, it's pretty difficult to be able to rewind time and ask, like, what-if questions, right? So if you want to ask a thousand what-if questions, you have to be able to ask the first one, rewind back, and ask the second one. Sure, sure. Yeah. And so look at Mario, like we talked about Mario earlier. If you wanted to do a reinforcement learning algorithm to play Mario, you know, and you want to do counterfactuals. So you want to plan. You're going to have to, like, say,
Starting point is 00:23:28 okay, what happens if I jump? Oh, if I jump, I hit the flaming rotating thing and I die. So let's go back and try something else. Well, how do you do that, go back? Now you have to implement save states, and it could be really expensive. It's not really practical.
Starting point is 00:23:43 If you have a world model, then your input is just like an input to a neural net, it's just a vector, and they're super easy to store, to retrieve, and everything. So even for chess and Mario and these games, a world model makes planning practical. Got it, got it. Yeah, and so that's why you could have
Starting point is 00:24:06 physics. You know, physics is extremely lightweight, but most simulators and emulators are just too heavyweight, so you'd build a world model regardless. All right. You said the paper was easy to read, so I'm gonna take it as a challenge, and I will attempt to read the paper and see if that's true. Yeah. I'm debating whether or not to tell this story, but the primary author was a visiting lecturer at a company I worked at.
Starting point is 00:24:42 And he's super nice, really nice guy. His name is Xiaolong Wang. But his auto-generated email was XLWang. And I thought that he picked that email. So I was like, oh, your email's really funny. I thought it was clever and I thought it was very funny. And he's like, I don't know what you're talking about. Uh-oh, all right, all right.
Starting point is 00:25:10 No, no, no, no. Stop it. Stop it. I was like, nevermind, I'm an idiot. But if you're out there, I didn't mean that as like, insult or anything. I thought it was hilarious. It's still very funny. But, you know, these things happen.
Starting point is 00:25:31 We're in a hard pivot. Clever code is probably... What kind of pivot? Clever... Jason, you're in a mood today, man. All right. Clever code is probably the worst code you could write.
Starting point is 00:25:45 This is on a blog, Engineer's Codex. Link will be in the show notes, like always. But here, it's one of those things where they're making a great point, they have some well-sourced comics that are applicable to this. But just talking about their personal experience, that you show up, you're a new coder, and you're like, I'm going to be so smart.
Starting point is 00:26:06 I'm going to write the fewest characters, the fewest taps on my keyboard. And then as you, I don't know, I don't want to say become an old programmer, but now it's, how do I write literally the dumbest thing that will get the job done? And it's okay if it takes a lot of, I mean, within reason. You ever see those kind of running jokes on, you know, X or Reddit or something, where people are like, how to find prime numbers one to a hundred, and they put like a giant if statement, and yeah, all the prime numbers between one and a hundred hard-coded as if statements or something? Okay. Yeah. I mean, there's a balance there. Um, but yeah, for sure, you know, these like very terse, very compact things.
Starting point is 00:26:50 And we've talked about this, you know, on the show, I'm not going to belabor it, but more than just observing that that's bad. They were sort of, which isn't always true, sometimes you find something that's bad and it doesn't yield the good thing, it just says this thing is bad and you've got to move somewhere else in the design space and try again. But in this case, they were kind of pointing out, which I think is a good call, that actually here, doing almost the opposite thing of the clever code and writing that is almost always a really good decision and harder than you think. And so writing code that works and is effective
Starting point is 00:27:28 and is understandable by the audience, and most importantly, you know, understandable by you in five years when you've forgotten everything about that code. There's certain pieces of a code base that I just dread going into, because I just know that, yes, it works, but it wasn't done with a clarity of thought. And, uh, sometimes that's because you're trying to grow it organically. Sometimes you didn't really know what you were doing until you got done with it. But the places where you knew where you were going and you got in and got done, and you followed, you know, your guiding principles about how the code should be, you know, modularized and segmented and all that kind of stuff, are much less problematic to go into. And it's one of
Starting point is 00:28:07 those things where, again, it's sort of like going the opposite of that, right? So writing really compact lines, putting everything in one big function, you know, makes it seem like, wow, this is really good code, like, look how dense it is. But again, if you think about those places where I know in the code base there's one really big function that just does a bunch of stuff, it's like, oh, no one wants to touch it, but you occasionally have to go in and add yet more, and it's like, yeah, this really needs to get broken apart, and everyone just kind of dodges doing it because unpacking it is just such a heavy burden.
Starting point is 00:28:40 Yeah, totally. I mean, I'm thinking about, I saw this code one time where it used templates to auto-generate classes. It's like, you needed a float handler and an int handler and a foo handler and a bar handler. It's like, oh, I'm gonna make a template, and then I'm just gonna instantiate this template like 20 times for these 20 different handlers. And yeah, then, you know, someone control-Fs, tries to find the function, and they can't, because it's auto-generated and everything. Yeah, it's
Starting point is 00:29:15 like, yeah, you were clever and you saved, you know, maybe three hours of development time, but you added like a thousand hours of debugging time. Yeah, that's cool. Yeah. I think this is, along with memory management and, uh, you know, instrumentation, really what separates, I think, the wheat from the chaff. So, yeah, this is cool. I'm gonna give it a read. My second news is this open source text-to-speech model called Dia.
Starting point is 00:29:53 So the article says Dia has arrived to challenge ElevenLabs, OpenAI and more. Patrick, have you ever used text to speech, like, recently, in the past few years?
Starting point is 00:30:17 So like you can now put emotes. So like in parentheses you can put laughs, and the character will actually laugh and then continue talking. It's super natural, it's amazing. And 11 labs in particular has this really interesting thing which I highly suggest people play around with. It's totally free until you hit a certain usage. But you can have it create a voice just by describing the voice. So you could say like an ogre that's terrorizing a village
Starting point is 00:30:53 or like a woman, a soccer announcer or something. And it will like create that voice and then you can say whatever you want in the voice. So that's pretty mind blowing. And I feel like text to speech is one of these things that like kind of needs to be open source because you kind of need to on the fly, you know, generate a lot of speech. You know, I'm imagining like my house, like if I wanted to have my own Alexa.
Starting point is 00:31:28 Well, like Alexa kind of needs to be able to say anything at any time, and it needs to be kind of real time. And so I guess you could do it with some kind of API charging or something, but for me, it kind of feels like something that should just run locally and be relatively quick. And so Dia is open source and they claim to be comparable with you know all these other
Starting point is 00:31:53 kind of you know leading systems. So and the models also aren't that big like people are used to seeing GPT or even the text to image models that are really big and they barely fit on your GPU. You have to jump through all sorts of hoops. But because this is text to audio, the models comfortably fit on commodity hardware. And so I haven't tried this one in particular, but I'm going to try it later today. And I think we're very close to getting to the point where like, you know, if you ever just needed something
Starting point is 00:32:28 to say something, that you could just do it, and it's a total commodity and works great. How does it handle not getting into problems with copying celebrity voices? They have that with image generation or video generation, someone was pointing out: if you say words that are copyrighted, like, I want a picture of Mickey Mouse, right?
Starting point is 00:32:56 But if you say, I want a famous animated cartoon rodent, right, like there's only one of those in the training data. And so, you know, there's all these kinds of loopholes and workarounds. If you want a specific person, can you achieve it by describing them? Or do they also put the same kind of censoring safeguards in to try to catch it later: oh, no, no, no, this sounds too close to a known celebrity, we can't do this voice. Yeah. I mean, what I know is I went on
Starting point is 00:33:25 11 labs and I said basically it was my friend's birthday and I wanted the Unreal Tournament announcer. Remember this game Unreal Tournament? Yeah. I wanted the Unreal Tournament announcer to wish him a happy birthday. So I went on 11 labs and I said give me the Unreal Tournament announcer voice and yeah as you said it
Starting point is 00:33:43 came back and said, you know, I can't do copyrighted stuff, et cetera. And then I tried a few different things. I tried, give me a video game announcer voice from a first person shooter video game. I tried to be kind of generic, and it caught me every time. So what I ended up doing is I ended up having to use a different tool. It turns out there's a website you can go to where someone else has an Unreal Tournament voice.
Starting point is 00:34:12 Uncensored. Yeah. Well, that's what I was going to ask, because you could just give it a voice sample. And even if you don't do voice cloning, say, can you describe this voice for me in excruciating detail, and then have it give all of the idiosyncrasies of the pronunciations and all of the timbre and tone and all of that, and then feed that in and say, okay, now I want these words in this description. But like you said, it feels like there's probably still some other mechanism where they work hard to censor, which is always crazy because it has nothing to do with the actual quality of the outputs.
Starting point is 00:34:50 It's strictly just for censoring purposes. Yeah. I mean, I doubt that 11 Labs censors, or even OpenAI, I don't think they actually go back and look at your picture and then say, whoops, we messed up, or do that. So they definitely do for images, because this happened. I was trying out the new ChatGPT image generator, which is really good by the way, and takes a different approach than a lot of the other diffusion stuff. And so I had it generate something, I don't know, a very famous style of puppets, from a picture of my family.
Starting point is 00:35:27 And it got like, you know, if you've tried the ChatGPT one, it goes top to bottom. So it starts rendering top to bottom. And it got half the image, and I was like, whoa, this is really good. Like it perfectly kept the background, used a famous female pig, a famous male frog, and, you know, a famous drummer.
Starting point is 00:35:45 So like... We're going to lose our sponsor. No, no. Our Patreons are bailing on us. So it happened, and I was like, this is great. I wish I had taken a screenshot. It was like, this is killer, this is amazing. Then as soon as it got part of the way through, it realized it was
Starting point is 00:36:05 generating copyrighted content and then it bounced it. It was like, oh, I can't do that. Wow, that is wild. I don't know if 11 Labs does that, to be honest. I would have said emphatically no, but now I'm not so sure. They do support voice cloning. I feel like you could probably put a low pass filter and some other tricks in to make a voice not get picked up, you know, to circumvent all the fingerprinting. But this Dia one also supports voice cloning, and it's open source, so I think in this case there's really nothing
Starting point is 00:36:45 preventing you from getting a bunch of Unreal Tournament voice samples, running them through Dia, generating a voice and then having an Unreal Tournament announcer say whatever it wants to your friends. Yeah, so I think that's a really interesting idea though, the whole post-hoc censorship. Maybe you could take a screenshot quickly, or who knows. But yeah, I think that there's another debate around the open source versus closed source situation.
Starting point is 00:37:25 I mean, these open source models are getting really good and it might come to the point where, like there's a lot of things like this, like compilers are an example. So, way back in the day, Patrick and I used this compiler called Green Hills. Patrick, I don't know if you remember the Green Hills compiler. Yes.
Starting point is 00:37:47 Yeah. Are you going to pay up? Yep. That's an example where you could get GCC open source, and for 90% of people it's good enough, but you need this Green Hills, and honestly, this is out of my area of expertise, so I wouldn't even really know what's going on there. But suffice it to say our company bought the Green Hills compiler. And so it might get to the point where Dia and other open source models are good for like 95% of us, and you just use OpenAI or one of
Starting point is 00:38:18 these other models when that 5% makes it kind of worth it. So that might be where we end up with all of these. Yeah, I saw, we're mixing the news, we can move on, but I saw OpenAI was saying they're going to release an actual open source model soon. So yeah, I'm curious where the money will be made. Inference, test time
Starting point is 00:38:49 compute, is it going to be the models? Like, I mean, I think like you said, yeah, it's going to be all over the place. I'm fascinated by running it at home, but it feels like, I don't know. I see people buying rigs and doing video generation. Um, but then it's so hard to keep up with. People did that when they were doing cryptocurrency mining, they'd get the latest miners, and for some people that paid off. I'm not sure there's a payoff here in the same way if you go out and buy hardware to do all this stuff at home.
Starting point is 00:39:15 Yeah. If you're a business, I feel like renting it is still better. Even if you're choosing to be on an open source stack for whatever reason, like renting at least the hardware and then using the models and stuff, because it moves so much. You can end up being in the wrong place and sort of behind, and really wanting something different than what you have. Yeah, totally agree.
Starting point is 00:39:38 I see no point in paying, like buying this stuff outright, when you can rent even an EC2 instance from Amazon and it's only like 90 cents an hour. You'd have to run it for so many hours to justify buying it yourself, and the electricity and the depreciation and all that. All right. Time for book of the show. Patrick, what is your book of the show? All right. So mine's a bit out of left field, and by no means, you know, am I an exercising person.
Starting point is 00:40:18 I've been trying to be diligent about the amount of exercise I do over the last few years and going to the gym. Well, I have gym equipment, you know, just in one of my rooms. I just move heavy objects around for a few minutes is what I say when I'm going to go do it. I'm going to go move heavy objects up and down for no reason. But trying not to, you know, have the myriad of health problems that can creep up from sitting in a chair in front of your computer all day.
Starting point is 00:40:47 Yeah. And there's a guy on YouTube, Jeff Nippard, who does, I don't know, mostly lifting content in the gym. He's a very large guy in terms of muscle mass; I am not that way at all. Um, but you know, he takes sort of a science-backed approach rather than just like, you know, go until you puke, or if you're not hurting, you're not doing it hard enough.
Starting point is 00:41:16 Um, and so generally good. There are several good YouTube channels, I think, for watching this stuff. And although these guys may spend, you know, 30 minutes or an hour in a gym every single day, he will also point out there are ways to spend sort of 30 minutes once or twice a week if you're really efficient, if you combine them in certain ways. Anyways, he put together an actual, you know, physical book.
Starting point is 00:41:40 He had exercise routines and stuff before, and it's always a bit sketchy for me whenever I hear YouTubers, you know, like, oh, buy my ebook, buy my PDF, buy my, I don't know. I'm trying to think if I've ever done that, if maybe once or twice. Anyways, he had a book and it was on Amazon, like an actual hardback book. It was, you know, reasonably affordable called The Muscle Ladder. The analogy is kind of weird, but you know, talks about principles that are sort of the sides of the ladder and then rungs on the ladder, but also includes a
Starting point is 00:42:11 bunch of, you know, sort of workout routines and also pictures, which is very useful for me, because sometimes you just read this like, oh, you're supposed to do, again, it's not my background, a Romanian deadlift or a sumo squat or something, and I'm like, I don't know, what is that? Like I vaguely know what the word squat means. Um, but so then you're going on YouTube, trying to find a video to watch, and the person's like, hi, I'm so-and-so, subscribe, and five minutes in you're like, oh, just show me the thing.
Starting point is 00:42:39 But he has nice pictures for each of the exercises. So I find it a very useful book as I'm trying to, you know, find a routine that works for me. So people will be like, that's a weird pick, Patrick, go back to the sci-fi. Yeah, you're probably right. Probably should go back to the sci-fi fantasy picks. But it's a book that I have actually been reading.
Starting point is 00:42:59 So I guess it's full transparency. How have you changed your routine? Like, are you lifting heavier things? Are you in the heavy lifting room longer, or what? Uh, I think it's that finding what exercises pair well together isn't obvious to me. So I'll tend to just go and do whatever exercise I know. Um, but there are often, when you read these, a lot of variations on the exercise
Starting point is 00:43:27 to target slightly different parts. Like for me, it's like, I'm going to squat. Why am I going to squat? Well, I want stronger legs. It's like, okay, well, that's very crude. Like what part of your leg, what muscle group? Like this kind of squat targets, you know, your glutes and your hamstrings in this way, and this one does in a different way. And so learning alternatives to the exercise so that after a couple of weeks of doing it one way, it's not just like keep doing that one and lifting
Starting point is 00:43:58 heavy, which is a philosophy. And that's kind of what I've been doing, like just trying to do the same thing over and over but slightly heavier or slightly more, but actually also throwing in variations every few weeks where the muscle you're working in complement with that is slightly different. And so personally, that's one way that this kind of stuff has helped me. And also that keeps it from being super boring. Like for me, it's really boring to just go in there and do exactly the same thing every time. Yeah, that makes sense. That is super cool. My book of the show needs a little bit of an intro, because this is one of these ones where this could easily
Starting point is 00:44:38 degenerate into us getting all sorts of hate mail. Which we do, by the way, get hate mail. We don't have sponsors, so no one can pull out of the show, but we have gotten hate mail before. But the reason I got to this book is because I was playing Crusader Kings 3. Have you ever played this game? I own this game, I bought it as part of a Humble Bundle, and I would love to play it. It seems fascinating. I'm nervous.
Starting point is 00:45:04 I haven't started. It's one of these games where you're like, wait a minute, it's 4 a.m., like what just happened? Oh. And... It's Paradox, right? Yeah, yeah, all these Paradox games. It's actually to the point now where, if it's a Paradox game, you basically need a mod that will speed the game up
Starting point is 00:45:16 because, I don't know, maybe this is like a conspiracy theory or whatever, but I feel like they purposely don't have the fastest setting on these games because they want you to spend more time. Because like, this game, you could play it at a much faster speed than the fastest one, I feel like, and still be productive. And in Stellaris, you know, same thing. It's like, I feel like I'm always on the fastest possible.
Starting point is 00:45:49 Which one of those is the best? Sorry, before we move on: if I'm going to start, I have a few, but I want to play one of these Paradox games, so which is the easiest? Yeah, what I would say is you should play based on the theme. Like, do you want to be a medieval king
Starting point is 00:46:04 or do you want to be like a space empire? That's really the decision you have to make. Oh, so Stellaris and Crusader Kings, I basically play both of these games as if they're turn-based, where I want the speed to be infinitely fast until something interesting happens, and then I want to pause, right? And so what I really want is a turn-based game and not these real-time games. But the gameplay and everything, the story and the theme and the setting, is so good that it makes up for what I think is actually not that great gameplay. But the story is amazing. And so I was playing Crusader Kings,
Starting point is 00:46:42 very emergent, love it. And I just got to thinking about war, and going to war, and like, why do people go to war? Why do people fight each other? And, uh, you know, because Crusader Kings is a very deep game. It's not just like, oh, I'm just going to get to the point where my army is bigger than the army next to me and then just kill them, like you would in StarCraft or something. In Crusader Kings, there's all this diplomacy and all this stuff, right? And so I started thinking about, why do these people even go to war?
Starting point is 00:47:16 Like why would these people fight for you? And so that led me down this rabbit hole where I ended up reading this book called The Metaphysics of War. And it talks about basically, historically, how people were motivated to go into battle. Like imagine if there's a battle and you have a hundred people and the enemy has a thousand people and you're probably going to die. How do people actually fight those battles? People did obviously run away and everything too. This book talks about how in medieval times, around the time of Crusader Kings, you know, they would talk to these Crusaders and say, look, if you die in battle, it's actually even better than
Starting point is 00:48:09 winning, because you get this special place in the afterlife. Whereas if you had won the battle and then just, you know, died of old age, you end up in a different part of paradise or whatever. Now, all of this, we're not making any... this is what I was worried about, the hate mail. We're not making any moral claims. We're not saying that's right or wrong or anything.
Starting point is 00:48:32 But just understanding what the mentality was at the time and everything was really interesting to me. So it's a pretty deep book. It's something where you have to kind of want to know answers to these kinds of questions, and you have to put a bit of your historian hat on. But I found it interesting. And yeah, I thought it was kind of cool. I don't think I'm going to be reading any more books on it. I feel like I've kind of covered what I wanted to know.
Starting point is 00:49:05 So I'm probably going to move on to a new topic, but I thought it was kind of cool, and I just finished it yesterday. So. Yeah, I mean, this is a tough topic, and yeah, like without making moral statements, I mean, in general, it's tough, but it doesn't seem to be going away. So war comes and goes. And yeah, I guess I consider myself lucky that I
Starting point is 00:49:31 haven't had to think too much about it. Um, yeah, maybe if I played Crusader Kings, I would have thought more about it, but, uh, yeah. I mean, playing a video game and then being like, I think I'm going to read a book. I feel like this might be peak Jason here, might be the nerdiest thing I've ever done.
Starting point is 00:49:53 But, uh, um, yeah, maybe I should play a video game like Tetris or something. All right. Time for tool of the show. All right, Patrick, what's your tool? No, it's a game. All right. So Jason alluded to this earlier as far as money making schemes in games, I guess. So I'll say the game,
Starting point is 00:50:17 but then I'll give my caveat to it. So that is the Pokemon trading card game Pocket. So I've been playing this actually with my kids. I, to be clear, neither me nor my kids have spent any money on this game. It is a free to play game. But it's one of those games that has timers and you have to check back in every so often.
Starting point is 00:50:39 And there's this thing that's built in that I want to talk about, but I had a Pokemon game for Nintendo Switch, actually, funnily enough, although that was a while ago now. This is the trading card game. And in general, I've gone through phases where I've collected cards, and I've never been super into it, but off and on it's always intrigued me. But I've really enjoyed this game. I feel like it's reasonably generous, as someone who doesn't play free to play games normally, or at least not this style. But
Starting point is 00:51:09 you know, each day you can open, you know, a pack of cards, and then you know, you add them to your collection. And then it, you know, does all those things that feed into that certain part of your brain that's like, completing the collection, you got a new card, you got a rare card, and sort of building it up. But it has been interesting to explain to my kids, like even if all of, you know, let's say there's 50 cards to collect and you open a pack and it has five cards in it.
Starting point is 00:51:35 And even if all the cards were equally likely, people do this: oh, I opened 10 packs, so I have 50 cards, 50 unique cards. It's like, no, that's not how it works. If you actually look at it, to get 50 unique cards, where cards occur with an even distribution, you need a lot more than 10 packs, because that last card, even if they're not juicing it, even if they don't have rarity tiers, even if they're not cheating you, that last card, and you don't know which one it'll be, it'll be different for everyone, but that last card,
Starting point is 00:52:08 just statistically, the time to a complete deck for some people, like 10% of people, is going to be an enormously large number of pack openings. And so, if you're going to play it, you have to have this attitude of, it kind of doesn't matter. Um, but I will say they have a competitive game. The competitive game is not super strategic; you can just kind of play around. They have single player battle events that come up, and you can play against the computer, which is kind of interesting.
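The effect Patrick is describing is the classic coupon collector's problem, and it's easy to simulate. This is a hedged sketch under simplified assumptions (50 equally likely cards, 5 cards per pack, no rarity weighting), not the game's actual odds:

```python
import random

def packs_to_complete(num_cards=50, cards_per_pack=5, seed=None):
    """Simulate opening packs until every unique card is collected.
    Assumes each card in a pack is drawn uniformly at random --
    real games weight rarities, which only makes things worse."""
    rng = random.Random(seed)
    collected = set()
    packs = 0
    while len(collected) < num_cards:
        collected.update(rng.randrange(num_cards) for _ in range(cards_per_pack))
        packs += 1
    return packs

# Average over many simulated players: far more than the naive 10 packs,
# because the last few missing cards take most of the time.
trials = [packs_to_complete(seed=i) for i in range(1000)]
print(sum(trials) / len(trials))  # roughly 45 packs on average
```

This matches the theory: the expected number of card draws is about n times the n-th harmonic number (50 × H₅₀ ≈ 225 cards ≈ 45 packs), and the distribution has a long tail, which is exactly the "10% of people need an enormous number of packs" effect.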
Starting point is 00:52:38 So it fits in that, like, you know, for a few minutes each day, you can kind of open it up and do your thing. Uh, people do the same with Duolingo and their Duolingo streaks of, you know, for a few minutes each day you can kind of open it up and do your thing. People do the same with Duolingo and their Duolingo streaks of, you know, oh I've been every day for 200 days I've been practicing this foreign language. It's not clear they actually like know the foreign language but it's just like certain thing of like doing the same thing over and over in streaks and so if you if you're interested if that at all appeals to you, they have iOS and Android.
Starting point is 00:53:09 Again, I don't spend any money on it. If you're prone to wanting to spend money to complete something or to have the best or to have the ultra special rare cards that you see online, probably stay clear. But in general, that's not my particular vice. So I can play it and just put it down and enjoy it for the enjoyable time it is without spending money.
Starting point is 00:53:31 That makes sense. I saw this crazy stat related to what you were just saying. If you shuffle a deck of cards and then go through that permutation, there's like a 99 and so many nines after that percent chance that nobody's seen that order before.
Starting point is 00:53:46 Yeah. So if you perfectly shuffle, which is actually really hard to do, if you perfectly randomly shuffle a deck of cards, you're probably the only person in all of time that has ever had that permutation. Yeah. It's so wild. It's just crazy. Yeah.
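That stat falls straight out of counting permutations: a 52-card deck has 52! possible orderings, a number so large that any uniformly random shuffle is almost certainly unique across all of history. A quick check:

```python
import math

# Number of distinct orderings of a standard 52-card deck.
orderings = math.factorial(52)
print(orderings)             # about 8.07 * 10^67
print(len(str(orderings)))   # 68 digits
```

For scale, even a trillion shuffles per second since the Big Bang would cover only a vanishing fraction of 10^67 orderings.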
Starting point is 00:54:02 Yeah. All right. My tool of the show is Phi-4, which is one of these small language models. So there's, you know, everyone knows GPT and all these things. There's another category of small language models. These are basically, you know, imagine if you were making a video game, you're making Crusader Kings 27, and you actually want an LLM to generate the dialogue, right? You know, you would want that to run on people's computers.
Starting point is 00:54:37 And so there's this question of, when can we get to the point where you could deploy a desktop app with an LLM and be confident that a vast majority of your market could actually play it? It's kind of like Crysis or Cyberpunk. When these games came out, they were virtually unplayable because the graphics were just so intense; you had to have such an intense graphics card. But now a commodity
Starting point is 00:55:08 computer, like you could buy a computer from Best Buy, and you'll play Cyberpunk at Ultra, right? And so, you know, we're marching towards that. Phi-4, the Phi models in general, are just extremely high quality, and I just can't stress this enough. I've been playing with Phi-4 a lot this week, and it is right up there. You know, benchmarks are just so easy to game, and so benchmarks say whatever they say. I don't even know. But when I use Phi-4, I find it's comparable with all the best
Starting point is 00:55:41 models, and, um, it's very small. I think it still requires a GPU, although there is something called BitNet, also from Microsoft, which shrinks these models down so they can run on the CPU. And of course, then it starts to get to the point
Starting point is 00:56:01 where it is sort of degraded performance. But as I said, this is a march, and so we're not there yet as a community, as an industry, but we're getting there. And so it's really cool to follow along and see just how powerful some of these small models have become. So this model can run comfortably on a GPU at your home, and it does all the multimodal stuff. So you can give it audio and say, transcribe this. You can give it a picture and say, do OCR on this, and it actually will blow away almost any OCR software. Like I compared Phi-4 to Tesseract, to EasyOCR,
Starting point is 00:56:41 to TrOCR, which is also transformer based, and Phi-4 just blew away all of them. Like it wasn't even close. So it's just amazing that you can just spin this up on your computer. And so we're at the point now where, if you're willing to limit your audience to just people who have Nvidia GPUs, you can roll out LLMs as part of your desktop software, which I think is pretty amazing. And I think it's only a matter of time before that bubble, that pie, can grow to cover everyone with a GPU. And it's only a matter of time before it can grow to cover everyone with a computer. So we're heading there, and it's exciting to watch.
Starting point is 00:57:31 I'm curious, from a pricing cost standpoint, if there'll end up being some blend where, yeah, you can run one of these Phi-4 models locally, embedded in your software, or even, you know, just installed as a DLL equivalent on your computer, and various things can kind of reference it. But then also the self-awareness that, like you said, oh, the OCR is pretty good, but you see examples on edge cases where these models will produce coherent but wrong text. So if the text is just way too blurry, it'll just kind of assume what was
Starting point is 00:58:09 probably there and, you know, spit it out. And the ability to recognize, I'm beyond my abilities, go to the cloud and ask a bigger model, or, you know, come back in a sort of blended way. I still haven't seen much there. People call it hallucinations, and it's not exactly that, right? But just the self-awareness of how good the model thinks it did. It just seems to always assume it's doing really good.
Starting point is 00:58:37 Yeah. You know what I would do there? If I was tasked with that today, you know, these models have a temperature parameter. And so I would crank up the temperature, and then I would run the same model like 10, 20 times to get an ensemble of answers. And then if they don't agree, then I would go to the bigger model. Yeah, that's interesting.
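Jason's idea is essentially self-consistency sampling: sample the small model several times at high temperature and escalate to a bigger model when the answers disagree. Here's a hedged sketch of the control flow; `query_small_model`, `query_big_model`, and the thresholds are hypothetical stand-ins for whatever inference API and tuning you'd actually use:

```python
from collections import Counter

def query_small_model(prompt, temperature=1.0):
    # Hypothetical stand-in for a local small-model call (e.g. Phi-4).
    raise NotImplementedError

def query_big_model(prompt):
    # Hypothetical stand-in for an expensive cloud fallback.
    raise NotImplementedError

def answer_with_fallback(prompt, n_samples=10, agreement=0.8,
                         small=query_small_model, big=query_big_model):
    """Sample the small model n times at a high temperature. If the most
    common answer wins at least `agreement` of the votes, trust it;
    otherwise escalate to the bigger model."""
    samples = [small(prompt, temperature=1.2) for _ in range(n_samples)]
    best, count = Counter(samples).most_common(1)[0]
    if count / n_samples >= agreement:
        return best
    return big(prompt)
```

In practice you'd normalize answers before voting (strip whitespace, lowercase, maybe compare OCR outputs with an edit-distance threshold rather than exact equality), since high-temperature samples rarely match verbatim.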
Starting point is 00:59:02 Or I wonder, for some of the OCR stuff specifically, if eventually we get to an approach where you, either in the model or whatever, kind of project back the text, right? So like, read the text, project back the text: is there agreement here or not? Yeah. You kind of ask it to run itself backwards again, you know.
Starting point is 00:59:30 Yeah, that's one of these things that people don't realize, but verifying is actually, because some people will say, oh, well, the model that generated the bad answer clearly can't verify, but it actually can. Think about it in terms of, like, watering your garden. So if you turned on all the zones at the same time, then it wouldn't really work, or it would work, but it would be very low pressure.
Starting point is 00:59:51 And so you wouldn't get the same kind of watering. But if you're asking it to verify an answer you've already generated, that's kind of like only watering one zone, you know? Because it's easier to verify than to generate the answer. And so it'll actually do a much better job. Yes, there is this, I've been working through this.
Starting point is 01:00:13 I want to make sure we move on so we don't run too long, but I have been starting to use some of the code AI stuff. And one of the things that I realized is, when you interact with another person, you would normally have some distribution model of how good they are at what you're asking them to do. Can I give them a high level complex task and they'll accomplish it? Or do I need to help them make an outline? So if I'm talking to another person at work, I'd be like, hey, I want you to do this, but can you come back to me with a plan once you've sort of thought about it and looked at it?
Starting point is 01:00:45 Let's talk about the plan. Then you go work and then come back to me and like, let's do a check-in, you know, from time to time. But we don't treat the LLMs that way. We just give them arbitrarily complex tasks and just let them go. And so this thing that you're mentioning, people were pointing out that it can often be helpful to say, first, what I'm looking for is a plan on how we could accomplish this.
Starting point is 01:01:08 And then, you know, let's get that plan really, really good. And then once you have a plan, now the plan is good, let's go execute on it. But let's do it step by step. I only want you to do the first part of the plan, and then only the second part, and so on. And by doing it that way, you end up getting more points of interaction, but also better quality, even if you just auto-approve everything
Starting point is 01:01:34 that it says and you don't do that interaction. Yeah, yeah, totally. That makes sense. I don't know. I, maybe we don't anthropomorphize the LLMs enough or too much. Like I'm not sure where in the spectrum we are. And then of course they're always changing and getting better. So, you know, whatever you were doing before doesn't always work in the future. Yeah, we should definitely do a show on co-pilot. That's a desperate show that needs to happen. How to talk to your computer. How to train you.
Starting point is 01:02:01 Are we eventually going to have to have like the right way? Yeah. Like communication skills classes with like appropriate ways to interact with your chat bot. Oh, interesting. Like those, those communication classes in gen ed that I thought were useless, they're actually useful. I can talk to my computer with them.
Starting point is 01:02:21 Okay. All right. Well, let's just, let's just hard pivot again. There it is. I did it again just for you. All right.
Starting point is 01:02:28 Memory management. So I suggested this topic. I feel like Patrick is going to know a lot more than I am, but I will start by talking about why I suggested it. We were having an issue at work where we um, you know, we were just blowing up on memory and the pod would die and very, this is like not an uncommon thing, um, where you run out of memory and then the OS just starts like machine gunning your program. Um, and, uh, you know, it made me realize like, you know, as, as one of the more senior kind of like engineers or developers at the company that like, this
Starting point is 01:03:11 is a skill that's not very accessible. Like there aren't a lot of people who know what to do when things run out of memory. Like, uh, you know, a lot of people might just start randomly turning stuff off, or you might have some intuition, but like, what's the principled thing? That is, imagine you walk into a code base, you're at McKinsey or Deloitte or something. And you're, yeah, this is like the job from hell I'm about to describe, but like your task is just walking into some random company's code base and
Starting point is 01:03:42 reducing their memory footprint. Yeah. How do you go about doing that in like a principled way? And how do you maintain sort of the right knowledge about your system so that it doesn't get to the point where it's blowing up and everything? So that's really the motivator. And so I think we'll dive into a bunch of different things that you can do both to like catch, detect these things and also to mitigate.
Starting point is 01:04:15 You guys got a better explanation of that topic. I misinterpreted it to mean just broadly all the things about memory management, because to me, memory management, as a C++ programmer or something, is a thing that haunts your dreams at night. Um, but I don't know. I think we're gonna do a little bit of a dabble across both.
Starting point is 01:04:35 I mean, they are interconnected, and I think your statement is interesting. I hadn't really thought about it that way, but when something's running too slow, the ability to make it go faster, you know, maybe is more at hand for people. You study that in school, right, like big O notation, algorithms, so generally people have some experience in sort of like data structures or algorithmic approaches to improvement of their code. And then there are levels past that, but people can kind of see to them because they're near to them. So there's things like optimizing for assembly code generation or optimizing with SIMD or GPU offloading.
Starting point is 01:05:21 These kinds of things that you can do, multi-threading, right. But I feel like people are exposed to those a lot more, and then like you said, the tooling is more underused, or not used as often, for a lot of folks. And just in general, most people don't think about memory because it's sort of like a pass fail thing, right? Like you run your code and if it finishes, you know how long it took sort of intuitively because you were waiting for it, but you don't really know how much memory it used. Right.
Starting point is 01:05:52 I hadn't really thought about it, but yeah, I guess you're right. Like it's sort of one of those, uh, like if your program busts out the memory size and you don't know of an obvious thing that you were doing that you could just not do, then yeah, what do you do next? Yeah, so maybe you should start by kind of explaining to people like how memory works in a program. That'd be like a good way to kick it off. So you go to Best Buy and you buy the kind of DDR RAM that your motherboard supports and your CPU. I mean, it's complex. Even that part, right? That's true.
Starting point is 01:06:31 Dude, it's like, it's all the memory technologies. Memory can also be talking about L1, L2 caches, RAM. Most of the time we're talking about RAM. And then as Jason mentioned, if you're in a multi-user situation where you're submitting something on Kubernetes or Docker or whatever up into a cloud, then you have a hard memory limit because they want you to play nice with other systems. I like to motivate my memory. Okay.
Starting point is 01:07:03 All right. We're looking at meme pictures while I'm trying to talk now. And who said we shouldn't Twitch stream? Okay. It was me. All right. You're distracting me. Okay. So the idea of memory is when your computer is, you know,
Starting point is 01:07:23 there is a certain amount of things that fit into CPU registers. So the little bits of variables, but once you start having strings, arrays of objects, these things aren't going to fit in the registers, the sort of like very, very small, very efficient, what your computer CPU operates on. So they may be stored out in L1 or L2 cache, but ultimately that cache gets filled in from your system memory, your RAM. And that RAM is relatively slow to access, but way faster than going out to disk, even an SSD. So SSDs, hard drives are pretty slow,
Starting point is 01:08:01 RAM's a lot faster, cache down to registers. So fitting in memory can be about staying in the cache size because you get a lot of speed up. But then more like what Jason is talking about is just how much total memory consumption your computer has. So if you're like me, one of the first times you bump into this is you write a recursive function and you screw up the terminating condition.
Starting point is 01:08:26 And so, every time it recurses into the function, your computer pushes all of the state onto the stack and just keeps pushing and pushing and pushing onto the stack. And eventually your stack grows too big and your computer crashes while it's saving some system state into memory. But the other way you can run into it is, as you're reading data in, if you are deriving a lot of data and just keep adding to it, then you will crash.
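As a minimal sketch of that broken-terminating-condition bug (in Python, where the interpreter caps recursion depth rather than letting the process blow its real stack):

```python
import sys

def countdown(n):
    # Bug: the base case only matches exactly zero, so any start value
    # that skips over zero recurses forever, growing the stack each call.
    if n == 0:
        return
    countdown(n - 1)

countdown(5)  # fine: reaches the base case

try:
    countdown(-1)  # never hits n == 0; every call pushes a new stack frame
except RecursionError:
    # CPython guards itself with a recursion limit (default ~1000);
    # in C or C++ this same bug would exhaust the real stack and crash.
    print("stack limit hit, recursion limit is", sys.getrecursionlimit())
```

In C or C++ there is no such guard, which is why this shows up as a hard crash rather than a catchable exception.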
Starting point is 01:08:59 And how you handle that can vary a lot. But ultimately, the memory management happens at several levels. So your program that you're writing, depending on the language, the compiler, the runtime, is tracking the objects. Unless you're in sort of like C++ or C, in general, it's trying to free up as much memory as it can as it goes along, as you're not using it anymore, so that the longer your program runs, it doesn't just consume arbitrarily more memory over time.
Starting point is 01:09:34 So in C++ or C, you kind of need to do that yourself, or some people will handle that with shared pointers, unique pointers, things that will, so-called smart pointers to help them out. But in languages like Java or something else, like maybe Python, you'll get garbage collection. So the runtime is trying to monitor, hey, no one's referring to this bit of memory anymore, so I'm going to go clean it up and kind of organize it. In Rust, which is becoming more important,
Starting point is 01:10:05 that is done through ownership so that it's more efficient, but still kind of done for you. And there's careful tracking of who owns the memory and when they don't do it, not using it anymore. But once you go past that, ultimately getting memory, so when you allocate memory, either through an allocation command
Starting point is 01:10:24 or just you ask for something and it generates the memory for you. So not a fixed sized object, but something that you're dynamically allocating. That's a call to your operating system. And so your language will have a way of asking the operating system for, hey, I need, I need some memory. And your operating system is trying to give you blocks of memory that you can use and space in that. And then when you go to free it, it's trying to keep it, you know, coherent and together, but also across all the programs running. So the first thing people will do is, you know, open their task manager or, you know, thread monitor and look for
Starting point is 01:11:06 the operating system's record of how much memory your program is taking. Because ultimately, like Jason was saying, his pod got killed, right? What he's saying is the operating system, the hypervisor, is monitoring memory consumption by how many times the program has asked for more memory. And when it hits its limit, its job is to protect itself and all the other programs running, and it will terminate your program and force it to free up all of its memory, because it'll terminate it and then go, okay, now I can free that memory, and so it'll free itself up.
Starting point is 01:11:38 So the first course of action would be monitoring what the operating system is reporting as how much memory is being used. I think that's probably the most obvious one, but you know, walking through the steps, that's where you're going to find that. Similarly, looking, when your job finishes, at a log of what's called the high watermark. So the high watermark represents, your program was hopefully borrowing memory and then giving it back as it wasn't needed anymore. And so you may use a terabyte of memory over the course of your program running,
Starting point is 01:12:17 but if you were only using a gigabyte and then, as you didn't need it, you were freeing it and getting another gigabyte, freeing it and getting another gigabyte. We can talk about better ways of doing that maybe later, but you could go through a full terabyte, but the high watermark, the largest amount of memory you ever used, was only one gigabyte. And the way to know that is to either run something along with your job that is monitoring
Starting point is 01:12:42 what the operating system is doing. There are other tools for that, more built-in, but basically monitoring that high watermark and then saying, I want to stay below, you know, if your pod has 16 gigabytes or whatever, saying hey, I don't want to get more than 13 or 14, because if I'm getting up past that, there's a strong chance, unless I know very clearly what the max is gonna be, it'd be very easy for a spike to cause it to bump over the 16 gigabytes and just get force killed. And so if you don't want that, you would sort of monitor those. So the easiest way to tell how much your program is using in
Starting point is 01:13:17 general is looking in the operating system. Yeah, that makes sense. And like, you know, just as a desktop user, sometimes you've seen this where, you know, maybe you've had some program go off the rails or whatever, but like it uses all of your RAM and then you find you can't even move the mouse. And like everything's kind of deadlocked and you're kind of hosed, you have to hit the hard reset. Well, like, you know, on Amazon, they don't want to do that. Like Amazon doesn't want your program
Starting point is 01:13:49 to cause the entire machine to like lock up and potentially ruin other people's programs who don't even know you exist or whatever on that same machine, right? And so they will really aggressively, so the way it works when you're running in the cloud is you say, like, I promise you, Mr. Cloud, I'm only going to use 10 gigs of RAM. And what happens is they constantly monitor and
Starting point is 01:14:17 they can't actually prevent you from using 11 gigs because of just, I guess, the history of compute and languages and everything like they can't actually stop you so so what they do instead is if you go over the 10 gigabyte mark for more than a few seconds it just kills your entire pod and that's their way of ultimately protecting themselves or even sometimes there are situations where and this doesn't happen very often, but you can imagine, let's say you and two other programs are sharing a pod, sharing a physical machine,
Starting point is 01:14:53 and those other two programs both go above their watermark, and that causes the whole machine to start to run out of memory. It might just kill all three of your pods, and so that wasn't even your fault. So that does happen, but it will definitely happen if you go over the limit that you claimed when you started the program.
Starting point is 01:15:14 And so also those limits decide how much Amazon charges you. So it becomes really important to have tight bounds around your memory. Yeah, the thing you point out is a good point. So when you're running on a server, there's one set of behavior. When you're running locally on a desktop, not in a way that replicates the server,
Starting point is 01:15:38 you can sometimes bump into an issue where, like you said, your whole computer slows down. That's one thing. although I feel like modern operating systems have gotten better about protecting themselves by limiting that from happening. But what they will do instead, because they wanna try to let your program finish, is they'll start giving you,
Starting point is 01:15:58 telling you that you have more memory, but that memory is actually coming off of disk. And so what they'll do is they'll say, hey, here's a new chunk of memory. They'll take your old memory, put it out to disk, it's called paging. So they'll page your memory out to disk, and then they'll give you new memory that you can use and you're writing to that. And then you go try to look up something in the other address, the one corresponding to what got written out to disk. And it'll go, okay, hang on, I'm going to put your current one on disk and bring that one back in.
Starting point is 01:16:27 That happens transparently to you, except your program sort of gets paused waiting for that swap to happen. But you'll hear that called thrashing. So it's thrashing because it's writing these pages in and out of your disk and you have to kind of wait. And so your program's runtime just goes exponentially longer because it's sort of stuck doing this
Starting point is 01:16:48 over and over and over again, with the pages coming in and out and in and out. But you wouldn't necessarily know that that was happening because nothing would sort of indicate. And it does this because you may have something like your browser open that's taking, you know, lots and lots of RAM, but you're not using it right now, you're doing development. And so you actually want the operating system to be able to take all of that browser memory and put it out to disk.
Starting point is 01:17:13 And until you go click on it again, it's sort of just hanging out, chill on disk, and you can bring it back and it pops right back up. It's a little slow. So if you ever switch back to an app when you're using a lot of RAM and it's sort of like sluggish in the beginning, this is what you're experiencing.
Starting point is 01:17:28 The operating system is trying to let everyone have their cake and eat it too, and this is the trade-off that they make. Yeah, totally, totally makes sense. So I think, yeah, you totally hit the nail on the head. The first thing to realize is that you need to pay attention to both the high watermark and just the average memory consumption and the variance. So if there's really high variance,
Starting point is 01:17:55 when you run on the cloud, you can choose like a minimum amount of memory to reserve, and then you can choose an upper bound. And the way this is implemented in the cloud is, you know, you're guaranteed to get your minimum amount of memory. Let's say you ask for four to 10 gigs. You're guaranteed to get four gigs, but just, there's other people sharing that machine. And so it's really just whatever they're doing.
Starting point is 01:18:22 So if you say, I want four to 10 gigs, it'll give you four. And if you go over four, it'll check and see, okay, what are the other people on this machine doing? If they're not using all the remaining RAM, they'll give you another gig. And so your process continues and it gives you, you're up to five gig now and you're paying for five gig and then you can go six, seven, eight, all the way up to 10, right? And it's not linear, but you get my point.
Starting point is 01:18:52 But here's the problem. If you're at five gig and you go over and it says, oh, I can't give you six because there's just nothing left on this physical machine. Well, guess what it's gonna do? It's gonna kill your pod. So for a lot of web services they actually set the minimum and the maximum to the same number and this way you don't have to ever worry about
Starting point is 01:19:17 growing. You're just statically getting 10 gig and then you can use 0 to 10 of that at your leisure, but you never have to worry about getting killed because you can't grow. But in a way that's kind of just moving the problem somewhere else, because you can obviously get killed for going over the max. So it all gets pretty complicated, which is why, you know, it's an extremely important skill as an engineer to be able to be on top of that. I'll talk about what I do on the Python side and then I'll let Patrick jump into the C and C++ side. On
Starting point is 01:19:59 Python there's two tools you really should become familiar with. One is called PSUtil. And this is a cross-platform tool, despite the fact that when I hear PS and PSUtil, I immediately think of Linux. But they have total Windows support. So total cross-platform, it will tell you your CPU usage, the machine's CPU usage, memory usage, et cetera, et cetera. If you're running in the cloud, it will give you the usage of your pod.
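A minimal psutil snapshot might look like this (psutil is a third-party package that must be installed; the specific fields shown are the common ones, and what counts as "the machine" depends on your container limits):

```python
import psutil

proc = psutil.Process()        # handle to the current process
mem = proc.memory_info()

print("RSS bytes:", mem.rss)   # resident set size: actual RAM in use
print("VMS bytes:", mem.vms)   # virtual memory size

vm = psutil.virtual_memory()   # machine-wide (or pod-wide) view
print("total:", vm.total, "available:", vm.available, "percent:", vm.percent)

# Sampled CPU usage over a short interval, as Jason mentions psutil reports.
print("cpu %:", psutil.cpu_percent(interval=0.1))
```

A common pattern is to log these numbers on a timer and ship them to whatever metrics sink you use.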
Starting point is 01:20:34 So, you know, the machine could have 128 gigs of RAM. If your pod only has 10 gigs of RAM, that's what is gonna show up in PSUtil. And so you know you could imagine doing things like taking the output of PSUtil and writing it to Datadog or writing it to a Postgres database somewhere where you go and look at that. This is all very common. Another tool that's really important is TraceMalloc and so you know PSUtil gives you this snapshot and it asks the kernel for this information so it's relatively cheap like maybe even free. TraceMalloc is not cheap and so
Starting point is 01:21:20 what tracemalloc does is, you say start tracing, and it keeps track of all the memory allocations and it builds like a ledger of all of that. And then you say stop tracing and it freezes that, you know, probably converts it to some other data structures that are easier to manipulate. And then you can go through and get the high watermark and a bunch of other interesting statistics. You can get, like, you know, what are the files that generated the most memory consumption? That's extremely useful. You can also filter that down and say, you know, I want the files that generated the most memory usage, but I don't need like torch.py. Because clearly PyTorch generated like 99%
Starting point is 01:22:09 of the memory if I'm running an LLM, but that's not helpful for me, right? So you can filter it and everything. Both of those are amazing tools. Every time I start a project, I always have some way of running those tools pretty easily. Some type of util library with a context manager so that I can do trace malloc and psutil at will. And that's pretty much all you need to get you most of the
Starting point is 01:22:36 way there. I guess the other thing I would add, which I think is universally applicable, is to have a place to send that information. So at the company I work at now, we use Datadog. In the past, I've worked at bigger companies which had bespoke solutions. Oh, I did use Prometheus at another company. A lot of different options, but just having a way to collect and aggregate that information is really useful.
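The tracemalloc flow Jason describes — start a trace, build the ledger, snapshot, then filter — can be sketched with the standard library like this (the torch filter pattern is just illustrative):

```python
import tracemalloc

tracemalloc.start()

# Something that allocates noticeably, standing in for real work.
data = [bytes(10_000) for _ in range(1_000)]

current, peak = tracemalloc.get_traced_memory()  # bytes now / high watermark
snapshot = tracemalloc.take_snapshot()           # freeze the ledger
tracemalloc.stop()

# Optionally filter out noisy files (e.g. a framework like torch):
snapshot = snapshot.filter_traces(
    (tracemalloc.Filter(inclusive=False, filename_pattern="*torch*"),)
)

# Top allocation sites, grouped by file and line number.
for stat in snapshot.statistics("lineno")[:3]:
    print(stat)

print(f"current={current} bytes, peak={peak} bytes")
```

Wrapping the start/snapshot/stop sequence in a context manager, as Jason describes, makes it easy to drop around any suspect block of code.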
Starting point is 01:23:09 I haven't actually used either of those, but you know, it is important, like you're saying, actually making sure you have a good way of, at least, having a special mode where logs are coming out with priority, so even if your pod gets killed, you know what was happening right before. That can be overlooked, because right in those last few seconds, or second, or milliseconds is exactly the thing you need to see. And so sometimes you need to be a little careful to make sure that you're not like waiting for a buffer flush or something to happen, and you're not getting out
Starting point is 01:23:43 that insight that you needed at the last moments. Another tool that I assume probably can run on servers, but we mostly run on desktops when we need it, is Valgrind. So similar to what Jason is describing, running Valgrind will give you a lot of insight about how much memory you used, but also other related, memory-adjacent topics, not just size usage, but total number of allocations. So every time your program
Starting point is 01:24:11 does an allocation, there's some cost to be paid. And then every time you free, there's a cost to be paid. And so the more you allocate and free, like in my example before, you know, a gigabyte at a time through a terabyte, there is going to be a cost to that that you wouldn't necessarily have if you didn't do as many transactions. And there's also cost associated with, as you borrow and free and borrow and free, sometimes the OS needs to do optimization, cleaning up, making sure that, you know, it collects back stuff it has so it can give it out to other programs. And so Valgrind will help track all that,
Starting point is 01:24:49 but it'll also do things like tracking which pieces of memory you have and haven't initialized, which ones belong to your program. So if your program tries to read, we were talking about arbitrary code exploits earlier in the show as one of the news items, but what happens is you try to read a piece of memory you weren't supposed to. Now, Valgrind can't
Starting point is 01:25:10 always tell if you're reading from, you know, an array that happens to be next to your current array, right? But it can tell if you're reading from a location that's just clearly not part of what you're supposed to be, and so it can report those to you and can be very useful. I will say it's not the easiest tool to run. So in general, I don't just run it because I'm bored and like looking for something to do. But if you have a problem, being able to reach for those tools can be very valuable in tracking it down.
Starting point is 01:25:41 Yeah, that makes sense. I remember, and this is a long time ago, this tool might not even exist. I use this thing in C++ called address sanitizer. Yeah. Yeah, that was pretty good. I had that in, but then I removed it. But yes, there are compiler options where you can turn on address sanitization. And what it will do is configure it so that, again, if you try to read from memory that
Starting point is 01:26:01 hasn't been, you know, that you're not supposed to, the compiler will alert you to that. Hey, this thing here you're doing may read from uninitialized memory or may, you know, be dereferencing a pointer that doesn't point to something valid. And so it'll catch those things for you. So Valgrind does all that and more, but it's kind of expensive. But it does it after the fact, right? So you don't compile with Valgrind. So Valgrind is doing it to your...
Starting point is 01:26:30 So yeah, it's doing it after you've already built your program. The address sanitization is something that is like a compiler flag that you add. And so it's part of your program build. Got it. Okay. That makes sense. So Patrick, what do you do? You're using, you have 10 gigs of RAM.
Starting point is 01:26:49 You can't, you don't want to pay for 11. Your program is sometimes fine, but sometimes it hits 11 gigs and gets killed. What do we do? Okay. Well, with that tee-up, we'll talk first about reducing the amount of data that you're using. So one option can be to compress things. Now, compression doesn't always mean like zipping it up, right? That would be counterintuitive, but sometimes you can trade memory cost for runtime.
Starting point is 01:27:23 So if you think about, you know, I have a stock price, right? So you could naively say, I'm going to store a double of all of the stock prices. And you know, at every tick or every second, I'm going to store double for these stock prices. And then you can kind of work out how many seconds, how many stock tickers that you could track in your memory. So one way to say, hey, I need to save space is to figure out instead of storing, you know, a double for every position, which is overkill, hey, how do I do this in fixed point? You know, do stocks actually trade to one E minus something ridiculous, you know, amounts of precision, or can I use a sort of known thing? I would say that's a form of compression using fixed point math.
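A toy sketch of that fixed-point idea: store prices as integer cents instead of doubles. (The 4-byte int size is platform-typical, not guaranteed, and the prices here are made up.)

```python
import array

prices = [101.25, 101.26, 101.24, 101.30]

# Fixed point: represent each price as an integer number of cents.
cents = [round(p * 100) for p in prices]

as_doubles = array.array("d", prices)  # 8 bytes per element
as_cents = array.array("i", cents)     # typically 4 bytes per element

print("double bytes/elem:", as_doubles.itemsize)
print("fixed-point bytes/elem:", as_cents.itemsize)

# Converting back costs a little runtime each time: that's the trade.
restored = [c / 100 for c in as_cents]
assert restored == prices
```

If prices genuinely fit in two decimal places, this halves the per-element footprint at the cost of a multiply/divide on each access.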
Starting point is 01:28:08 Similarly, you could use something like an arithmetic encoding where you store how much the signal is moving by. So when, it always blew my mind when I was younger, but like, I had a CD player that said one bit DAC, and I was like, I knew enough to know that's really weird, like how do you have a one bit DAC and I was like, I knew enough to know that's really weird. Like how do you have a one bit DAC and why is it something they would boast about? Well, I still to this day don't know why they boast about it. Uh, but you can kind of figure out what a one bit DAC means, which is when it's playing
Starting point is 01:28:38 the digital audio signal, it basically at every timestamp is either a one or a zero. That's all you get when it's one bit. And if it's a one, they move the signal up and latch it. If it's zero, they move it down. And so if you wanted to have a steady signal, you would just do one zero, one zero, one zero at equal weight. And it would just bounce around at roughly the same value. But then if you had, you know, one, one, one, it's going to move ever so, you know, slowly up.
Starting point is 01:29:03 And as long as those ones and zeros are coming out at a rate fast enough that you can match the changing rate of the output signal that you're wanting, Bob's your uncle, I guess, you're good to go. And so there, one bit at each step is going to be a lot easier for your program to store. So those are things that are, like, nominally, I guess I would say, compression, but more just using the minimal amount of data to represent
Starting point is 01:29:31 what you actually need to, even if it sometimes means sacrificing precision or runtime, because you have to convert those back. Um, another thing is references to data. So if you think about, I store a bunch of web pages in memory for some reason, and web pages have all these repeated HTML tags in them. So you could go through and say, again, trading a lot of processing speed for memory.
Starting point is 01:29:59 But you could say, every time the string, I don't know, say <p>. Like I'm going to have <p> just represented by a single value that's a pointer to the string <p>. Probably not the best example. But if you have these strings repeated over and over and over in your memory, you can sort of make references to them and pull them out. And again, it's
Starting point is 01:30:32 But if you need to process it, then you kind of pay for it. And then by a similar thing, every time you kind of derive data on your sample, the instinct is to kind of do that ahead of time so it's ready to use. But if you have just vast amounts of data, it can be a problem. So you can do lazy initialization, which can help with other things as well.
Starting point is 01:30:56 But this is where you're sort of saying, I'm not going to derive this data until I need it. And then when I'm done using it, in that moment, I just let it go. And if I need it again, I'll just generate it again. And so you do this through all the pieces and it requires code and data change, but you're kind of going through the stuff
Starting point is 01:31:18 that you hold in memory and saying, anything that I'm deriving, I just re-derive every time I use it. And that prevents me from holding it all. If you think through techniques we've talked about on here before, like memoization or caching and that kind of stuff, you know, those are great. Those are trading increased memory for decreased runtime,
Starting point is 01:31:34 but you can flip that around and actually say the reverse. Like I'll trade more runtime to have less memory impact. Yeah, that makes sense. Along those lines, another thing that I see a lot, especially with Python developers, or actually mostly with AI developers in any language, is like missing back pressure. So I'll give you an example. Imagine like you have one thread that loads data and another thread that does the ML, right? Well, doing the ML and the autograd and all that is slow.
Starting point is 01:32:06 And so the load data thread will be able to load data faster than the other thread can train the model. Yeah. And so you get to the point where, you could take this to the limit, you have an infinite amount. You have the entire data set
Starting point is 01:32:28 And in practice what happens is your pod gets killed or your computer gets killed or your computer catches on fire. Something happens where because that thread ran away, it generated way too much data for the other thread to consume. And so the way this is handled in Python is through generators. So the way it works in Python, and actually I'd love to know how you would do this in C++, but on the Python side there's functions where instead of a return value, you can actually
Starting point is 01:33:02 have a return value, although very few people do this, you typically would just return None, but along the way you're yielding. And so you can like have a for loop, and in the for loop you're yielding values. And what that does is it allows whoever is calling that function to consume that value right away, even though the function that they've called hasn't finished yet. And so you can have a generator that yields data to be trained on. And this is getting way down to the guts of the language. I'm definitely no expert, but I think the way this works is that the training model says, you know, give me the next value. And then when the data one, you know, goes and gets a new value and then calls yield, it's going to wait. It says, hey, I have new data that I can yield, but no one's asked for it. And so I'm just going to wait here. And then the training system can
Starting point is 01:34:13 take however long it needs to take. It'll come back and say, hey, yield me something else, generate something else. And so in this way, you have back pressure. Also, when I was building Eternal Terminal, the SSH replacement, it was a constant struggle to maintain back pressure so that at any time you could Ctrl-C and kill a program, even if it was trying to send you an infinite amount of output. Yeah, on the C++ side, is there an equivalent of this, or how would you design a system like this? Well, I think you unintentionally segued to my next point anyway, but that was perfect. I mean, I think what
Starting point is 01:34:53 you're saying is a version of what's probably the more generic approach, which is: if you're in a multi-threaded environment, multi-producer multi-consumer, then what you want is a fixed-size queue. I always think about back pressure slightly differently, where you're asking the producers to slow down, but it's sort of the same thing. So you have a pool of producers that are reading training data, downloading from URLs, whatever, and they're putting into a queue. At some point that queue could fill up, and if that queue fills up,
Starting point is 01:35:26 then they're basically blocked waiting to send the next data into the queue. The consumers are pulling data out of the queue, and as soon as they empty something out, one of the producers gets to put their thing in. You have to be careful with thread control, but with proper locking and everything, you can do that without many problems. And there are even lock-free ways of building data structures
Starting point is 01:35:50 for doing that. And so in general that would be very similar to how you're describing it, but maybe without the keywords. There are ways to do the same kinds of things with keywords, but in general that's how I would describe it: a set of threads consuming and a pool of threads producing, in the same way. To level that up one notch, a lot of those things can be done with something called a ring buffer. So take a naive implementation of the queue:
Starting point is 01:36:26 every time you go download your training data, and let's say it's images, they're all going to be roughly the same resolution, right? That's something you will have done before you started, so you know generally how big one can be. But say we're not going to use that knowledge initially. You just have a JPEG, it's a hundred kilobytes. You go ask the operating system, give me a hundred kilobytes of memory. And then you say, oh, I'm waiting to put it in the queue, waiting to put it in the queue. OK, I put a pointer in the queue, a pointer to that 100
Starting point is 01:36:53 kilobytes. The consumer takes that pointer out, pulls the image, frees the memory. And then the next thread downloads the next image, 101 kilobytes, gets a new allocation, right? And so you're doing all this allocate, free, allocate, free, and we talked about how there's expense to that as well as part of memory management. So what a ring buffer does is say: if you know the max size is gonna be 200 kilobytes, or even just make it a lot bigger,
Starting point is 01:37:22 a megabyte, let's just say a megabyte, and then say I'm gonna have the fixed size be 20. What I'm gonna do is give each of the 20 slots in this ring (think about it as a ring) its own memory, and a thread pulls a slot from the ring to go read the image into that memory block. That memory block is then consumed when it gets to its turn, and you have a head and tail pointer for where that is. So you get a limited number, but you also get to reuse the memory blocks. Each new producer can't produce until it gets a memory block, but then it's refilling; it doesn't have to go to the operating system again, right? Because it already has that memory. Your program owns and holds that memory.
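[Editor's sketch: the ring Patrick describes, in minimal Python. The class name, slot count, and sizes are invented for illustration; Python's `queue.Queue` is thread-safe, so the same structure works with real producer and consumer threads.]

```python
import queue

class RingBuffer:
    """Preallocated slots reused forever: queue API with back pressure,
    but no per-item allocation after startup."""
    def __init__(self, num_slots=20, slot_size=1024 * 1024):
        # All memory is grabbed once, up front (Patrick's 20 x 1 MB example).
        self.slots = [bytearray(slot_size) for _ in range(num_slots)]
        self.free = queue.Queue()    # slot indices available to producers
        self.ready = queue.Queue()   # (slot index, length) pairs for consumers
        for i in range(num_slots):
            self.free.put(i)

    def acquire_slot(self):
        # Producers block here when all slots are in use: the back pressure.
        return self.free.get()

    def publish(self, idx, length):
        self.ready.put((idx, length))

    def consume(self):
        idx, length = self.ready.get()   # blocks until something is ready
        data = bytes(self.slots[idx][:length])
        self.free.put(idx)               # recycle the slot: no free()/malloc()
        return data

# One producer/consumer round trip:
ring = RingBuffer(num_slots=2, slot_size=16)
slot = ring.acquire_slot()
ring.slots[slot][:4] = b"jpeg"   # the "download" writes into the reused block
ring.publish(slot, 4)
print(ring.consume())            # b'jpeg'
```

Producers block in `acquire_slot()` once every slot is taken, which is exactly the limit-in-exchange-for-fewer-allocations trade described here.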
Starting point is 01:38:07 And so that is reducing the amount of overhead to the operating system by saying in advance: if I accept some more limits, then I can reduce that interaction. And so you get the fixed-size queue, you get queue API behavior, but the implementation is such that you need to ask the queue
Starting point is 01:38:25 for the memory to write to. And this is very, very common in embedded systems, where you either want to, or are required to, completely avoid dynamic memory allocation, because you don't really need it in that case. You just allocate all the data for the ring to your program at startup, and then you never need the allocator again, right?
Starting point is 01:38:44 You're reading into that buffer, and then you're doing your own memory manager, right? Your own little operating system to keep track of it. And I guess you're saving time because getting the memory from the OS is a lot more expensive than keeping the references. Well, even if getting the memory was really, really cheap, you still have to call out to the operating system, which is in a different context and has to run work.
Starting point is 01:39:12 So even in the best case, the system call has overhead, and so you want to avoid that if you can. But then, like you said, let's say you were doing a billion image reads; that's a lot of allocation, deallocation, thrashing of the memory. Every one of them is slightly differently sized, so you're getting all this fragmentation, and the OS is trying to handle all that.
Starting point is 01:39:35 Versus if you manage it yourself, yes, your program is doing more. So I wouldn't reach for it in the beginning. But if you kind of know, hey, I'm at the point where I need to optimize, it's a great tool to reach for. Yeah, and it also reduces your variance because you've allocated this upfront,
Starting point is 01:39:51 so it reduces the upper bound of what your program's gonna use. And your OS wouldn't know that. Like in the example you were giving in Python, which is fine because it's fast to write, it wouldn't know in advance how many consumers are going to be pulling from the yield at the same time from the generator, right? Or how many generators are going to be there. It can't know.
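[Editor's sketch of the yield mechanism Jason described earlier, in Python; the file names and the loader are invented stand-ins.]

```python
def load_image(path):
    # Stand-in for an expensive loader; real code would read and decode a file.
    return f"pixels:{path}"

def image_stream(paths):
    """Generator: execution pauses at `yield` until the consumer asks for the
    next value, so at most one loaded item is ever in flight -- the loader
    cannot run ahead of the training loop."""
    for path in paths:
        yield load_image(path)

# The "training" loop drives the generator at its own pace.
batches = [batch for batch in image_stream(["a.jpg", "b.jpg"])]
print(batches)   # ['pixels:a.jpg', 'pixels:b.jpg']
```

Nothing is loaded until the consumer iterates, which is the back pressure: a slow training step simply delays the next `load_image` call.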
Starting point is 01:40:10 So here, it would be very complicated to ask for it in this way. So this is the trade-off: instead, you're putting more limits on your system, but in exchange, you're getting better performance. Yeah, that totally makes sense. So the next one is a little similar. This one comes up if you ever use protocol buffers, or any kind of message where, similarly, you're reading along and as you're processing data, you realize you need more memory to put stuff in. So in a protocol buffer, you have a tree of messages, and as you're decoding them, you're finding new branches of the tree. You need to go get a little bit more memory, a little bit more memory, a little bit more memory to put that data in. And at the start, you don't know how much memory you're going to need. When you read a JPEG image,
Starting point is 01:41:01 it's similar. If you want an array of zeros and ones that is the resolution of your image, so you can put it into a neural network or whatever, you kind of know that size. But as you're reading the JPEG, how fast that conversion is happening varies as you go through the image. So if you didn't know the output size, it would just end up being dynamic the whole time. So then what you can do is something called an arena allocator.
Starting point is 01:41:30 An arena allocator basically says: every time you ask for memory, you ask it from this special allocator you have, and all it does is never free memory; it just keeps growing within the bounds. So you block out 10 megabytes, and as long as you stay under 10, it just keeps giving you new memory
Starting point is 01:41:48 from the 10 megabyte block. And then that whole block, even if you quote-unquote free stuff from it that you were using earlier, that whole block just stays with you. And then at the very end, you just delete the whole thing all at once. So instead of a thousand tiny allocations,
Starting point is 01:42:03 you have one large allocation. And so that is also an advantage. Distinct from all of that, let's say none of those strategies work; you tried them all or whatever. Your other option, again trading runtime, is having an on-disk cache. So you can write things to disk that you know you aren't going to need for a while, similar to what the operating system might try to do for you. You write things out to disk as you're reading them in, and then, as I'm processing something, I'm deciding whether to use it or keep it. If I'm going to throw away most of the things I'm reading, then I just write to disk the ones that I want to keep.
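[Editor's sketch, circling back to the arena allocator: the "bump pointer" idea in a few lines of Python. Real arenas live inside C/C++ allocators, but the mechanics are the same: one upfront block, an offset that only moves forward, one bulk release. The class here is invented for illustration.]

```python
class Arena:
    """Bump allocator: one upfront block, alloc() just advances an offset,
    individual frees are no-ops, and everything is released at once."""
    def __init__(self, size):
        self.block = bytearray(size)   # e.g. the 10 MB block, allocated once
        self.offset = 0

    def alloc(self, n):
        if self.offset + n > len(self.block):
            raise MemoryError("arena exhausted")
        view = memoryview(self.block)[self.offset:self.offset + n]
        self.offset += n               # a thousand allocs are just additions
        return view

    def release_all(self):
        self.offset = 0                # "free" everything in one shot

arena = Arena(64)
a = arena.alloc(4)
a[:] = b"node"        # e.g. one decoded message in the tree lands here
b = arena.alloc(8)    # the next branch gets the adjacent bytes
print(arena.offset)   # 12
arena.release_all()
```

Freeing an individual allocation does nothing; the whole block goes away together at the end, exactly as described above.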
Starting point is 01:42:40 And then when I get to the later stage of my program, I'm reading them in one by one. And that way you don't just have this growing list of all of the images before I'm gonna process any of them. And so, implementing your own disk-based cache, or using a library which is gonna provide that for you, there are various levels of simplicity or sophistication
Starting point is 01:43:01 you could do with that. Yeah, totally. On the Python side, there's this tool called shelve. There's actually an even better one that's not built into Python, called SqliteDict, but basically these act as if they're regular dictionaries. You wouldn't even know the difference; they implement all the dictionary APIs,
Starting point is 01:43:23 or most of them. But under the hood, all of that data is being put on disk. And so if you have a situation where maybe latency is not that important, but you're going to have to store a ton of information, or you need it to persist from one run to another, or you just don't want to go over your memory limit, you can use these various tools. I think in C++ there's mmap, right? Which maps files on disk into memory. Yeah, but that's for a slightly different purpose, but yes.
Starting point is 01:43:58 Okay, all right. So what's the C++ equivalent if you need to dump objects to disk and then bring them back later? Yeah, I mean, you could use SQLite, but ultimately, unlike Python, there's not really serialization built in, right? So it's generally not advised to just write your structure out as a block of memory. You would need some serialization library to help you with that, like protobuf or one of these things. Yeah, protobuf, yeah, exactly.
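[Editor's sketch: the standard-library `shelve` module Jason mentions, in miniature. The path and keys are invented for the demo; a shelf supports most of the dict API while keeping the values on disk.]

```python
import os
import shelve
import tempfile

# A disk-backed dict: writes go to a file, so resident memory stays flat
# no matter how many entries you keep.
path = os.path.join(tempfile.mkdtemp(), "kept_images")

with shelve.open(path) as cache:
    for i in range(5):
        if i % 2 == 0:                 # "decide whether to keep it"
            cache[f"img-{i}"] = b"...jpeg bytes..."

with shelve.open(path) as cache:       # later stage: read back one by one
    print(sorted(cache))               # ['img-0', 'img-2', 'img-4']
```

Each lookup hits disk, so this trades latency for a bounded memory footprint, which is the whole point of the on-disk cache described above.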
Starting point is 01:44:29 That makes sense. Cool, so yeah, I think this, we did a great job kind of covering a lot of this. If you have stories, other tools, don't hesitate to email us or post in the Discord. The Discord's actually even better because there is an audience there. If you email us, then we have to remember to go back to previous episodes. And if you're a longtime listener,
Starting point is 01:44:55 you know that we're terrible at that. So go to the GitHub, sorry, go to the Discord and post ideas there, thoughts there. There's a community of folks, really interesting conversations happening over there. And it's really exciting to see. And Patrick and I do check it every now and then. And in general, we really appreciate everyone's engagement and also your support. We definitely appreciate your financial support through Patreon. You are our sponsor. There is no person paying us to say anything other than you folks. So we really appreciate it.
Starting point is 01:45:38 Music by Eric Barndollar. Programming Throwdown is distributed under a Creative Commons Attribution-ShareAlike 2.0 license. You're free to share, copy, distribute, and transmit the work, and to remix and adapt the work, but you must provide attribution to Patrick and I, and share alike in kind.
