The Standup with ThePrimeagen - Memory Safe C

Starting point is 00:00:00 You guys ready to talk today? Today is a pretty good topic. Are you guys ready for today? I'm ready. Okay, yeah. I mean, I'm not ready. You guys don't know. I'm just ready.

Starting point is 00:00:08 We're going to start anyway. Yeah, yeah, yeah, yeah, yeah, yeah, yeah, yeah, yeah, yeah, yeah. Anyway, sorry. Today, we are talking about Fil-C, the memory safe C. Is it useful? Is it good? Is it the future of C? As always, we have with us TJ, creator of a C-based memory course on boot.

Starting point is 00:00:27 Where you get to go over how a language. What was that, TJ? Good point. Yes, great points. Okay, well, don't interrupt me saying that when I'm in the middle of a really strong intro. All right, anyways. As always, we have with us Teej DV, creator of a C-based memory course on boot.dev where you go over how to create a garbage collector, and Casey Muratori, legendary C Chad game programmer.

Starting point is 00:00:46 Check out ComputerEnhance.com for his amazing courses and lectures. And we also have a special guest with us today. Low-level security extraordinaire and also C and asm Chad. Check out lowlevel.academy for amazing courses and all your low-level needs. Now, hey, no problem, buddy. So now, let's talk a little bit about Fil-C. He gets to say a comment and it didn't break the intro.

Starting point is 00:01:09 On the last one, suck it, nerd. He said it cleanly. He got it in there, T. It was like part of the flow. I was part of the flow. Talking, T.J. And you just were like, hey, that's a great point. I didn't even think about how that would be relevant to this today.

Starting point is 00:01:25 So I was surprised. Go ahead, Brian. Anyways, just for everybody to understand a little bit about Fil-C, here's some basic C code along with a special stdfil.h include at the top. That allows a special kind of printing character called the capital P. And in there, it will actually print out, hey, look at this. This pointer actually has three values in it. You, the C developer, will only see just, you know, the integer.

Starting point is 00:01:50 But underneath the hood, there's actually some extra information stored. And this actually shows its lower and its upper bound. You can see that we allocated 16 bytes, and of course there's 16 bytes difference between the lower and the upper. That is fantastic. And with that, it allows for some special behavior. So here we go. We malloc 16 bytes again. I attempt to access byte 42, or technically the 42nd index into here, which would be out of bounds.
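
The lower and upper bound described here can be pictured with a small plain-C sketch. This is not how Fil-C is implemented (Fil-C keeps the bounds hidden from the program); the `checked_ptr` struct and the `checked_malloc` and `checked_store` helpers are hypothetical names made up for illustration, and the sketch assumes a failed check is reported with a return value instead of stopping the program.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical "checked pointer": the address the program sees, plus the
   bounds of the allocation it came from. In Fil-C the bounds are invisible
   to the program; here they are ordinary fields so the check is easy to read. */
typedef struct {
    unsigned char *ptr;   /* what ordinary C code would see */
    unsigned char *lower; /* first valid byte of the allocation */
    unsigned char *upper; /* one past the last valid byte */
} checked_ptr;

/* Allocate `size` bytes and record the bounds alongside the pointer. */
static checked_ptr checked_malloc(size_t size) {
    unsigned char *p = malloc(size);
    checked_ptr cp = { p, p, p ? p + size : p };
    return cp;
}

/* Store one byte at ptr[index] only if it lies inside [lower, upper).
   Returns 1 on success and 0 on a bounds violation (assumption: a real
   memory-safe runtime would terminate the program at this point). */
static int checked_store(checked_ptr cp, size_t index, unsigned char value) {
    assert(cp.lower <= cp.ptr && cp.ptr <= cp.upper); /* internal invariant */
    if (cp.ptr == NULL) return 0;
    size_t available = (size_t)(cp.upper - cp.ptr);
    if (index >= available) return 0;
    cp.ptr[index] = value;
    return 1;
}
```

With a 16-byte allocation, `checked_store(p, 15, ...)` succeeds and `checked_store(p, 42, ...)` is rejected, which mirrors the out-of-bounds access in the demo.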

Starting point is 00:02:15 And Fil-C says, ha, ha, ha, ha, ha. I've thwarted such an easy and very obvious thing. Now, one of the things about Fil-C that makes it a little bit unique, and is probably going to make a lot of people mad, is that it's actually garbage collected. That's right, it's a C that is garbage collected. So if you look underneath the hood, even if you call free, it does not technically free it. It marks it as being freed, and then later on the asynchronous, multi-threaded, behind-the-scenes garbage collector will go and free that memory. So it's a very... this is not your grandfather's C, okay? This is modern C. And what he means, whatever, that's fine. Good one, T, you

Starting point is 00:02:55 I'm just saying he made one. It's fine. It's okay. Keep going. They've had garbage collectors in C for a while, but yeah, that's okay. Go ahead. Yeah, I know, but that's not like your standard operation in C. People don't think C, oh, yeah, that garbage collected language. Like, that's not the first thought that comes into people's heads, whereas this is garbage collection at all times. Yeah, standard. Standard C does not have a garbage collector. That is true. Right. You're talking about some additional thing when you talk about C garbage collection, right? Yes, yes, yes. Whereas this is going to be like your standard experience is actually garbage collected. Now, obviously, Fil-C does come with a lot of performance penalties and runtime checking. Like when I did that little check into p[42], it obviously did a runtime check on that when you attempt to run it. It crashes your program successfully.
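
The "free only marks it as freed" behavior can be sketched in plain C. This is a simplified illustration under stated assumptions, not Fil-C's collector: the `tracked_obj` table, the integer handles, and the `tracked_alloc`, `deferred_free`, `tracked_load`, and `collect` functions are all hypothetical, and the collection step here simply reclaims everything already marked instead of running concurrently and tracing reachability.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

enum { MAX_OBJECTS = 8 };

/* Hypothetical tracking record for one allocation. */
typedef struct {
    unsigned char *data;
    size_t size;
    int freed; /* set by deferred_free; the memory is still held */
} tracked_obj;

static tracked_obj objects[MAX_OBJECTS];

/* Allocate zeroed memory and return a handle, or -1 on failure. */
static int tracked_alloc(size_t size) {
    for (int i = 0; i < MAX_OBJECTS; i++) {
        if (objects[i].data == NULL) {
            objects[i].data = calloc(1, size);
            if (objects[i].data == NULL) return -1;
            objects[i].size = size;
            objects[i].freed = 0;
            return i;
        }
    }
    return -1;
}

/* "free" only marks the object; nothing is returned to the system yet. */
static void deferred_free(int handle) {
    assert(handle >= 0 && handle < MAX_OBJECTS);
    objects[handle].freed = 1;
}

/* Every access checks the mark and the bounds first. Returns 1 on success. */
static int tracked_load(int handle, size_t index, unsigned char *out) {
    if (handle < 0 || handle >= MAX_OBJECTS) return 0;
    tracked_obj *o = &objects[handle];
    if (o->data == NULL || o->freed || index >= o->size) return 0;
    *out = o->data[index];
    return 1;
}

/* Later, a collection pass actually releases the marked objects.
   Returns how many objects were reclaimed. */
static int collect(void) {
    int reclaimed = 0;
    for (int i = 0; i < MAX_OBJECTS; i++) {
        if (objects[i].data != NULL && objects[i].freed) {
            free(objects[i].data);
            objects[i].data = NULL;
            objects[i].size = 0;
            objects[i].freed = 0;
            reclaimed++;
        }
    }
    return reclaimed;
}
```

The point of the two-step design is that a use after free hits a still-tracked, marked object and is rejected, instead of touching memory that has already been handed to someone else.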

Starting point is 00:03:38 And so that is a yes, TJ? I was wondering, do we have any bouncing balls to show the performance penalty for charts? Because I feel like I can't, as you said performance penalties, I literally wanted to slap myself in the face that I did not bring a bouncing ball chart to benchmark this for Casey. I'm so angry. I'm so angry. As you can see from this diagram. Very scientific.

Starting point is 00:04:05 There are some performance penalties. It's like, no, actually, I can't see from that diagram because the diagram sucks. That's why I can't see it from that diagram. That's pretty funny. Look at all these engineers sitting at their neat little desks. It takes dirty work to keep a code base clean. Every day sickos are out there committing unreviewed code and when that happens, linters won't save you. You need someone like me.

Starting point is 00:04:33 Freeze, you scumbag. Who are you calling scumbag? What's this slop you're trying to push? Unnecessary comments? Global state? Nested ternaries? Oh, my bad. I didn't even read the code yet. You disgust me. Step away from the keyboard. Just let me explain. Is that a mouse? He's merging to prod. You have the right to remain silent. Anything you push to GitHub can and will be used against you. You have the right to a debugger. But if you cannot afford one, a public stack trace will be made available here. And one more code criminal off the streets where they belong. HR.

Starting point is 00:05:09 Look, I didn't yet... I know I didn't review any of the code, but I was going to have CodeRabbit review it from the start. With one-click fixes, install enforcement, I don't need Merge Cop. I would never merge unreviewed code, but a first pass with CodeRabbit always makes things go faster. Actually, you can try it too at coderabbit.ai. Next week on Merge Cop. The Diffler's out there, and I'm going to be the one to deprecate him. So to kind of wrap it all up, so everybody understands how Fil-C works.

Starting point is 00:05:43 The runtime impact of Fil-C is going to be somewhere between 1.2 to 4x, is what I have seen. And then you will see comments on the Internet that say something along the lines of, what's 20% these days? Nobody cares if you lose 20% performance, which I know that Casey will probably lose some sort of sleep or hair over, with that kind of comment on the Internet. Wait, how do we get from 1 to 4x and go from that to 20%? Well, because 1.2 would be, you know, 20%, but then 4x would be... Oh, you mean, if you just took the minimum, assuming that you were going to get the minimum for some reason without evidence? Okay, sure.

Starting point is 00:06:19 Can I say, there is another performance penalty? Can I, can we talk about that really quickly? Yeah, yeah. Before I do that, I just have to... I'm going to be a little bit on the rust side today, so I came prepared. Oh, no. Perfect. Okay.

Starting point is 00:06:36 Hey guys, it's Rust Teej checking in today. The other problem is that there's a serious additional memory constraint here, in the sense that it's going to allocate a lot more memory for lots of different kinds of programs. So I just wanted to make sure that that's clear. It's not a zero-cost abstraction. Okay, it's not zero-cost. So, Prime, you may continue. Actually, Teej is very, very correct.

Starting point is 00:07:01 It is not, in fact, a zero-cost abstraction. Anyway, so I just wanted to talk about this because the thing that really sparked all of this was that sudo-rs is kind of this new brand of sudo. It's the new sudo on the block. And Ubuntu is saying, hey, we're using it. But all the comments are like, why? What's... why are you doing?

Starting point is 00:07:18 Like it seems to be a, at least it appears, I don't know if it's actually true in real life, but in Twitter life, it appears to be a largely negative experience. And so with that in mind, some people are saying, well, why don't you just use Fil-C? The performance of sudo is negligible, and you can get 90% of the safety without having to deal with Rust and a rewrite and rediscovering all the bugs. And so that's why we have with us Casey and Low Level Learning. TJ is just here for fun. And so therefore, I think it would be fun to start probably with the security side, just because I think that that is probably the most poignant or at least the most reasonable place to start when it comes to a memory safe C. Ed, if you'd like to kick off any talks about that, I have a lot of questions about it, but any initial statements?

Starting point is 00:08:05 Yeah, I mean, so I guess to kind of kick this conversation off, right? What is spawning this? If you guys aren't aware, sudo, you know, well-known, well-loved program that tends to be the target of a lot of vulnerability research because it is one of the most widely available, widely used setuid binaries, right? It's a binary that runs as root and you use it to run things either as root or somebody else. And so a memory corruption vulnerability in sudo obviously violates a pretty hefty, pretty scary security boundary. And so as a result, there's a push to rewrite a lot of binutils,

Starting point is 00:08:36 one of them being sudo, in Rust, right? Kind of a cool idea. This also, for some reason, kicked off an effort to, like, rewrite all of the coreutils in Rust to include, like, echo, cat, sed, which is a whole other conversation. But unfortunately, despite it being written in Rust, shocking, I know, I think there were two CVEs that came out recently in the RS implementation of sudo. So it kind of begs this bigger question.

Starting point is 00:09:00 I think, you know, we've talked about this in the previous episode that I was on with Casey. You know, just because you write it in Rust doesn't mean that you're going to get the exact same behavior. And if you can't confirm via some set of testing that you have, you know, not regressed at all in your functionality, then you're potentially introducing new vulnerabilities to the code through logic errors. And that's, I think, exactly what we saw with sudo-rs. So instead of rewriting everything in dollar-sign new lang, and I'm sure we're going to iterate on new lang every 20 years, why not drop in a new compiler that has sanitizers, that has intermediate representation checks to put in on an arbitrary

Starting point is 00:09:37 pointer the vector size, right? And that's what Fil-C does. I have not used Fil-C personally. I'm kind of curious, Prime, how it went for you. Overall, I think Fil-C is a great idea. There are some compat issues. You have to be compatible with the LLVM toolchain to make it work, which is kind of an issue for some pieces of software, but generally, great idea. I love the concept. So that's kind of my first take. Casey, what about you? I'm on the same page.

Starting point is 00:10:02 Essentially, like, rewriting something in Rust to me is usually a pretty bad idea, because the problem is that, really, Rust can only guarantee you a couple security things that it's specifically designed to do, but all of the other ways you can have bugs are still there. So it doesn't really handle all of, like, your logic errors or things like that that could result in your program doing something potentially catastrophic, especially in the case of like a core utility, like sudo. And so really, like, I would never have, like, I would be like, don't do that.

Starting point is 00:10:35 Like, we know that there are memory, you know, safety problems in the languages these were written in, but they've been kind of beaten out over the years. And we can keep looking for them and so on. And so rewriting something in Rust like that is just very dangerous and I think ill-advised. So something like Fil-C is kind of a very nice middle ground, I would say, which is that like, hey, we can preserve the logic integrity of this thing that has now been battle tested so that we're pretty sure that we've found a lot of the logic errors that I'm sure were in there originally that we've now kind of gotten out over the years as people found them. But now you can also get this added layer of memory checking just to make sure that memory things we haven't found yet, maybe this will protect against those. And then you're really just down to like, okay, are there bugs in Fil-C itself, right? And, you know, maybe those take a little while to suss out, too.

Starting point is 00:11:24 Like, maybe there are some edge cases that, you know, will be found. But until then, it's like that seems like a much better... that would be true of Rust as well, right? Like, there's no difference between Rust and Fil-C in that sense. You can always have bugs in the implementation of the memory safety. So it really does seem like a much more sensible approach if you're just talking about our goal here is to make sure that we're providing the safest possible version of something like sudo. Yeah. Okay, so I mean, those are all kind of like, they feel like intellectual arguments, more than actual, like, practical arguments. Do you think there's an actual practical argument to say that Fil-C is going to be something that is, like, fully usable?

Starting point is 00:12:03 Like, Casey, could you use Fil-C in a game? Or would you even ever consider using something like Fil-C in a game engine slash game? Well, I mean, you'd have to have some reason why you felt like what it's offering you is important. So, you know, there are some ways in which you could imagine this being important. For example, if you have a lot of user-generated content in a game where potential bad actors might be trying to use that ability to upload content to create sort of security exploits, right? You can imagine something like Roblox or Fortnite or whatever, where they're kind of constantly trying to increase the amount to which their users can contribute new games, new modes of play, whatever.

Starting point is 00:12:52 You could imagine something like that where there are cases where you're very scared about things like memory protection and stuff like that. And so I could see maybe some arguments in certain specific scenarios where you're like, we really need this. So we're going to take this isolated part of the engine and maybe we will use something like Fil-C. We'll accept the performance hit, but we just really need this. We need all the security we can get.

Starting point is 00:13:17 But for the majority, where you're talking about just running a game in an isolated context on someone's machine, it's unclear what you really need this for, right? For the average game development, it's not really, and I don't, I mean, the person who created Fil-C works at Epic, right? They are working on, I think they work on Verse, the Epic, like Epic's Fortnite programming language thing. And so, like, you know, I would imagine most of like an Unreal Engine or something wouldn't really adopt something like Fil-C, but you could see some specific cases where that might be important. And for example, Verse, which is their language, that would be a case where he probably is

Starting point is 00:13:57 employing some of these techniques there. I don't have any inside knowledge of that, and I don't work on that. So that would be my take on it, is you'd have to be in an area where you really cared about it, because otherwise I'm not really sure what you would need it for. Does that make sense? Yes, I think that does make sense. I mean, where my big misunderstanding probably just comes from is that I don't understand. I was trying to read some of the things with the differences between like ASan and UBSan and how Fil-C actually can circumvent some of those problems.

Starting point is 00:14:30 But at the end of the day, TJ brought up really good points because he's being the resident Rust dev. There are things that Fil-C just can't do. And they even shouted it out, which is like file handles. They are something that can't be represented in the same kind of context.

Starting point is 00:14:46 Well, here, I'll give the example. First off, that's probably the longest a Rust user's ever gone without speaking, so I just would like that to go into the record as a moment for allowing other people to speak. I do have Rust running in production right now, so that is something that also most Rust users have never done. So I would put myself ahead of almost every Rust user in the entire world.

Starting point is 00:15:10 Hey, same guy, listen. Yeah, so just throwing it out there as two things that make me very qualified to talk about this. But I do want to, can I, I need to say one thing, though. You also did the zero to "I use Rust, by the way" at the same standard rate as any Rust dev who starts speaking. Within the first like five seconds, you mentioned the fact that you used Rust. By the way. By the way.

Starting point is 00:15:35 I don't, I thought I wasn't allowed to run rustc unless I did that. I still need to run rustc sometimes. So I don't understand. So, okay, anyways. So this is on a serious note instead of meming about Rust users. This is not a reason that I would rewrite a program. So this is like a different class. So if we want to start, we can maybe talk more about like the rewrite situation first

Starting point is 00:16:05 and then talk about, you know, like if you were starting off a new project, which one you might want to choose and pros and cons. Because the thing that I think a lot of people are missing, and Ed and Casey, you guys both alluded to this at the beginning, maybe I have to take my Rust hat off for this to say something nice about Fil-C. It's like, you can't just rewrite everything. It actually doesn't work. Like, we don't have enough time or resources or a bunch of other things. Even just like if you think you can bug for bug, remake it exactly right and handle every case,

Starting point is 00:16:37 it's just not realistic to make that happen. Like even, like, I think probably I could run Neovim, you know, at like, even at Fil-C's current worst case, 4x slower, and Neovim would still feel fast and good for me, right? You know what I'm saying? Like, even if it was four times slower. And that's assuming, and I see as well, we've got Pizlonator in the chat, who is the creator of Fil-C, by the way. Like, Fil-C is going to, I think, get faster than what it is right now, as tends to happen for people who actually care about software and making excellent things. So it's like I could run Neovim and then I would not be able to be exploited by certain cases of memory safety bugs, right? Because I'm sometimes running slightly untrusted things in Neovim. I'm running someone else's code, running like via a plugin or other things like that. So it's not realistic to just say like, Neovim, just rewrite it in Rust. Just do it. It'll be easy. That's insane behavior. That's not a realistic scenario. So that's the thing, for some reason, I feel like a bunch of Rust people are just like talking past it of like, you could just have a safer world today. In fact,

Starting point is 00:17:51 Graydon Hoare, creator of Rust, said more memory safe programs is better for all of humanity, right, in talking about Fil-C. So anyways, yeah. This may kind of be jumping backwards in the conversation a little bit there, but Prime, I kind of heard you, I think, sort of raise the question. You said, like, I have questions about, like, what's the difference between, like, address sanitizer and using something like Fil-C, right? Yeah. And I thought maybe that might be something that's worth, like, talking about a little bit. Yes.

Starting point is 00:18:21 Yes. Because I've ASan'd a few times in my life. I'm no ASan expert. And so I just know that it's caught... I've done silly things with std::vector in the old C++ world where they're just like, you're doing stupid things and I was like, oh, that is a stupid thing. Like, I just didn't know what I was doing. Right. So, just to be clear, the idea behind things like Fil-C and also just there's like

Starting point is 00:18:47 kind of a broader security idea here, actually, than just Fil-C, which is that Arm chips, or the Arm specification, I should say, and certain vendors supply actual hardware support for the kinds of things that Fil-C does. It's not exactly the same. Fil-C does things beyond what this hardware would be doing for you. And we can talk about that a little bit. The x64 spec also has now introduced this, although I guess I haven't really followed closely enough

Starting point is 00:19:18 whether anyone's shipped any of it yet. But like there is a specification for it, right? And so I'm assuming that like Xeons and, you know, EPYC processors down the line will have this for servers or something, right? I haven't followed the update. But Apple M-series chips now, for example, do implement this particular Arm extension. Anyway, these things are not designed to analyze your program when you're developing it and tell you whether they found security vulnerabilities. They're designed to actually just stop the program any time it's running when it would have done something bad.

Starting point is 00:19:54 So the idea here is to make sure that something that address sanitizer wouldn't catch, because when you were running the program you never exercised the code path that happens to do this thing or whatever else, you know, like the static analysis part can't catch things or, you know, whatever you were using. This is designed to basically say, look, the actual runtime model of this thing literally can't do these bugs. When they would have happened, the program simply halts. It's basically just a thing that says when you would have written to a piece of memory that was actually a different piece than this pointer was supposed to ever be able to write to, we just stop. The program faults, and it closes. And so, again, it doesn't really prevent the bug in the sense of the program working. The program still doesn't work. What it does is it stops it from turning into anything other than basically a denial of service attack, right? So does that make sense, just to start with?

Starting point is 00:20:50 Yeah, yeah, yeah, yeah. Okay. Yeah, there's all the talk in chat right now about ASan. We should probably highlight, like, ASan, for example, is not meant to be an exploit mitigation in the sense that you use ASan to block hackers, right? Because all ASan does is put a predictable pattern as a shadow region in the memory space so that when that is corrupted, it's like, hey, it's not the same. You probably should fix this. But the pattern that it puts is deterministic, right? So like an attacker could just put the FAs in the right spot. And,

Starting point is 00:21:22 you know, that would not trigger ASan while still violating memory safety. So ASan is more like a tool that you compile your code with to make it fall over as fast as possible, to put it in your fuzzing rig in CI so that in two weeks when your fuzzing is done it hopefully has crashed at some point if there's a bug. Contrary to, like Casey said, the Fil-C thing, which is, instead of having ASan just kill it,

Starting point is 00:21:47 let's put in like intermediate representation logic within LLVM that says like, hey, we're going to assign lengths of things to C arrays such that you are not allowed to program in memory unsafety. So it's kind of two different schools of thought on like sanitization versus runtime, if that makes sense. And for people who don't know, because like a lot of people probably haven't written,

Starting point is 00:22:11 like systems level code, like maybe Ed, you can just give a really brief overview of like how that memory unsafety leads to bad things happening on someone's computer. Because I feel like a lot of people, they just literally don't even know what's going on there?

Starting point is 00:22:28 Yeah, 100%. I mean, so like the crux of every memory exploit boils down to this pattern. Let's take an array and we want to index into that array using the index i, right? Now, instead of that i being a number that we program into our code statically, we read i off of the network. And we're using some type-length-value network protocol where we have like the type of a field, the length of the field, and then the data, right? Most of the time, it just happens that oops, the developer forgot to validate

Starting point is 00:23:03 that i is within the scope of that array, and so you get an arbitrary read, arbitrary write. And that ability to arbitrarily read and write data gives you ASLR bypass. It gives you stack canary bypass. It gives you the ability to overwrite relocatables, so you can redirect program control to somewhere else. And that, like, that is the crux of almost 80% of memory corruption bugs, right? There's some other ones that are a little more weird. So if you literally just create proper bounds in a language that is memory unsafe that disallow a user to write outside of their confines, that solves the problem. And so Fil-C, again, I haven't used it. I haven't explored too much into it. What I understand is that Fil-C uses LLVM IR to say like, oh, you've made an array
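
The type-length-value pattern described here can be made concrete with a short C sketch. The packet layout (one type byte, one length byte, then the value) and the `tlv_read_value` function are assumptions chosen for illustration, not a real protocol. The two length comparisons are exactly the validation that the buggy version forgets; without them, the `memcpy` would read or write wherever the attacker-supplied length says.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical TLV layout: 1 byte type, 1 byte length, then `length` bytes. */
enum { TLV_HEADER = 2 };

/* Copy the value field of one TLV record into `out`.
   Returns the number of bytes copied, or -1 if the record is malformed. */
static int tlv_read_value(const uint8_t *packet, size_t packet_len,
                          uint8_t *out, size_t out_cap) {
    assert(packet != NULL && out != NULL);
    if (packet_len < TLV_HEADER) return -1;

    size_t len = packet[1]; /* attacker-controlled: it came off the network */

    /* The checks the buggy version leaves out: */
    if (len > packet_len - TLV_HEADER) return -1; /* would read past the packet */
    if (len > out_cap) return -1;                 /* would write past `out` */

    memcpy(out, packet + TLV_HEADER, len);
    return (int)len;
}
```

A record that claims 200 bytes of value but only carries one is rejected, as is a well-formed record that is larger than the destination buffer.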

Starting point is 00:23:43 that is 64 bytes. I'm going to put, you basically turn all arrays into vectors and they do runtime checks on the vector before you do writes. And like, oh, if you violate that, you can't continue. And so that stops most bug classes. I think Casey was talking. I'd say it's a little, it's more hardcore than that. Like Fil-C is like kind of like if you actually, if

Starting point is 00:24:04 you're someone who is 100 percent, you know, at the Rust party ladling out the Rust Kool-Aid, yeah, into the little Rust cups that you hand out and go like, let's do this, let's drink this guys. Then you should love Fil-C because it takes it way more seriously even.

Starting point is 00:24:20 So essentially the way that Fil-C is designed to work is it's a lot like the hardware extensions, only instead of trying to be relatively transparent, it's saying, look, we're going to incur a cost. And so what it actually does is it tracks essentially all allocated objects of any kind. So anytime you actually do any kind of allocation, it's going to say, all right, I'm going to track that using, I mean, what you would call like memory outside of the addressable space of your language, meaning doing what you're able to do inside a Fil-C program, you cannot access the tracking data, if that makes sense, right?

Starting point is 00:25:02 And so that tracking data will then be used any time you're trying to work with one of these pointers, you know, like a regular C pointer. To you, it looks like you're working with the pointer normally like you would in C. But actually, Fil-C is going to look to make sure that that pointer originated from the actual part of memory that you are accessing, right?

Starting point is 00:25:26 So it tracks not just are you in bounds, meaning not just are you within the particular object that you sort of claimed you were accessing, but are you in the one that this pointer originally came from? And this works even if you, like, read and write pointers to, like, blind memory and stuff like this, right? Because essentially what Fil-C is doing is it's using sort of almost like a super,

Starting point is 00:25:51 It's using a 64-bit value that you don't directly use, you indirectly use it by looking at sort of that backing, that tracking information to make sure that you're tagged, that you like match what you're trying to access. And so when they do this in hardware, it's a sort of a weaker version of this. What they do is for pointers in the hardware versions of this, they just use the top bits of the pointer that wouldn't be used because you don't have that much memory. you don't have 64 bits worth of memory in a machine. They use the top bits to assign, say, like, a four-bit tag, a value between 1 and 15, let's say, and 0 is usually reserved, that sort of stuff. They'll use that tag, and then every block of memory that you sort of allocate, or rather that you mark, you can say like this region of memory,

Starting point is 00:26:40 you can mark regions of memory with tags, whatever you want, some random tag that you associate. Every time you use a pointer to access some memory, it checks to see whether the tags match. So that even if two blocks of memory are right next to each other, if you go from one to the other by increment of the pointer, like low-level was saying, if you go outside of like an array bound,

Starting point is 00:27:00 the tags hopefully won't match because hopefully any two neighboring regions got different tags, right? But that's really like, there's some other things that these extensions do, but for the most part it's doing that sort of thing. And Fil-C is just sort of like the on-steroids version of that, right? It tracks everything about these so that you can't even have accidental conflicts of the tags. It's not just using some simple, like, 4-bit scheme.
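
The 4-bit tagging scheme described for the hardware extensions can be simulated in a few lines of C. This is a software sketch under stated assumptions, not any vendor's actual interface: the "pointers" are plain 64-bit offsets into an imaginary memory, the tag is placed in bits 56 to 59, memory is tagged in 16-byte granules, and `tag_pointer`, `tag_region`, and `access_allowed` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

enum { TAG_SHIFT = 56, GRANULE = 16, NUM_GRANULES = 8 };

/* One 4-bit tag per 16-byte granule of the simulated memory. */
static uint8_t granule_tags[NUM_GRANULES];

/* Put a 4-bit tag into otherwise unused top bits of a 64-bit address. */
static uint64_t tag_pointer(uint64_t addr, uint8_t tag) {
    return addr | ((uint64_t)(tag & 0xF) << TAG_SHIFT);
}

/* Mark every granule overlapping [addr, addr + size) with `tag`. */
static void tag_region(uint64_t addr, uint64_t size, uint8_t tag) {
    uint64_t first = addr / GRANULE;
    uint64_t end = (addr + size + GRANULE - 1) / GRANULE;
    for (uint64_t g = first; g < end; g++) {
        assert(g < NUM_GRANULES);
        granule_tags[g] = tag & 0xF;
    }
}

/* An access is allowed only when the tag carried in the pointer matches
   the tag of the granule being touched. */
static int access_allowed(uint64_t tagged) {
    uint8_t ptr_tag = (uint8_t)((tagged >> TAG_SHIFT) & 0xF);
    uint64_t addr = tagged & (((uint64_t)1 << TAG_SHIFT) - 1);
    uint64_t g = addr / GRANULE;
    if (g >= NUM_GRANULES) return 0;
    return granule_tags[g] == ptr_tag;
}
```

Incrementing a pointer past its own 16-byte region carries the old tag into a neighbor with a different tag, so the access is refused. With only 16 possible tag values, two unrelated regions can end up with the same tag, which is the accidental conflict mentioned above.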

Starting point is 00:27:25 And it also does, like, more aggressive use after free tracking and all this other sorts of stuff by using garbage collection to, like, never remove those things so that it remembers, like, each region until it's actually not accessible by anyone anymore and so on. So hopefully that's just a little bit, like, that's like, so it's a much more complete system than something like address sanitizer, if that makes sense. So that's a lot of stuff in there, but. Is it effectively turning all memory accesses into like a virtual pointer where like, it's like a look up into another object table and then it does access? I mean, that's probably not the, that's probably it sounds a little bit more, uh, that might be a slightly more aggressive way to say it.

Starting point is 00:28:05 You can think of it more like, um, there's almost like a shadow version of thing. Yeah. You should go take a look. They have a very nice page that has it just something. Yeah. I'll show you, but I have to power up first. Jesus Christ. Where do you have only from? Why? Why?

Starting point is 00:28:22 Okay. That's pretty good. Now I'm in Super Saiyan mode because we have a whiteboard. Super Saiyan Rust user. I like it. Yeah. Okay. So in this case, right, we have like pointer X. It's pointing to 40. Right? Y, 44. Right. You can imagine in memory, we've got these two pointers, right? What happens in normal C when you access like x[5]?

Starting point is 00:28:44 Yeah, you just get zero or whatever, whatever like the next element is. Yeah. We get over here. Yeah. But in Fil-C, it says, yo, if we did like malloc(4) and malloc(4) for both of these, right, in this case, it's going to say it's got a lower bound of 40, an upper bound of 43. Yeah. And here's where it currently is. Well, I understand that.

Starting point is 00:29:03 I'm wondering about the implementation, right? Like, in the context of this example, what is X? Is X a pointer to a thing or is X? To you the user? Yeah. X is just a number. The X to you the user is still just this pointer. Interesting.

Starting point is 00:29:15 But this is, I'm assuming, this is what I'm going to, maybe we can get confirmation. As far as I could tell from reading and everything else, it gives you back exactly this. This is what you think you've got. I think it's wider. So I don't think that's how it works. It's 16 bytes wide though, right? No, no, no, no, no. It's the same size. It's 64 bits. Oh, it is? Okay. Yeah, I don't use Fil-C, and the first time I heard about Fil-C was when you guys said we were doing an episode about Fil-C, so I apologize for not knowing more about it. If I was a Fil-C user, I'd be happy to give the full explanation.

Starting point is 00:29:51 I believe the way that it works is they use a shadow allocation. So every time you allocate something, they allocate twice that effectively, right? Yes, you're right. You're right, Casey. And then they use, so effectively what X is, is it's going into the shadow region, which then tells Phil C where, like, how to get the additional date, right? Data. In fact, you said the person who wrote it was in chat.

Starting point is 00:30:16 Is that correct? Would that be a good summary? Because I read about this literally yesterday, so that's all I know. But that, I believe, was how it was explained in the documentation. Yes. Yes. I think you're right, Casey. Yes.

Starting point is 00:30:29 That was, my apologies. It doesn't return the wider one. It has one somewhere else that it's holding onto that tells you. It's called InvisiCaps, right? It has the invisible capabilities stored off in a region which you're not allowed to access, and the thing returned to you looks just like a pointer. It sounds like a pointer. And in C, it's just literally a number. And then boom, bottom, bang, it does the runtime checks against it.
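To make the idea just described a little more concrete, here is a minimal C sketch of bounds kept off to the side while the program still holds an ordinary pointer. It is only an illustration under stated assumptions, not how Phil C is implemented: the names `capability`, `cap_table`, `checked_alloc`, and `checked_read` are hypothetical, the side table is a plain array looked up by base address, and a violation returns an error code where a real checker would stop the program.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical side record: the bounds live here, not in the pointer. */
typedef struct {
    uintptr_t lower; /* address of the first valid byte */
    uintptr_t upper; /* address one past the last valid byte */
} capability;

#define MAX_CAPS 64
static capability cap_table[MAX_CAPS]; /* stand-in for the hidden region */
static size_t cap_count = 0;

/* Allocate zeroed memory and record its bounds on the side.
   The caller gets back an ordinary, pointer-sized pointer. */
unsigned char *checked_alloc(size_t size) {
    if (size == 0 || cap_count == MAX_CAPS) return NULL;
    unsigned char *p = calloc(1, size);
    if (p == NULL) return NULL;
    cap_table[cap_count].lower = (uintptr_t)p;
    cap_table[cap_count].upper = (uintptr_t)p + size;
    cap_count++;
    return p;
}

/* Find the side record for an allocation by its base address. */
static const capability *find_capability(const unsigned char *base) {
    for (size_t i = 0; i < cap_count; i++) {
        if (cap_table[i].lower == (uintptr_t)base) return &cap_table[i];
    }
    return NULL;
}

/* Read base[index] only if it falls inside the recorded bounds.
   Returns 1 on success and 0 on a violation. */
int checked_read(const unsigned char *base, size_t index, unsigned char *out) {
    assert(out != NULL);
    const capability *cap = find_capability(base);
    if (cap == NULL) return 0;                     /* unknown allocation */
    if (index >= cap->upper - cap->lower) return 0; /* out of bounds */
    *out = base[index];
    return 1;
}
```

With two 4-byte allocations as in the whiteboard example, `checked_read(x, 3, &v)` succeeds while `checked_read(x, 5, &v)` is refused instead of silently reading whatever sits next in memory.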

Starting point is 00:30:50 And that's why you can still add to it and work with it like you normally would, because because they have a shadow region, they can still figure out what's going on there, right, or things like this. And so I don't know the specifics. I want to now read more about it because I thought it was really cool when I was reading. It was like, oh, this is a really neat thing. So I'd like to go read about how they handle all the actual practical details, but I didn't immediately find sort of the document of that.

Starting point is 00:31:14 There's really good overview documents, but like, oh, how did you handle this? How did you handle that? Like, I didn't find that immediately. So I'd like to go read, like, how do you actually use the shadow region? Like, how is it doing that? And all that sort of stuff. And also, the other important thing that they bring up in this documentation, which I think is important to mention as well, is that if a naive implementation of stuff like this would have a lot of problems with threading potentially. Because, like, different people can sort of mutate pointers and from, you know, and you can have race conditions, this sort of.

Starting point is 00:31:44 stuff. And so Phil C is also very careful to make sure that they get those things right, which again, like, if you look at it, it's like, wow, this sounds very complicated. Like, why did it have to, you know, why does it have to allocate all this extra memory or things like that? And the answer is because they were, they were trying to take this really seriously. Like, not like, oh, we caught some memory bugs, but like, no, we really do provide a secure operating model, which is, which again, if you're drinking the Rust Kool-Aid, you have to respect that. This is, this is doing it real and on existing programs. It's pretty great. Yes, we like that. That's the crazy thing for the Rust people.

Starting point is 00:32:22 Even with my Rust super crazy hair on right now, that I don't really get. It's like, this is good. This stops this from happening. Right now, if you ran C code and you did this, it would be chill. You're like it would let it, it would be chill. It would let you just read from memory. That's not the thing that you're doing. That's bad. We don't like that. We want computers to work good. Like, that was the thing that I don't get. So that part's good.

Starting point is 00:32:49 And it seems like for some reason everyone's mad about it. And I don't really get why they're mad. I don't know. Because it's good. Most people really hate it when you introduce something good. That's not the thing that they were previously telling everyone was good. Right? Like they don't like to hear that someone else made something good that might be a reason not to use their good thing.

Starting point is 00:33:08 That's true. Right? Like heaven forbid, right? But yeah, I do have one question about Phil C that I wasn't able to answer directly from my sort of like five second read. So again, like it'd be nice, maybe, if the author is in chat, maybe they can come on sometime and tell us, so that people can have a more accurate description of what it's actually doing because that would be really nice. And obviously they would not get it wrong because they wrote it. But what I was going to say is I don't really know how this works with things like,

Starting point is 00:33:43 okay, I create a block of memory that I'm going to sub-allocate out of, and I wasn't sure, like, what it does for that. Like, that's one thing that's like on my list now. I'd like to go read about, like, how it deals with that. And, like, are there, is that just disallowed? Is it allowed? And we just say, oh, well, it doesn't catch problems. What? No sub-capabilities right now.

Starting point is 00:34:09 Okay. I assume that's because if you take a pointer from that region, you technically have the lower bound of the original allocation and the upper bound of the whole big block. And so that's a thing that makes it kind of hard to port a lot of existing programs because a lot of them will do that, right? And so then it's kind of like, okay, those programs kind of have to be rewritten to, like maybe there's some way that, you know, Phil C could be extended to support that. Like, hey, allow us to sort of write a section that's like, here's the,

Starting point is 00:34:36 here's how the memory marking is going to work inside our thing. I don't know. But like that would make it challenging to port programs. that were written in that style, which is a very good style of writing. Like, it's very efficient. And so it's kind of, I don't know what you do about that if it's, you know, in terms of I want to use this in some subset and I want to just use my existing C code. Effectively, the creator, Piz is saying the best thing you can do with your arena allocators

Starting point is 00:35:03 is to replace them with individual mallocs. Okay. I mean, which is not that hard, because usually your arena allocations are going through like a #define kind of a thing to allow you to switch them for debug and stuff like that anyway, or to mark where they're coming from. And I assume, I guess, let me ask

Starting point is 00:35:20 another question while we have the benefit of the author in chat. So do you then just, like, how do you, because normally arena frees are just going to free the whole arena, so is that just taken care of by the garbage collector then? Don't call free on them, let the GC deal with it.

Starting point is 00:35:34 Yeah, so then that's not a hard port. Yeah, right, that's not a difficult port, I don't think, for most people. Because at least for my own code, I always have those running through a macro so you can mark them with file and line and debug and stuff like that. So that seems pretty reasonable.
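The porting approach discussed here, routing arena allocations through a macro so they can be swapped for individual mallocs, might look roughly like the following C sketch. Everything in it is an assumption made for illustration: the `arena` type, `arena_create`, `arena_push`, the `ARENA_ALLOC` macro, and the `USE_INDIVIDUAL_MALLOC` switch are hypothetical names, not something taken from Phil C or from any speaker's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define ARENA_ALIGN 16u /* assumed alignment, enough for common types */

/* A simple bump arena: one big block handed out in slices. */
typedef struct {
    unsigned char *base;
    size_t capacity;
    size_t used;
} arena;

arena arena_create(size_t capacity) {
    arena a = { malloc(capacity), capacity, 0 };
    if (a.base == NULL) a.capacity = 0; /* allocation failed: empty arena */
    return a;
}

/* Hand out the next slice, rounded up to ARENA_ALIGN; NULL when full. */
void *arena_push(arena *a, size_t size) {
    assert(a != NULL);
    size_t rounded = (size + (ARENA_ALIGN - 1)) & ~(size_t)(ARENA_ALIGN - 1);
    if (rounded < size) return NULL;                  /* size overflowed */
    if (rounded > a->capacity - a->used) return NULL; /* arena is full */
    void *p = a->base + a->used;
    a->used += rounded;
    return p;
}

/* Single switch point for every allocation site.
   With USE_INDIVIDUAL_MALLOC defined, each object is its own malloc,
   so a bounds checker can track it separately. Under a garbage
   collector these would simply never be freed by hand; in plain C
   without one, this mode leaks unless the objects are freed. */
#ifdef USE_INDIVIDUAL_MALLOC
#define ARENA_ALLOC(a, size) ((void)(a), malloc(size))
#else
#define ARENA_ALLOC(a, size) arena_push((a), (size))
#endif
```

Call sites only ever write `ARENA_ALLOC(&a, n)`, so switching the whole program between the two modes is a single compiler flag rather than a rewrite.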

Starting point is 00:35:50 And if you've been following along, we'll link as well, Pizzlinator, your X in the, like, description of the episode. But like, he's been working on a bunch of different, like, very complicated, very real world

Starting point is 00:36:06 C programs that are big, that you cannot just rewrite to Rust. Guys, you can't do it. And talking about like Emacs and Ruby and a few other ones, you can follow along as he's kind of like micro blogging. As they say, that's in these days, right? Micro blogging.

Starting point is 00:36:26 I'm on Tumblr, obviously. You can tell from my hair. So you can follow along there for a bunch of different ones and see what he's been up to and how he's gotten there, which is cool. Cool. TJ, as the resident Rust expert, yes, thank you. Could you help us understand things that Phil C doesn't do?

Starting point is 00:36:46 Or why would you, like if you're starting brand new, but you're really good at C, why would you even want to choose Rust if you want to type, like, or at this level of safety, and you're like, well, I could just use Phil C, but why would you want to choose Rust? Yeah, a blue field project, if you will, for me. Yes, yes. A blue sky project, T.

Starting point is 00:37:05 Blue sky, a blue sky project. If I was starting a new blue sky project today, what would I pick? Okay, so one reason you wouldn't use Phil C. You don't enjoy writing C. That's one reason. That's a legit reason, okay? There's lots of projects that don't need to be written in C. Okay?

Starting point is 00:37:25 That's fine. In fact, there's lots of projects that don't need to be written in Rust. I almost have to take this off when I say this, right? It's just true. You lost your blue hair on that one. I know, I know. I get to agree. I think you're going to have to lose the wig, Teage.

Starting point is 00:37:38 That was, yes. You've been, yeah, you've been removed from your board seat on the Rust Foundation for that. But okay. Yes. But let's say you're starting out a new project. And the requirements are such that Rust makes, or some systems language makes a good choice. Right. So, or you just have some other requirements.

Starting point is 00:37:58 The example for me that, like when I was working at Sourcegraph, we have a very straightforward in and out system where we have text come in and we need syntax highlights to come out on the other side. It actually matters for that to be really fast. You want it to go really fast. It's always going to look the same shape. You have text come in. You'd like to copy it not so often. You're going to run this for like lots of different files and lots of different languages for lots of different customers. You'd like this to be really good and really nice and really, really fast. Okay, you have options for what you could do, right? You could write that service in C, I guess, right? It's maybe a little bit, it's a little bit harder to picture, but there are like decent like parser generators and other stuff you could use to get you part of the way there. And then you could apply some like syntax

Starting point is 00:38:52 highlighting from that and all that good stuff. But now if you're going to go and do something with Phil C, you're going to experience a slowdown. And if you don't do that, you know, you're more likely to experience some bad security vulnerabilities on a bunch of untrusted input, because you're highlighting random people's code, right? So, oh, they put in a Unicode character that was too wide for your thing, and then it overflows this buffer, and now you access this, it's corrupted memory, and you screwed yourself over. Too bad, so sad, right? So in some cases like this, where you need a lot of speed, you're maybe going to work on a lot of untrusted input, you're connected to the internet, you're doing other stuff like that, like, I think there's

Starting point is 00:39:30 good reasons why you might want to choose Rust, and you'd be able to get like faster performance or something like that with Rust. I mean, also, I think, you know, from a like technical standpoint, a lot of people appreciate some of the type system guarantees you can get from Rust that Phil C is not going to give you. Hopefully now, in many cases, you won't have like an actual security vulnerability, you won't have a CVE from these kinds of problems, but it will crash at runtime, which is good. That's objectively better. Also, Rust crashes at runtime for things too.
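The overflow scenario mentioned above, a Unicode character that turns out wider than the buffer expected, is the kind of bug a hand-written bounds check is meant to stop. As a rough sketch only, here is a UTF-8 encoder in C that refuses to write past the space it was given. The function name `utf8_encode_checked` and its return convention (bytes written, or 0 on refusal) are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode one code point as UTF-8 into dst, which has room for cap bytes.
   Returns the number of bytes written, or 0 if the code point is invalid
   or would not fit. Nothing is written when 0 is returned. */
size_t utf8_encode_checked(uint32_t cp, unsigned char *dst, size_t cap) {
    assert(dst != NULL);
    size_t need;
    if (cp < 0x80) need = 1;
    else if (cp < 0x800) need = 2;
    else if (cp < 0x10000) need = 3;
    else if (cp <= 0x10FFFF) need = 4;
    else return 0;                              /* beyond Unicode range */
    if (cp >= 0xD800 && cp <= 0xDFFF) return 0; /* surrogates are invalid */
    if (need > cap) return 0;                   /* the check that matters */

    switch (need) {
    case 1:
        dst[0] = (unsigned char)cp;
        break;
    case 2:
        dst[0] = (unsigned char)(0xC0 | (cp >> 6));
        dst[1] = (unsigned char)(0x80 | (cp & 0x3F));
        break;
    case 3:
        dst[0] = (unsigned char)(0xE0 | (cp >> 12));
        dst[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        dst[2] = (unsigned char)(0x80 | (cp & 0x3F));
        break;
    default:
        dst[0] = (unsigned char)(0xF0 | (cp >> 18));
        dst[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        dst[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        dst[3] = (unsigned char)(0x80 | (cp & 0x3F));
        break;
    }
    return need;
}
```

If the `need > cap` line is forgotten, plain C will happily write the fourth byte of an emoji into whatever follows a 3-byte buffer; a runtime-checked C or a bounds-checked language turns that same mistake into a crash or panic instead.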

Starting point is 00:40:07 That's also how it solves some problems. So it's not like, if Rust accesses something outside the bounds, that's just an expect or a panic and then it will crash. Sweet, that's memory safe. We are happy. That's actually good. But like there's lots of things. The example you had brought up earlier for like file descriptors, right?

Starting point is 00:40:27 In Rust, if I open up a file and I am making my own, you know, like API for it, I'll have the file descriptor, like, or I'll have the function take a callback. One of that callback's parameters will be a reference to the file descriptor. In Rust, the borrow checker will prevent me at compile time from accessing that outside of the callback, right? So you couldn't like save the file descriptor to a global and then try and read from it later. That won't, that like won't work, right? And most of the time it's like not clonable. And not copyable. So you would have to go through like a lot of hoops to try and even get that value out in a way. You couldn't even literally like just copy it. The callback pattern is impossible in

Starting point is 00:41:09 Rust. It's so hard to use. The callback pattern in Rust, it's so freaking tough. True. Yes. I still haven't figured out streams quite successfully yet. Yeah. Um, but that would be a thing, right, that like if I try and use this value, this file descriptor outside of the callback, the type system and the borrow checker, right, the linear type system that Rust has will block you from being able to do that at compile time instead of like runtime, right? That's like for me, I'm a type systems guy like types. For me, that makes me that makes me happy, right? Like, of course, especially with the blue hair on, that makes me really, really happy. So that would be something where like if what you want to do, right, like push left,

Starting point is 00:41:51 we're going to push the errors to happen earlier along in the dev cycle. We're going to experience those at the compiler, and there's no like 100,000 lines of battle tested code that has a bunch of stuff, you might choose Rust over like Phil C, right? And that would be, I think, a reasonable choice. But for me, everyone seems to be talking past each other about that on the internet, which is surprising. Can I ask a clarifying question there? I'm not sure I 100% followed that. First of all, I would also say that the phrase push left, yeah, strongly sort of suggests a left-to-right reading language, because that's the order you're going in. So I think you lose the blue hair on that one as well. But...

Starting point is 00:42:33 True. You should say more towards the beginning of a sentence push, is the correct way to say that. Yeah, exactly. What I would say is, what is the bug you're trying to prevent with this file handle scenario, just so I understand what you mean by Rust preventing this better than... Like, if you have some function, right, that like opens up and then closes a file handle for you and says like, hey, or a file descriptor, right? Or any kind of this thing, database connection, any kind of thing where you have like a handle,

Starting point is 00:43:01 right? Uh-huh. Uh, if you're writing this and like, this is not true of just C, but this would be like in Python or something too, you could just like save that handle inside of your callback to some global value. It can escape its region, right? So if you like do with open file descriptor in Python and then inside of there, you like save the file descriptor.

Starting point is 00:43:23 And later in your program, after that original thing's already closed it, you're outside of that region now. You try and read from it again. It'll be an error. It will crash, right? But I mean, that's pretty easy to implement in C or C++ as well, right? Like, especially C plus plus. To do what part? You just wrap that, wrap file handles in an accessor and that will just work. Right? So, like, if you wanted that protection in C, you could just get it, right? Um, like you don't need a language.

Starting point is 00:43:55 Rust being defeated as we speak. No, I mean, I'm not trying to dunk on Rust. I'm just saying, I don't understand why you couldn't just make that same thing by just making sure that instead of using like an int as your file descriptor, you actually use a struct that does something, you know, or, you know, a C plus plus class for, yes, object-oriented programmers, right, whatever. But, but I'm saying like sort of regardless of that, I think, I think we're not, we're not talking about the thing.
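The suggestion of using a struct instead of a bare int for a file handle can be sketched in C with a small handle table and a generation counter. This is one possible pattern chosen for illustration, not something any of the speakers showed: `file_handle`, `handle_adopt`, `handle_read`, and `handle_close` are hypothetical names. A copy of the handle can still escape to a global, but any use after close is detected, which is the runtime version of what the borrow checker rules out at compile time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define MAX_FILES 8

/* What user code holds: a slot number plus the generation it was issued in. */
typedef struct {
    unsigned index;
    unsigned generation;
} file_handle;

/* What the table holds: the real stream and the slot's current generation. */
typedef struct {
    FILE *stream; /* NULL while the slot is free */
    unsigned generation;
} file_slot;

static file_slot slots[MAX_FILES];

/* Map a handle to its slot, or NULL if the handle is stale or invalid. */
static file_slot *resolve(file_handle h) {
    if (h.index >= MAX_FILES) return NULL;
    file_slot *s = &slots[h.index];
    if (s->stream == NULL || s->generation != h.generation) return NULL;
    return s;
}

/* Take ownership of an open stream and issue a handle for it. */
int handle_adopt(FILE *stream, file_handle *out) {
    assert(out != NULL);
    if (stream == NULL) return 0;
    for (unsigned i = 0; i < MAX_FILES; i++) {
        if (slots[i].stream == NULL) {
            slots[i].stream = stream;
            out->index = i;
            out->generation = slots[i].generation;
            return 1;
        }
    }
    return 0; /* table full */
}

/* Close the stream and bump the generation so old copies go stale. */
int handle_close(file_handle h) {
    file_slot *s = resolve(h);
    if (s == NULL) return 0;
    fclose(s->stream);
    s->stream = NULL;
    s->generation++;
    return 1;
}

/* Read up to size bytes; returns -1 if the handle is no longer valid. */
long handle_read(file_handle h, void *buf, size_t size) {
    file_slot *s = resolve(h);
    if (s == NULL) return -1;
    return (long)fread(buf, 1, size, s->stream);
}
```

The trade-off being argued about shows up directly: the misuse is caught, but only when the stale read actually executes, whereas a borrow-checked API rejects the program before it runs.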

Starting point is 00:44:18 I'm saying, like, in Rust, with the borrow checker, you can prevent the value from escaping its region, right? Which I don't, which I, maybe there is a C++, I'm sure there is a C++ thing that can do that. I don't know. I just assume everything is possible. You're just talking about like, you want, like, because they have special cased file descriptors with the static analysis in Rust or something like this.

Starting point is 00:44:45 Yeah, I think I'm, I'm saying. You're talking about type system and I'm like, well, the type system in both can do what you're talking about. So we're just talking about the type system as opposed to other things. Right? Yes. I'm, I'm just talking about the kinds of things.

Starting point is 00:44:56 Like, this is, you know, as like, I'm trying to present the case for why someone would say they would rather write this kind of thing in Rust, right? Is instead of experiencing a crash with Phil C at runtime,

Starting point is 00:45:09 okay. We could find that out from the compiler. Right? That, like, oh, this kind of access isn't allowed. That's not thread safe. That's not,

Starting point is 00:45:19 that escaped its region, but it shouldn't. This kind of value, right? Like there are things that we can do there, right, in like Rust with the type system that fight the same kinds of bugs you might get, you might get solved from Phil C. Right. But instead of having a crash at runtime, we get, it doesn't compile and I handle that case. Yeah, the borrow checker defeats use after free and double free at compile time.

Starting point is 00:45:45 That is the objective of the borrow checker. Yes. That's a probably way more succinct way to say that. Good job, Ed. I like to think of it as just like it's effectively a unique pointer, but at compile time. Right. So my point being, separately from, I think it's obviously good that there's an option where we can solve a bunch of these classes, classes of bugs in C and prevent it, right? In this case of like which language would I pick for certain projects, there are definitely like reasons you would pick rust over just picking Phil C.

Starting point is 00:46:21 for like a new greenfield project. Of course, there's other reasons why you might pick C over Rust as well. Like, you might actually ship it and things like that, which is cool. So, but that's like those kinds of bugs, Phil C does not solve that at compile time. That's all I'm trying to say as like a comparison of the two. And that's like a reasonable tradeoff that people want to make. Yes. And that's also that's where you get the performance from.

Starting point is 00:46:47 That's because of the borrow checker, not the type system, right? Just so we're clear on that part, right? I mean... Or just so I'm understanding it, I should say, right? Because that was the part that threw me, I was like, why is that a type system problem? Okay. Technically, actually...

Starting point is 00:47:00 Okay, so it is actually the type system. Well, the borrow checker is sort of like a result of the linear affine type system that Rust has. Okay. And so... I'm too dumb for this conversation. I don't know what's going on. I would just say they go together.

Starting point is 00:47:14 You would, like... Right, so you're talking about type system, the type system being like, including the borrow checker. Yes. Okay, gotcha. Yes, exactly. That makes sense.

Starting point is 00:47:22 So I also have another question, which is that does Rust catch all of the same memory errors that Phil C does? Because I don't know if that's true. No. Phil C. might catch more. It does. Yeah. That's the best part. If you really care about security, you probably shouldn't be using Rust.

Starting point is 00:47:40 Yeah. Rust will catch them, but it'll turn the condition into a DoS. So it's like, it's not an exploitable memory access, but it's also not caught at compile time. So like, you can do that. I mean even at runtime, does Phil C catch things that Rust doesn't catch, is my question. At runtime? I thought I saw a condition.

Starting point is 00:48:00 I unfortunately forgot to save it, but it's a condition in which Phil C does not catch something in which Rust does. I really wish I would have. Unsafe blocks, there are, I think, certain things you can do in Rust, inside of unsafe blocks, that Rust does not check. Real quick. Even all of its stuff.

Starting point is 00:48:21 And maybe still, I'm not actually 100% sure if it will do a CVE. But inside of unsafe blocks, that Phil C does catch. I do have to go in five minutes. I have a question for the author. Memory safety issues. Yes, Phil Ed. Syscall-related memory safety issues.

Starting point is 00:48:35 Are you saying, for example, in a syscall handler where it does a copy to user, like outside of the bounds of a memory region? Like what about syscalls in particular? I'm curious what you're saying there. And then I do have to run. I'm sorry, guys. I just want to say he said he did confirm a CVE in unsafe blocks is totally possible. He said yes.

Starting point is 00:48:55 Okay, how would Phil C detect that? Yes, Piz. Give us the, give us the nine yards. And then we'll have to just have Phil come on. Yeah, 100%. I'm so curious. The right thing is to have him on the show. As long as he wants to come on, I mean.

Starting point is 00:49:09 He said yes. Okay, great. Yes. Yes. Yes. There's also, I do think, can I just mention one other thing as the Rust person? I'll just put this back on. Oh.

Starting point is 00:49:22 Sometimes 20% slower, or up to four times slower, and 1.5 to 2 times as much memory actually makes it not possible to do. Constrained hardware or other stuff like that will just literally make it so it's just not possible to accept the performance hits, because some places in the world, performance still actually matters and you can't just spin up 5,000 new lambdas and pay $10 million in cloud costs. Okay, there you go. You're going to summon DHH, dude. You're going to summon him. He's going to be like, yes, we can. He's like, yes, we can. We will spin up all of them.

Starting point is 00:49:54 No, he's against that. It's also true. You've got to be careful. You can technically spin up all of them. You just pay a lot of money. This is blowing my mind right now. He said that it intercepts all syscalls at the ABI layer, which makes sense. Phil, I know what we need. We need a slideshow with these examples, and we can work on it together. I'll message you after this on X, and we'll get that set up. We'll come back with a slideshow with code examples, and then, Ed, you can ask every single question you want. I've been wanting to do this a lot. Ever since Trash did the TypeScript presentation, I've been wanting to do more presentations and then have us all interrupt the presentation

Starting point is 00:50:27 to be like, but what about this? What about this? What about this? And that was such a fun time. Yes. All right. Well, then let's end it right now. Low level is going to take a question.

Starting point is 00:50:36 I have to say one quick thing. Oh, go ahead, Brian. I will say that every time I use a union in C, I get so happy and so sad at the same time because it does make me long and yearn for Rust. I will say that the automatic tagged unions at the syntax level is just a thing of glory and beauty 100% of the time. But don't worry, in another 50 years, the C++ committee will finally get one working.
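For readers who have not written one, this is roughly what a hand-rolled tagged union looks like in C, which is the thing being compared with Rust's built-in enums here. The `shape` type and its helpers are made up for this sketch. The point is that the tag and the payload are kept in sync purely by convention: nothing in the language stops code from reading `as.rect` while `kind` says circle.

```c
#include <assert.h>
#include <stddef.h>

/* The tag: says which member of the union is currently meaningful. */
typedef enum { SHAPE_CIRCLE, SHAPE_RECT } shape_kind;

typedef struct {
    shape_kind kind;
    union {
        struct { double radius; } circle;
        struct { double width; double height; } rect;
    } as;
} shape;

/* Constructors keep the tag and the payload consistent. */
shape make_circle(double radius) {
    shape s = { .kind = SHAPE_CIRCLE, .as.circle = { radius } };
    return s;
}

shape make_rect(double width, double height) {
    shape s = { .kind = SHAPE_RECT, .as.rect = { width, height } };
    return s;
}

/* Every reader has to remember to switch on the tag first. */
double shape_area(const shape *s) {
    assert(s != NULL);
    switch (s->kind) {
    case SHAPE_CIRCLE:
        return 3.141592653589793 * s->as.circle.radius * s->as.circle.radius;
    case SHAPE_RECT:
        return s->as.rect.width * s->as.rect.height;
    }
    return 0.0; /* unknown tag */
}
```

In a language with tagged unions at the syntax level, the tag is managed by the compiler and the payload can only be reached through a match on it, so the convention above becomes something the compiler enforces.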

Starting point is 00:50:59 We're almost there, guys. One more committee. One more version. One more 2,000-page revision, and they'll get the discriminated unions working properly instead of sucking. Casey, if that were true, then why are some people using, like, C-17 instead of C-94? That doesn't even make sense, then. It doesn't make sense. Go, checkmate.

Starting point is 00:51:18 Because 94 is bigger than 17, so we should be using that one, don't you think? 2020, 2094. I got to go. Bye. Sounds worse then, really. All right, everybody will go as well just because this is happening and already left us and now the stream is completely screwed up. Thank you, everybody, for joining us for this. Hey, if you are watching on YouTube, you can see everything on Spotify.

Starting point is 00:51:42 You get the full episode because you miss a lot of the banter when you only watch it on YouTube. So thank you very much for watching. Thank you for all the guests. Ed with lowlevel.academy, Casey with computerenhance.com, and Teage with boot.dev/teej.

Starting point is 00:51:58 Or by the way. Or by the way, if you want to support both T.J. and I, because I also have two courses on there as well. I just did like a Spider-Man thing. And the name is the Spiteagen. All right.

Starting point is 00:52:09 Thank you, everybody. Boot up the day. Vibe coding errors on my screen. Terminal coffee And here
