CppCast - Secure Coding and Integers

Episode Date: March 3, 2022

Robert Seacord joins Rob and Jason. They first talk about a constexpr Wordle game and constexpr unique_ptr being added to the standard. Then they talk to Robert Seacord about secure coding and his thoughts on integers.

News
Wordlexpr: compile-time wordle in C++20
January ISO Mailing
Constexpr unique_ptr accepted
Rainer's bundle on sale 1/2 off with sales going to support Ukraine

Links
Joy of Coding Humble Bundle
Effective C - An Introduction to Professional C Programming
Secure Coding in C and C++, 2nd Edition
The CERT C Coding Standard
SEI CERT C++ Coding Standard
ISO/IEC TS 17961
ISO/IEC TS 17961 - Review Draft

Sponsors
Use code JetBrainsForCppCast during checkout at JetBrains.com for a 25% discount

Transcript
Starting point is 00:00:00 Episode 339 of CppCast with guest Robert Seacord, recorded March 2nd, 2022. This episode of CppCast is sponsored by JetBrains. JetBrains has a range of C++ IDEs to help you avoid the typical pitfalls and headaches that are often associated with coding in C++. Exclusively for CppCast, JetBrains is offering a 25% discount for purchasing or renewing a yearly individual license on the C++ tool of your choice, CLion, ReSharper C++, or AppCode. Use the coupon code JetBrainsForCppCast during checkout at www.jetbrains.com. In this episode, we talk constexpr Wordle and unique_ptr. Then we talk to Robert Seacord. Robert talks to us about secure coding and integers. Welcome to episode 339 of CppCast, the first podcast for C++ developers by C++ developers.
Starting point is 00:01:32 I'm your host, Rob Irving, joined by my co-host, Jason Turner. Jason, how are you doing today? I am good, Rob. How are you doing? I'm doing good. I saw you released your latest C++ Weekly about the constexpr issue you finally figured out, right? Yes, yes. And actually, that's what I was just responding to someone about a moment ago. Proving someone wrong on the internet? I mean, so it's interesting, because like I said, it took me like five years to come up with a good solution to this. That's what that was the whole point of the episode.
Starting point is 00:02:01 The number of people who commented initially, because I don't read all the comments all the time, it's far too many, but I'll often see all the comments that come in the first couple of hours. And almost everyone said, wow, I just had an issue like this yesterday. I've been looking for a solution to this problem. Like, thanks. Like lots of people were like, wow, thanks for this solution. A couple of people were like, well, I think the language should work differently. Well, I mean, it doesn't. So, okay. And then I got two bug reports open on the project, basically, because I did release the tool open source, like I said I would in the video. I got two bug reports, one bug report and one pull request, both telling me that I was doing an unnecessary extra step because of course,
Starting point is 00:02:45 the compiler won't do the work twice. But you showed it in the video. I show it in the video. Well, I don't prove that it has to do the work twice in the video. But then the person who opened the bug report actually proved themselves wrong. Shortly after that, they posted another comment, like, I just tested it. Oops, you know, basically. So anyhow, I was just using Fred Tingaud's Build Bench, which we've mentioned on the show before, to do a quick comparison between the different options. And yes, in fact, the compiler does actually do the work multiple times. So now I have a handy data point for when the topic comes up again.
Starting point is 00:03:21 I can be like, well, until compilers change, this is the situation that we are currently in. Okay, well, at the top of every episode, I'd like to read a piece of feedback. Jason, you brought up this one with someone emailing you about both CppCast and your YouTube channel. And this is from Rakesh saying,
Starting point is 00:03:39 I use CppCast to catch up on the latest in C++. You guys are doing a really good job here. I had one comment that you could do to improve in the cast, and that is include more notes or links. Many times I don't even know about the things you talk about, like most recently, Jakob talking about Conan. I wasn't even sure how to spell it out, but on Instinct, I searched this and did indeed find out that it is exactly how it's spelled. I think things like that would be really useful for developers. I responded to Rakesh privately saying we do try to put these things in the notes.
Starting point is 00:04:08 It's hard for us, you know, because we talk about a lot. Yeah, so I mean, obviously, we always have like, you know, a couple news items, and we always put those in the show notes. But if something just kind of like organically comes up during the conversation, you know, we kind of need to remember to make a note, and sometimes we forget to add it to the show notes. Yeah, definitely. We'll always try to be better about that and always try to include as much as we can in the show notes. Well, we'd love to hear your thoughts about the show. You can always reach out to us on Facebook, Twitter, or email us at feedback@cppcast.com. Don't forget to leave us a review on iTunes or subscribe on YouTube. Joining us today is Robert Seacord. Robert is standardization lead at Woven Planet, where he works with Toyota and its suppliers to accelerate software development
Starting point is 00:04:55 while improving quality. Robert is the author of several books on C, C++, and Java programming, including Effective C. Robert, welcome to the show. Hey, thanks. Good to be here. What does standardization lead? What is your job and title? Yeah, that's a very good question. I just started this week, so mostly I don't know. But it's basically to try to standardize internal software development practices and also to try to influence some external standards like C and C++ and get involved with some of the security and safety groups and
Starting point is 00:05:34 MISRA, things like that. Wow. I hope you enjoy the new role that you have found. Yeah. So far, the first two days have been great. On my third day, I get to talk to you guys, so what could be better? Well, Robert, we've got a couple news articles to discuss. Feel free to comment on any of these. And then we'll start talking more about your books and your work on secure coding and things, okay? Yeah, sounds good. Okay, so this first one is a blog post on Vittorio Romeo's website. And this is Wordlexpr, compile-time Wordle in C++20.
Starting point is 00:06:13 He's got a nice video of playing Wordle in this C++ code that he wrote. Fully, you know, constexpr code. I think Wordle is kind of phasing out, but it was kind of fun to see this. Yeah, just barely missed the fad there. Did you get into Wordle by any chance, Robert? Either one of you. Oh yeah, I guess you're Rob and I'm Robert for the purposes of recording. I was playing it for maybe six or seven days and then I got a little bit tired of it. It's hard to avoid that temptation to bore your friends
Starting point is 00:06:46 by posting your result online. Was there anything particularly interesting about the code? I didn't dig into the code personally. I just watched the video. I kind of wish that it had some way of automatically forwarding state to the next instantiation of the compiler. Yeah, every time you run constexpr Wordle, you need to send in the state as a compiler flag so that it can run again and update with your new guess.
Starting point is 00:07:14 I guess I could do one of those __has_include checks to see if a previous state header file exists. And if it does, automatically include that and ignore the seed that you explicitly passed in or something. Then you just have to delete the state file each time. So it can't write its own state file during compilation. Curses. When are we going to
Starting point is 00:07:37 get things like #embed and stuff? JeanHeyd Meneide is working on it for C at least, and it's progressing in C++. It's definitely not going to make it into 23 for C++. I think at least at one point, JeanHeyd's hope was to get it accepted in C so that C++ had no choice but to accept it. We looked at it in the last meeting and, you know, it was sort of big and scary enough that, you know, a lot of the WG14 committee members got a little bit, you know, afraid of it. So we're going to have to see it again.
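Editor's sketch: the `__has_include` idea floated a moment ago could look roughly like the following in C++17. This is not Vittorio's code. The header name `wordlexpr_saved_state.hpp`, the macro `WORDLEXPR_SEED`, and the constants `saved_seed`, `have_saved_state`, and `effective_seed` are all hypothetical names chosen for illustration.

```cpp
// Hypothetical sketch: prefer a saved-state header when one exists,
// otherwise fall back to a seed passed on the command line
// (for example -DWORDLEXPR_SEED=42u). All names here are made up.
#ifdef __has_include
#  if __has_include("wordlexpr_saved_state.hpp")
#    include "wordlexpr_saved_state.hpp"  // assumed to define `saved_seed`
#    define WORDLEXPR_HAVE_SAVED_STATE 1
#  endif
#endif

#ifndef WORDLEXPR_HAVE_SAVED_STATE
#  define WORDLEXPR_HAVE_SAVED_STATE 0
#  ifndef WORDLEXPR_SEED
#    define WORDLEXPR_SEED 0u  // default when no seed flag is given
#  endif
constexpr unsigned saved_seed = WORDLEXPR_SEED;
#endif

// True only when the state header was found at compile time.
constexpr bool have_saved_state = (WORDLEXPR_HAVE_SAVED_STATE == 1);

// The seed the game would use for this compiler invocation.
constexpr unsigned effective_seed = saved_seed;
```

As noted in the conversation, the compiler can read such a header but cannot write it, so some outside step would still have to create and delete the state file between runs.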
Starting point is 00:08:11 And I'm not sure what approach JeanHeyd is going to take, but maybe if it could be sort of reasonably cut down, that might make it sort of more digestible and more acceptable to the C committee, at least. I swear, it's about once every other week or so that I think, man, wish I had a way of easily including data in here without having to do some pre-processing step or some weird build system magic or something like that. Yeah. I've had to enter a few larger arrays from time to time, and it's not any fun. Yeah. So you're involved actively in the C committee, but not the C++ committee, is that right? Yeah, I'm getting there. So I think my last involvement with C++ is I
Starting point is 00:08:56 hosted a meeting here in Pittsburgh. I don't even know how long ago it was, maybe 10 years ago. It was the year that the Pittsburgh Penguins and the Pittsburgh Steelers both won the championship. So thanks for giving me a chance to get that in there. And so, yeah, so right now I'm in the SG22 compatibility group that's led by Aaron Ballman. So I'm going to get my feet wet there. And I've also joined the SSRG, the safety and security group within C++. So, you know, pretty soon I'll have to get my INCITS memberships in for, you know, under Woven Planet, change my organization. I guess it's the third organization I've represented, you know, in the language committees.
Starting point is 00:09:39 That's a good segue to our next news item. I feel like I may have made a mistake by putting this in here. We already discussed the January mailing of C++, ISO C++. I am wondering that because this was just a week ago. So maybe we haven't. But for the sake of distinguishing the Robs in the room, Robert, there's a bunch of SG22 compatibility things in the last ISO C++ mailing. Yeah, we're looking at a bunch of things. Of course,
Starting point is 00:10:14 the timing's not great now because all sort of new proposals for C23 are past due. So we had some compatibility papers that made it in. And to a certain extent, I've become the designated chump for, you know, reformatting C++ papers for C. I did a UAX 31 paper that got adopted into C. So we're going to use the same format for identifiers as C++. And we just discovered this past week that, you know, we managed to disallow the use of Unicode characters in identifiers that weren't entered through, you know, universal character names. So now, just before this podcast began, I was working on a paper to fix that problem. So presumably those sort of defects can still be looked at, because that might be a pretty significant breaking change for C23 if we don't address that.
Starting point is 00:11:09 I actually worked with my new boss, JF Bastien, on a volatile paper. Oh, your boss is JF? Yeah. We know JF, yes. Yeah, yeah. So you can express your condolences in whatever form you like. I can't think of the last time I had a manager who could discuss things technically in depth. So this might be a new experience for me.
Starting point is 00:11:34 So far, it's been great. You know, we actually had a, and I don't mean to segue you yet, but we had a chat about integers last night. Because he is actually in Tokyo, and I'm in Pittsburgh. So, you know, we talk mostly in the morning and the evening. Yeah, so I just wanted to make sure we were kind of aligned with, you know, the same views on integers. Turned out we are. That was good. You should take advantage of this time frame where he's recently become a manager, but still knows technical things and can discuss them with you, because who knows how long that'll last before he loses that ability. I'm sorry, I'm being terrible here. I get the joke. I
Starting point is 00:12:11 mean, you know, we used to have manager training, but, you know, as a stopgap measure, you know, we'd bang their head in the door a few times, slightly lobotomize them so they could function, you know, as a manager. So just on a more serious note, though, I have been very curious about this basic lambdas for C proposal that's been going around in the C committee. And now if it gets approved in C, we have to do something about it. It's one of these compatibility papers in the ISO mailing list. Just curious if you have any insight or opinion on C lambdas. Yeah, I like the idea of doing something. It's a little bit strange.
Starting point is 00:12:56 You know, just over the past six months, even, the C committee has become a little bit more conservative, regressive. So we've kind of added some new members who think, you know, C is fine the way it is. And maybe, you know, we'd gone too far and we should backtrack to C89. And so it's become a little bit harder to adopt C++ features. So Jens Gustedt, you know, he presented sort of a collection of papers all around the topic of, you know, generic programming in C, to kind of improve generic programming in C, which is, you know, really terrible as it is, really broken. You know, they didn't go very well. So he's going to make some changes and bring them back again. But I'm very much in favor of lambdas because I've been proposing a kind of a defer mechanism for C based on the Go language. And everyone agrees that the defer mechanism looks a lot better with lambdas,
Starting point is 00:13:47 but not everyone agrees that that's a reason to introduce lambdas. So hopefully we can make some progress. Hopefully we do get that into C23. Can you explain what you mean by a defer mechanism? I'm not familiar with that. Yeah. So it's a new keyword, defer. It's really based closely on the Go mechanism. So what happens is you allocate a resource, allocate memory, open a file, and then if that allocation succeeds, you might say defer the free of the memory or defer closing the file. And what will happen is when you exit something, that code would then be executed and that resource would be released. You know, we don't have all the fancy C++ stuff in C, right? So we don't have destructors and we don't have exceptions and all these things. And so, you know,
Starting point is 00:14:40 I think this goes a long way to kind of addressing resource management in C. And when the idea came up, it really had a lot of immediate support. You know, people really got behind it. And so WG14 voted to produce a technical report that, you know, kind of documents the feature, because this is a new feature without much implementation experience. So, you know, we're going to, you know, pursue that TR after C23 is wrapped up. And yeah, there's a lot of excitement because it's something kind of new. You know, a lot of the enhancements we're making to C are almost bug fixes. You know, if Aaron were here, he'd say, oh, we've got attributes. You know, but, you know, I think this new C23 is kind of lacking any really, you know, big splashy new feature that's going to get people's attention. Yeah, like you kind of said before, it's kind of how C rolls, slower adoption
Starting point is 00:15:39 even than C++, I feel like. Oh yeah, I mean, in committee, you know, we consider C++ to be an experimental language. There's that. I don't know what to say. I guess going back to the news, the next thing we have is, you know, another ISO paper. And I think this one just did get voted in. And this is Making std::unique_ptr constexpr. Yeah, Andreas Fertig. I think it's his first paper accepted into the standard.
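Editor's sketch, going back to the defer mechanism Robert described above: C has no destructors, which is why a `defer` keyword is attractive there. In C++ the same "run this when the scope exits" behavior can be approximated with a small scope guard. The class name `deferred` and the function `defer_demo` below are made up for illustration and are not taken from the C proposal.

```cpp
#include <string>
#include <utility>

// A tiny scope guard: runs the stored callable when the guard is destroyed,
// which approximates what a C `defer` statement would do at scope exit.
template <typename F>
class deferred {
public:
    explicit deferred(F action) : action_(std::move(action)) {}
    ~deferred() { action_(); }

    deferred(const deferred&) = delete;
    deferred& operator=(const deferred&) = delete;

private:
    F action_;
};

// Records the order of events so the deferred release is visible.
inline std::string defer_demo() {
    std::string log;
    {
        log += "acquire;";                               // e.g. open a file
        deferred release{[&log] { log += "release;"; }}; // e.g. defer closing it
        log += "use;";                                   // work with the resource
    }   // `release` is destroyed here, so the deferred action runs now
    return log;
}
```

Calling `defer_demo()` yields the events in the order acquire, use, release: the cleanup is written next to the acquisition but runs at scope exit, which is the property the proposed C feature is after.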
Starting point is 00:16:09 Have any plans for what you'll be able to do with this one, Jason? The experimental code that I was working on immediately before this call to do compile time benchmarks, I had to basically roll my own constexpr unique pointer effectively for what I needed to do. Because the only shipping compiler that supports constexpr vector and string is Visual Studio. So I had to hand roll something real quick that kind of smelled like a constexpr vector for the sake of proving what I needed to
Starting point is 00:16:40 for this compile time stuff. So yes, it would have been easier if I had unique_ptr. All right. And the last thing we have here is Rainer Grimm, who we mentioned a few weeks ago is doing this training, is also putting out that you can get some of his books with a coupon, a bundle of his C++ collection for half the price, and all proceeds are going to go towards Ukraine, which is great. Yeah. Yeah. Those guys need all the help we can give them. Yeah, absolutely. Check the news about it every morning. It seems like they're still able to hold the cities and everything. So I hope things are going well. I hope it's over soon. Yeah. I feel like the State of the Union address felt overly optimistic at the moment,
Starting point is 00:17:26 but hopefully everything does work out well for the Ukrainian citizens. Yeah, I think they've been exceeding expectations. I think in the long run, at least, they'll win. You know, they'll be successful. A lot of pain and suffering before that goal is achieved. So I might as well get this plug in, but there's also a Humble Bundle available right now from No Starch Press that includes my Effective C book and some other books. And the proceeds of that also go to charity, although I didn't check to see what
Starting point is 00:17:58 the charity was. I don't think it's going to go to Ukraine, but I could be wrong. Yeah, and Humble Bundle is, you know, they're always doing different charities, usually like selling video games and things like that at heavy discounts. That's pretty cool. I'll have to get the link from you for that for the show notes. On the subject of your book, you have written a few. Do you want to tell us a little bit about some of the highlights of the different books you've worked on? Yeah, you know, writing books is interesting. It's kind of, you know, women have this hormone that's released after they have a child that makes them forget how awful a process that is. And I think, you know, authors have this same
Starting point is 00:18:35 thing in some sense. You know, every time I finish a book, the first thing I say is never again, and then I go ahead and write another one. But I don't know. I had some early books on kind of software process type things. And in 2003, I moved into the CERT organization at Carnegie Mellon University's Software Engineering Institute. And, you know, decided C was where all the problems were. And so I kind of reinvested myself into C language programming and wound up writing Secure Coding in C and C++, which has been a pretty successful book. It's still selling pretty well. It's in a second edition now, but it's still going pretty well. A few years ago, I was up at Recon in Montreal, which is a reverse engineering conference, and I gave a talk on dangerous optimizations in C, or as David Keaton calls them, optimizations.
Starting point is 00:19:27 Bill from NoStarch, he grabbed me in the hallway and he wouldn't stop irritating me until I agreed to write a book for him. So that's how Effective C came about. But it's been a good product. I think people like it. Everyone's having a hard time figuring out if it's an introductory book or not. I mean, so it starts at the beginning and it covers everything, which makes it an introductory book. But I didn't want to teach people how to do things wrong and have them need to relearn how to do them correctly. It goes into, you know, kind of the necessary level of detail, which is deeper than you would normally find in a crash course or introductory book. But people do seem to like it, although they still sort of seem to be confused about who the audience for the book is.
Starting point is 00:20:17 That's an interesting comment and something I've thought about and something we've discussed a little bit with other authors on the show before, Rob, is, you know, if you want to show a simple example, but you don't want to show bad practice and bad code at the beginning of the book, you know, what do you do? Do you show the simple example that's bad practice and put a comment saying, this is terrible, don't ever actually do it this way, we'll fix it in the next chapter? Or do you show enough background detail that you can show the best practice at the first page? It sounds like that's something you kind of tried to balance from what you just said. Yeah, I'm not sure I tried to balance it. I mean, I went for kind of just trying to have good code most of the time. I guess, you know, I'm thinking my intro chapter now, and I did have kind of a, you know, here's a first pass at this, and here's a second pass at this, and here's a third pass at this. So I incrementally sort of improved the first example. But before this, I did the Cert C secure coding standard, which I forgot to mention. That was also published by
Starting point is 00:21:18 Addison-Wesley. But we developed that on a wiki at the SEI. You know, we had sort of anti-pattern code, then we had good code, and we just put the anti-pattern code in sort of a pink box and the good code in a blue box or green box to kind of try to discourage people from taking the bad code and cutting and pasting it into their systems. But there's always this risk of, you know, showing novice programmers bad code, because they might not realize it's bad code and they might decide to reproduce it in their system. You just gave me a wonderfully terrible idea. I'm thinking about your anti-pattern code in its pink box, right? And then you have other code that's the same color as the background intermixed with the code that you're trying to show. So it's like one of those things from an old school, like the back of a cereal box
Starting point is 00:22:13 where you had to have the decoder, like the red filter, right? So if someone were to copy and paste the bad example, when they paste it into the thing, all of the code that they couldn't see is now also in there and the example's broken, so they can't even try to use the bad example. Oh, that sounds like fun. Sounds like kind of a use case for bidi. It's a cute term for bidirectional,
Starting point is 00:22:37 like Unicode characters, right? So they are... Oh, okay. Yeah, they're these Unicode characters that let you kind of reorder the characters in your source code or even the tokens in your source code. And what people have been showing, demonstrating recently is that you can put these bidi characters into comments, because a comment is not parsed, obviously. You know, basically you take code that looks like it's actual code, but it's code that's in the comment, right? So you might have a line of code that says, you know, if admin do this, you know, very secure thing.
Starting point is 00:23:13 But in reality, the test to see if you're admin is in the comment. So it's not actually compiled into the executable, although the source code appears to have it there. This is what the C++ security and safety group is looking at right now, addressing that and various homoglyph-type attacks where you've got Greek characters which look the same as Latin characters, and so you think you're accessing one identifier, but really it's a different identifier. So addressing a range of security issues around Unicode. Both Facebook and YouTube say that C++ has an excessive number of special characters in it, just C++ by itself. This is something I've run into several times when I'm trying to post videos and articles and stuff. And so I
Starting point is 00:24:05 ended up having to find a plus character from Inuit language symbols to be able to actually put in. So that's how C++ Weekly is C++ Weekly on YouTube because it's not actually pluses. Oh, clever. Yeah. I might write my next C++ program in Ugaritic or something. But yeah, there probably are too many characters in C++ and C. There's papers for C and C++, which parallel Unicode Annex 39, which tries to kind of address Unicode security. And part of that is to sort of further restrict the set of characters. And I think it's probably too late for C.
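Editor's sketch: one simple defense against the bidi trick described above is to scan source text for Unicode bidirectional formatting characters before trusting what an editor displays. The check below covers the embedding and override controls (U+202A to U+202E) and the isolate controls (U+2066 to U+2069). The function names are made up for illustration, and a real linter would likely check more than this.

```cpp
#include <string_view>

// True for the Unicode bidirectional formatting characters:
// U+202A..U+202E (embeddings and overrides) and U+2066..U+2069 (isolates).
constexpr bool is_bidi_control(char32_t c) noexcept {
    return (c >= 0x202A && c <= 0x202E) || (c >= 0x2066 && c <= 0x2069);
}

// Scans already-decoded (UTF-32) source text for any bidi control character.
constexpr bool contains_bidi_control(std::u32string_view text) noexcept {
    for (char32_t c : text) {
        if (is_bidi_control(c)) {
            return true;
        }
    }
    return false;
}
```

For example, `contains_bidi_control(U"/* \u202E admin check */")` is true because of the right-to-left override, while plain ASCII source returns false. Homoglyph attacks need a different check, since the look-alike characters are ordinary letters.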
Starting point is 00:24:50 I mean, C has discussed it, and so it's too late for this, which is kind of unfortunate because, you know, as we go forward and, you know, we expand the use of Unicode and, you know, various compilers start supporting UTF-8. I mean, they're already supporting UTF-8. People start to use these characters and then down the road, we might have to take them away, which is going to obviously break existing code. It's a no bueno situation, I suppose. I want to interrupt the discussion for just a moment to bring you a word from our sponsor. CLion is a smart cross-platform IDE for C and C++ by JetBrains. It understands all the tricky parts of modern C++
Starting point is 00:25:30 and integrates with essential tools from the C++ ecosystem, like CMake, Clang tools, unit testing frameworks, sanitizers, profilers, Doxygen, and many others. CLion runs its code analysis to detect unused and unreachable code, dangling pointers, missing typecasts, no matching function overloads, and many other issues. They are detected instantly as you type and can be fixed with a touch of a button, while the IDE correctly handles the changes throughout the project. No matter what you're involved in, embedded development, CUDA, or Qt, you'll find specialized support for it. You can run and debug your apps locally, remotely, or on a microcontroller,
Starting point is 00:26:04 as well as benefit from the collaborative development service. Download the trial version and learn more at jb.gg/cppcast-clion. Use the coupon code JetBrainsForCppCast during checkout for a 25% discount off the price of a yearly individual license. Definitely want to talk more about secure coding. But before we dive deeper into that, I did want to back up and give you a chance to talk about integers. Because when we were first setting up the interview,
Starting point is 00:26:35 you said you definitely wanted to talk about integers and how you thought the C++ community was not handling integers well. Is that right? Yeah, I thought, you know, what the hell, I'm addressing the C++ community. I might as well start a huge fight. Yeah, like some years ago, I sent a proposal to the ICSE conference, the International Conference on Software Engineering, to do an eight-hour tutorial on integers. They turned it down, of course, right, because who in
Starting point is 00:26:57 their right mind would sign up for that. But the guy, you know, emailed me and he said, you know, can you really talk about integers for eight hours? I said, sure, I'll just talk really fast and skip the boring bits. So one of the issues I've been seeing, JeanHeyd Meneide brought us a paper on bit operations to try to introduce them into C, rotate operations and so forth. They all had signed int for the count, right? And count is something where you have zero items and then, you know, you get more items, right? But you never have negative five apples or negative five bits, right? And so I think the C way to look at, you know, signed integers is that you'll use a signed integer if you're trying to represent a value which can have both positive and negative quantities.
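Editor's sketch: to make the point about counts concrete, here is a rotate-left whose count parameter is unsigned, so a negative count cannot even be expressed. This is only an illustration of the design argument, not the API from the C paper. The name `rotate_left` is made up, and for comparison C++20's `std::rotl` takes its count as a signed `int`.

```cpp
#include <cstdint>
#include <limits>

// Rotates `value` left by `count` bit positions. The count is unsigned
// because a number of bits is never negative.
constexpr std::uint32_t rotate_left(std::uint32_t value, unsigned count) noexcept {
    constexpr unsigned bits = std::numeric_limits<std::uint32_t>::digits;  // 32
    count %= bits;          // rotating by the full width is a no-op
    if (count == 0) {
        return value;       // also avoids an undefined shift by 32 below
    }
    return static_cast<std::uint32_t>((value << count) | (value >> (bits - count)));
}
```

For instance, `rotate_left(0x12345678u, 8)` gives `0x34567812u`, and rotating by 0 or by 32 returns the value unchanged.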
Starting point is 00:27:52 But I think C++ has a different, you know, although, you know, incorrect view, which is, you know, that unsigned integers don't really give you all that much greater range, so the range is not that much of an issue. And unsigned integers are something which will have modulo behavior, as opposed to something that's used to represent only zero and positive values. The other thing that seems to be driving C++ is this sort of mistaken belief that, well, okay, so signed integer overflow is undefined behavior. We all know that. But it seems like a lot of C++ decisions are driven by the idea that because signed integer overflow is undefined behavior, we can have tooling which then does something different for that undefined behavior. So when it detects the overflow, it can trap or do something different. That sort of leads C++ to specify APIs that use signed int instead of unsigned int in places, you know, for things like count where unsigned is more appropriate. Sort
Starting point is 00:28:59 of with this idea that we're going to encourage our users to write bad code so that we can detect that bad code in tooling, right? And that seems really misguided, right? So I think the thing that people are really missing is that it's okay to write non-conforming tooling, right? So if you look at UBSAN, UBSAN has a flag to trap on unsigned integer wraparound, right? So they've implemented those tests. You can do that, right? Not every implementation, not every, you know, set of flags that you pass to your compiler invocation has to be conforming to the standard. And there are compilers, you know, like IBM, which by default are non-conforming to the
Starting point is 00:29:42 standard. I mean, there are sets of flags you can use to have a conforming implementation, but by default, their implementation is not. They actually treat unsigned integer wraparound as undefined behavior to improve their ability to optimize loops and so forth. And, you know, that community has embraced that behavior, even though it is non-conforming. I agree that I have heard so many of these arguments about signed versus unsigned. And a lot of them stem from one
Starting point is 00:30:12 particular CPPCon talk that was given probably about five years ago now that talks about unsigned integer math being slower. And partially it comes down to the fact that the compiler can't eliminate some certain cases of undefined behavior. So in some cases, if you use signed integer math, it's faster. And so this is the kind of thing that I've had to deal with several times because in my role as an educator and YouTuber and whatever, people will come up to me and like, well, didn't you see that one video? Don't you know that signed integer math is slower?
Starting point is 00:30:50 And I'm like, you know, sure. And that one hyper-specific optimizer case it is. But then I see people who, because they have this in the back of their minds, they take all the places where the standard currently does unsigned integers, like the size of a vector or whatever, and then they just blindly cast that to an int. And they actually pay a small but real cost
Starting point is 00:31:16 of doing a conversion from an unsigned 64-bit int to a signed 32-bit int. Most platforms, it has to add a couple of instructions when it's actually faster to just stick with the type that the standard already gave you. So my mentality personally has always just been to stick with the type that I was given. I don't really care. You gave me an unsigned size, I'm going to use an unsigned size. Yeah. I mean, the argument in this talk appears to be, you know, hey, if your code is wrong and have undefined behavior, we can make that code really fast. You know, I mean, I sort of disagree with that whole premise, right? Hey, if you don't care what results you get, hey, we can make it super fast.
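Editor's sketch of the "stick with the type you were given" advice: index a container with its own `size_type` instead of casting `size()` to `int`. The function name `count_zeros` is made up for illustration. It is written as a template so that it works with `std::vector` at run time and with `std::array` in a constant expression.

```cpp
#include <array>
#include <cstddef>

// Counts the zero elements of a container, using the container's own
// unsigned size type for both the index and the result. There is no
// conversion to int anywhere, so no narrowing and no sign mismatch.
template <typename Container>
constexpr typename Container::size_type count_zeros(const Container& values) {
    typename Container::size_type zeros = 0;
    for (typename Container::size_type i = 0; i < values.size(); ++i) {
        if (values[i] == 0) {
            ++zeros;
        }
    }
    return zeros;
}
```

With `std::array<int, 4>{0, 1, 0, 2}` the result is 2. The same call works unchanged on a `std::vector<int>`, whose `size_type` is typically `std::size_t`.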
Starting point is 00:31:56 So I would say unsigned integers are faster in many situations where you want a correct result. And, you know, even things like, you know, say, a remainder operation with signed integers can result in overflow, which is surprising to most everyone. So division, right, if you have INT_MIN divided by minus one on a two's complement architecture, which is now, you know, what C23 and C++ support, that will overflow, right? Because two's complement has this asymmetrical range where there's always one more negative value than positive value, right? So if you take the minimum value and divide it by minus one, you're going to produce a value that can't be represented in that signed type, right? So you'll have overflow. So it turns out that if you replace that
Starting point is 00:32:46 division with a remainder operation, you have INT_MIN remainder minus one. Mathematically, that produces zero, which, say, you know, the preprocessor will have no problem giving you a zero. So literally like zero modulus minus one. Is that what we're saying? No, no, INT_MIN. INT_MIN. Okay, okay. So the minimum value, you know, you can represent as that signed integer type. If you divide that by minus one, right, it's going to be too large to represent as a two's complement number, because two's complement always has one more negative value that can be represented than positive, because, you know, zero is basically
Starting point is 00:33:26 represented as a positive representation, right? Two's complement doesn't have negative zero, right, like one's complement or sign-and-magnitude representation would have. So when you go to remainder, and I'm not going to say modulo because I don't believe it's a modulo operation, in the C standard, again, which I'm more familiar with, in the text of the standard, it never says what the name of that operator is, but if you look in the index, it's referred to as the remainder operator, and it does actually give the behavior of a remainder operator. So INT_MIN remainder minus one, mathematically, that should produce a zero.
Starting point is 00:34:02 Any number divided by minus one will have no fractional part remaining, right? But on, say, 32-bit x86 on Intel processors, remainder is implemented as division, as a sort of, you know, so you'll divide and then you'll get the remainder. And so it'll generate an IDIV instruction. And the IDIV instruction, if it's given INT_MIN divided by minus one, it will overflow. So the remainder operation on Intel platforms and other platforms will actually overflow and result in a fault. It used to be implicit undefined behavior in C, meaning we forgot to document that it's undefined behavior. But now it's, I can't remember if it's in C11 or C17.
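A minimal sketch of guarding against the overflow described above. The helper names checked_rem and checked_div are hypothetical, and reporting failure through a bool return is just one possible design. The idea is to test the operands before the hardware division ever runs.

```cpp
#include <climits>

// Two's complement int: one more negative value than positive values.
static_assert(INT_MIN == -INT_MAX - 1, "assumes a two's complement int");

// Hypothetical helper: remainder that never evaluates x % 0 or INT_MIN % -1.
// Returns false when no result can be produced.
bool checked_rem(int lhs, int rhs, int& result) {
    if (rhs == 0) {
        return false;            // remainder by zero is undefined
    }
    if (rhs == -1) {
        result = 0;              // mathematically always 0; skips the INT_MIN case
        return true;
    }
    result = lhs % rhs;
    return true;
}

// Hypothetical helper: division that rejects the unrepresentable quotient.
bool checked_div(int lhs, int rhs, int& result) {
    if (rhs == 0 || (lhs == INT_MIN && rhs == -1)) {
        return false;            // division by zero, or -INT_MIN does not fit in int
    }
    result = lhs / rhs;
    return true;
}
```

The extra comparison in checked_rem is the kind of branch the language would have to emit everywhere if it required INT_MIN remainder minus one to produce zero.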
Starting point is 00:34:44 I think C11, it's explicit undefined behavior. That's interesting, too, because sometimes, you know, you have people come up that are like, well, why don't we just eliminate all the cases of undefined behavior? And you're, if I'm hearing you right, you're effectively saying this is also undefined behavior at the CPU architecture level. Yeah, I mean, undefined behavior is very misunderstood. I mean, people tend to think of it as errors in the standard, you know, but I have actually introduced undefined behaviors into the C standard. You know, the first one being making remainder operations explicitly undefined for those values. But, you know, undefined behavior is really there for one of three reasons, right? So the first reason is because you have a defect that is difficult, potentially impossible to diagnose, right? So the standard
Starting point is 00:35:46 can't require a compiler to diagnose it, so it makes it undefined behavior, which leads to the possibility that, you know, it will be undiagnosed and, you know, the code will just do some random sort of thing. In C, we used to say it could go off and play the game of life. I think C++ refers to nasal demons, just because of the more satanic nature of the C++ committee. The second thing undefined behavior does, it just basically allows for differing implementation behaviors, right? So if processors can vary in how they perform an operation, C and C++ don't want to dictate, you know, INT_MIN remainder minus one has to produce a zero, right? Because on an Intel processor, that would require, you know,
Starting point is 00:36:35 emitting some branch code, right, to test for these values, you know, do something appropriate. And so, you know, they basically allow implementations to sort of have their default behavior instead of forcing a particular behavior on it, right? So that's the second use of undefined behavior, to give this license. And the third use, which is probably the least understood, is that sometimes it's just there to allow implementation extensions, right? So if you do something like fopen on a file, there's a set of defined mode characters. And if you specify a mode character that's not defined by the standard, that's undefined behavior, right? So if you do that, nothing particularly weird is going to
Starting point is 00:37:17 happen. But what happens is, you know, you've got these file systems with really varying capabilities, and security tends to be one of those, right? Default permissions, things like that, the mode you want to open the file in. And so, you know, the C standard, presumably C++ as well, makes that undefined behavior so implementations can add additional mode characters in order to access those additional features of their specific file systems. We just added unreachable in C, and that basically is a way of allowing the user to inject undefined behavior into the program to help with optimization.
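A rough sketch of what "injecting undefined behavior" with unreachable looks like in practice. C23 provides an unreachable() macro in <stddef.h> and C++23 provides std::unreachable() in <utility>. The wrapper name mark_unreachable and the Color example are hypothetical, and the fallbacks rely on compiler-specific builtins.

```cpp
#include <utility>

// Hypothetical portable wrapper. Actually reaching it is undefined behavior.
[[noreturn]] inline void mark_unreachable() {
#if defined(__cpp_lib_unreachable)
    std::unreachable();          // C++23
#elif defined(__GNUC__) || defined(__clang__)
    __builtin_unreachable();     // GCC / Clang builtin
#elif defined(_MSC_VER)
    __assume(false);             // MSVC intrinsic
#endif
}

enum class Color { Red, Green, Blue };

int channel_index(Color c) {
    switch (c) {
        case Color::Red:   return 0;
        case Color::Green: return 1;
        case Color::Blue:  return 2;
    }
    // Promise to the optimizer: c always holds one of the three enumerators.
    mark_unreachable();
}
```

If the promise is ever broken, for example by casting an out-of-range integer to Color, the program has undefined behavior. That is exactly why a rule such as "no undefined behavior" would forbid this feature.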
Starting point is 00:38:02 So that's now a user feature that got voted into C23. So there's support for that because, you know, C compilers and users like to optimize code, right? They like to have highly performing code and this can help with that. But there are standards like MISRA which just say no undefined behavior, right? So you couldn't use unreachable
Starting point is 00:38:23 in a MISRA conforming program, you know, or, you know, use a mode character besides the ones defined by the standard for a call to fopen, which actually means that, you know, your MISRA conforming program can be less secure, right? Because you're not able to access some of these additional mode features that would improve the security of your program. And, you know, hopefully that's something MISRA will get around to fixing, you know, as they kind of extend to sort of, you know, more complex systems, you know, having kind of Linux as a, you know, operating system on safety critical systems and so forth.
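To make the fopen point concrete, here is a small sketch of the kind of check a "standard modes only" rule implies. The function name is_standard_fopen_mode is hypothetical, and the list is the set of mode strings defined by C11. Anything else, such as the "e" (close-on-exec) character that glibc documents as an extension, falls outside the standard even though it can be useful for security.

```cpp
#include <cstring>

// Hypothetical checker: true only for mode strings defined by C11 fopen.
bool is_standard_fopen_mode(const char* mode) {
    static const char* const standard_modes[] = {
        "r",   "w",   "wx",   "a",
        "rb",  "wb",  "wbx",  "ab",
        "r+",  "w+",  "w+x",  "a+",
        "r+b", "rb+", "w+b",  "wb+",
        "w+bx", "wb+x", "a+b", "ab+",
    };
    for (const char* candidate : standard_modes) {
        if (std::strcmp(mode, candidate) == 0) {
            return true;
        }
    }
    return false;  // an implementation extension or a typo
}
```

A mode such as "re" would be rejected by this check, which mirrors the trade-off described above: the conforming program cannot ask for the extra file system feature.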
Starting point is 00:39:02 I guess, is that something that you might be focusing on more in your new role working on MISRA standards? Yeah, I think I'm supposed to get more involved in that. You know, I know the MISRA guys have been asking me for a long time to help them out. And it's not actually a public, you know, standards body. So I keep saying sure, you know, and asking them to pay me. They don't seem to have the money. But I guess going forward, it sounds like Toyota or Woven Planet will, you know, pay me to help develop MISRA standards. So it should be fun. MISRA is specifically used in automotive, is that correct? Well, you know, it was originally designed for automotive. It's a motor industry standard for reliability. And I'm not quite remembering the acronym, but I think originally
Starting point is 00:39:52 I might be getting this wrong. It might've been Rolls-Royce and Jaguar in the UK both had safety standards for C and they decided, you know, it would be easier if they sort of combined them and then handed it off to a different organization to manage for them, you know, reduce costs of maintaining the standards. So that was the origin, and it's been managed by sort of independent companies. I'm not sure, it might have been a recent change that I haven't kept up with, the governance of MISRA. But yeah, until recently, it's sort of been managed by these private organizations. And your books, several of them say CERT in the title. And I think we've mentioned that a couple times. But what exactly is CERT? Yeah, so once upon a time, CERT stood for Computer Emergency Response Team.
Starting point is 00:40:46 But, you know, as with all acronyms, eventually people discovered that you cannot copyright or trademark an acronym. And so CERT no longer stands for anything. It's just CERT. Same thing with JDBC. It no longer stands for Java Database Connectivity. It's just Jibbica. It's just CERT. I didn't even know there would have been a reason to worry about trademarking JDBC, honestly. I started my career with IBM, and they had that problem. They couldn't trademark IBM because it's an acronym. They made a little blue logo thing and they trademarked that. I remember somewhere in the Hudson Valley, there was a winery. They reused like that logo for the winery and, you know, IBM just sued them into the ground. They're
Starting point is 00:41:37 very aggressive about protecting their name and their trademark. So the books that you have that are about CERT, is that a specific set of standards also? I'll tell you the story of this since you asked. So I was at a C standards meeting in Berlin in 2006, and I had some members of the C committee approach me with the idea that CERT should develop a secure coding standard for C because they had issues with MISRA at the time, perhaps more so than now. So part of their issue with MISRA was that, you know, MISRA had thrown the baby out with the bathwater, in particular by, you know, not supporting recent versions of the C standard. So I thought that was a great idea. So I flew back to Pittsburgh and we started work on that. So, you know, we set up
Starting point is 00:42:26 wiki pages for secure coding in C and secure coding in C++ and secure coding in Java. C was probably the first one to get completed. And then we published a first edition of that with Addison-Wesley. Try not to buy that or if you bought that, you know, throw it out. It was a first go. It was a learning experience for us. But subsequent to that, we started a study group in WG14, which went on for three and a half years. And we produced a technical specification. I want to say 17961, but I'd have to check that I got that number right. And we also produced the second edition of the CERT C Coding Standard, which is pretty solid, really has held up since publication. Yeah, and then Aaron Ballman was working for me at CERT.
Starting point is 00:43:11 I had both Aaron Ballman and David Keaton working for me at CERT. So Aaron Ballman completed a C++ coding standard, but it's been published by the SEI. It's available on their website. It has not been published as a book. We worked with Jim Manico. I don't know if you know Jim, but he kind of got us started on a Java secure coding standard. And then I had a student who wanted to work on the Java coding standard as a thesis, as a master's thesis project. So I said, sure.
Starting point is 00:43:41 He did a very nice job. I gave him an A. And he came back to me and he said, sir, you do know that you can also give me an A+. So I said, oh, I didn't know that. So I gave him an A+. That's Dhruv Mohindra. He's back in Pune, India, but we keep in touch. Well, Robert, it was great having you on the show today. Thank you so much for talking to us about secure coding and integers. It was great having you on. Thanks, I had a good time. Definitely could have spent more time on the secure coding aspect, but we got to talk about integers. Secure coding and integers are the same thing. I mean, because when you talk about C and C++, the biggest issue tends to be accessing
Starting point is 00:44:23 objects outside the bounds of the object, right? That leads to arbitrary code execution. And the way you manage to access objects out of bounds is you take a pointer and you add an integer to it or subtract an integer from it, and then you dereference memory at that location, right? And so if you've lost control of your integer values, right, now you're accessing memory out of bounds. So a solid understanding of integers is really kind of at the core of secure coding. You know, you can write a C or C++ program that doesn't do file I/O, but you can't write one that doesn't use integers. Very true. Thanks, Robert.
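The link between integers and out-of-bounds access can be sketched like this. The helper name element_at is hypothetical. It validates the integer before it is ever added to the pointer, so a lost-control index yields a null pointer instead of an out-of-bounds address.

```cpp
#include <cstddef>

// Hypothetical helper: check the index first, then do the pointer arithmetic.
// Returns nullptr when the index does not refer to an element of the array.
const int* element_at(const int* base, std::size_t length, long long index) {
    if (base == nullptr || index < 0 ||
        static_cast<unsigned long long>(index) >= length) {
        return nullptr;          // negative or past the end: refuse
    }
    return base + index;         // safe: 0 <= index < length
}
```

The signed index is checked for negativity before it is converted to an unsigned type, since a negative value would otherwise wrap around to a very large one.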
Starting point is 00:45:02 Thanks a lot. Thank you, guys. Thanks so much for listening in as we chat about C++. We'd love to hear what you think of the podcast. Please let us know if we're discussing the stuff you're interested in, or if you have a suggestion for a topic, we'd love to hear about that too. You can email all your thoughts to feedback at cppcast.com. We'd also appreciate if you can like CppCast on Facebook and follow CppCast on Twitter.
Starting point is 00:45:25 You can also follow me @robwirving and Jason @lefticus on Twitter. We'd also like to thank all our patrons who help support the show through Patreon. If you'd like to support us on Patreon, you can do so at patreon.com/cppcast. And of course, you can find all that info and the show notes on the
