CppCast - Analyzing Undefined Behavior
Episode Date: February 21, 2019
Rob and Jason are joined by John Regehr to talk about his job as a professor at the University of Utah teaching C++ courses and some of his research projects, including Souper and Csmith.
John Regehr is a professor at the University of Utah where he's been on the faculty since 2003. He likes to work on compilers and software correctness, but used to work on real-time and embedded systems. When he has free time he likes to go hiking in the desert with his kids.
News: Five Awesome C++ Papers for Kona 2019 ISO Meeting; The future of Catch2; Some C++ on Sea videos already available; Between linear and binary search
John Regehr: @johnregehr; John Regehr's Personal Page; John Regehr's Blog
Links: Souper; Csmith; C-Reduce; C++Now 2018: John Regehr "Closing Keynote: Undefined Behavior and Compiler Optimizations"
Sponsors: Download PVS-Studio; Technologies used in the PVS-Studio code analyzer for finding bugs and potential vulnerabilities
Hosts: @robwirving @lefticus
Transcript
Episode 187 of CppCast with guest John Regehr, recorded February 20th, 2019. Static analysis of programs written in C, C++, C#, and Java.
In this episode, we talk about Kona papers and search algorithms.
And we talk to University of Utah professor,
John Regehr.
John talks to us about teaching C++ and his compiler research projects.
I'm your host, Rob Irving, joined by my co-host, Jason Turner.
Jason, how are you doing today?
Doing pretty well, Rob. How are you doing?
I'm doing fine.
Getting over a little cold.
Just had family in town for the week, and life is carrying on.
Things get to be busy again for me, travel-wise, coming up here soon.
Looking forward to Core C++ in Tel Aviv.
Yeah, that should be fun.
Yeah, March is going to be busy as well, so hopefully we can work that out with our schedules.
Yes.
Okay, well, at the top of every episode, I'd like to read a piece of feedback.
This week, we got a comment on the website from Ray saying,
Hey, Dark Matter Programmer here.
Fun show.
Go has this feature since 2011 that you guys were discussing regarding updating your code with new versions.
With the new release, Go tool fix could be used to update your code to new APIs.
Here's from Ray. So yeah, we were talking with Edaqa about, you know, Leaf, his language,
and how he wanted to upgrade the code from one version to the next, and how it might work in C++. I know
Titus Winters has talked about doing it. Yeah, and specifically, it sounds like Ray here is
talking, mentioning Go. He says to use new APIs, and it makes me wonder if it's breaking changes in the language as well,
or if it's just about APIs, which I think is more the kind of thing that Titus has been talking about.
But it'd be nice if we could break some old C++ features and just have it automatically get taken care of by compilers.
Right.
Okay, well, we'd love to hear your thoughts about the show as well.
You can always reach out to us on Facebook, Twitter, or email us at feedback@cppcast.com.
And don't forget to leave us a review on iTunes.
Joining us today is John Regehr.
John is a professor at the University of Utah, where he's been on the faculty since 2003.
He likes to work on compilers and software correctness, but used to work on real-time and embedded systems.
When he has free time, he likes to go hiking in the desert with his kids.
John, welcome to the show.
Hi. Thanks a lot for the invite.
Yeah. Utah gives you lots of opportunities for hiking, right?
Yeah. There's a pretty big mountain range right outside the city here. And then,
yeah, it's pretty nice. And then we have sort of good access to the national parks like Zion
and Capitol Reef and such. Is the University of Utah in Salt Lake City?
Yeah. It's basically maybe a mile or two down. It's kind of right along.
Coming from Denver, I always get confused when I'm in Salt Lake City
because I'm used to the mountains being on the west.
And when you're in the Salt Lake City,
I feel like I never know which direction I'm facing
without having the mountains clearly as a landmark for me like that
because I feel like, I don't know, it's less obvious there.
And it's on the wrong side.
Okay, well, John, we got a couple news articles to discuss. Feel free to comment on any
of these, and we'll start talking more about the work you're doing. Okay, great. Okay, so this first
thing we have is an article from Bartek's blog, and we've talked about several of his
articles before and had him on the show once. And this is Five Awesome C++ Papers for the Kona 2019 ISO Meeting, which is happening
right now. And he starts off by doing a good review of what is already in C++20,
and then he highlights five papers that were just being introduced at Kona,
a couple of which we've already talked about, like the nodiscard should have a reason paper
that was written by Isabella Muerta and JeanHeyd Meneide.
Yeah.
Yeah.
The freestanding paper, if that goes through,
to be able to have some good guidance on that would be nice.
Yeah, and then there's another one,
integration of chrono with text formatting.
I know we've talked about the format library a couple times.
Having that apply to things like Chrono makes a lot of sense to me.
Yeah.
John, did you have a chance to look at this article?
I did, and I didn't have too many specific thoughts about those papers,
but I was sort of interested by some of the new items for C++20.
So I don't know that they're the most important overall,
but the ones that I found really interesting are the addition of contracts, I think is really great. I think
this is something when I teach students to program, I really try to lean on them to put a lot of
assertions in and do a lot of kind of try to try to make their assertions and their unit tests play
well together. And I think contracts will really kind of play to the strengths of that sort of,
that sort of thing. And then I also like that C++ is two's complement now.
That's kind of a long overdue change, I think,
that's really nice to see.
That'll simplify a number of things.
So sometimes when I'm talking to people about contracts,
some people feel like it maybe wasn't quite ready for primetime.
Are you familiar enough with the actual what was accepted in the standard
to have any opinion on what was accepted?
No, no.
I looked it over, and it looks interesting,
but I have not been following the sort of technical development.
Okay.
I think the main thing that stood out to me
was in some conversation I had with someone
was it might not play nicely with constexpr,
which would be a huge deal to me,
but that's still to be determined.
Well, yeah, I think this is the thing with C++ is it's gotten pretty big and complicated,
and I guess everybody's aware of this, but it's one of these things where it's kind of
interesting to balance these really cool features with the fact that it's pretty hard to play
well together.
Yeah.
Okay, the next article we have is by Martin Hořeňovský, hopefully I'm pronouncing that
right, and he's the current main maintainer of Catch2.
And this article is about the future of Catch-2
and how he is considering,
and it sounds like he's going to go forward
with changing the distribution model
from a single header library
to actually splitting it up amongst multiple
source files and actually building it,
so we can deploy it through things like vcpkg and Conan. And he goes about some of his reasoning for
wanting to consider this change, and talks about his experience maintaining it,
because he's kind of taken over the project from Phil Nash and done the
majority of maintenance and closing issues over the past few years. Yeah, it sounds like a big change and a big deal.
I did not realize that Phil was not the main active maintainer at the moment.
Yeah, but it sounds like he got good feedback when he proposed this online,
and it sounds like it makes sense.
He might keep an older version of the library as a header only,
so if you want to use it in that fashion for quick examples, you can still do so, which I think is probably a good move. I'll say for my part, I do
use the header-only version, but at this point I'm using it via Conan, and so if it was a
seamless change and the next version that it grabs just did the right thing, then it wouldn't really
matter to me. Right. He didn't go too much into specifics, but what exactly are the problems with
ingesting single-header libraries in something like Conan? Well, I mean, all it does is download
a header, basically, in Conan, and actually using it is pretty easy. But I think his
main concerns are things like compile time. Okay. Okay, and then the next thing we had here was
C++ on Sea, where you just came back from, Jason.
They already have like almost 30 videos online.
So Phil is uploading these really quickly, I guess.
Yes, they are going very quickly.
Yes, that's great.
I'll have to start watching some of these.
Any specific ones you wanted to point out aside from your own talk?
Well, my talk's not up there yet.
Yours isn't actually here yet.
Okay.
There were technical problems. Oh, okay. But it should be up soon. It might not end up going up. Oh, okay. That's too bad.
Yeah, but you know, it happens. I'm not too concerned about it. You
can only go to so many talks when there's three or four tracks happening. But I really liked
Björn Fahller's Programming with Contracts in C++,
bringing it back around to contracts. Okay. And then the last thing we have talked about
is this article between linear and binary search. Jason, you want to talk about this one?
Yeah, this was actually given to me from someone that I met at the conference. I just clicked on
the article and a pop-up came up on top of it.
It was given to me by someone from C++ on Sea. And it's just an algorithm instead of doing a linear
search or binary search, he has this biased search, where he's saying in the most likely scenario,
the thing he's looking for is near the top of the container. So it's doing like a power-of-two kind of search
that favors the top each time it searches. Now, at the start, he says that he was unable to
find a name for this algorithm. But there's some possibility that it was like a skip search or
something like that. And since we have a computer science professor amongst us, I was wondering if you had any thoughts on this. No, I don't know the name for this,
but I did enjoy the piece a lot. I really like these kind of deep dives. And I always find it
really interesting how these things which seem so simple, these things we cover, you know,
kind of in, you know, first and second year courses often, you know, when you actually look
into it and look into the performance effects, there's almost this kind of fractal complexity where, you know, it just gets more and more kind of relevant detail and interesting stuff going on.
The deeper you look. I really like this sort of piece.
I think it's great to get this kind of thinking put down somewhere where people can see it.
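The article doesn't give the algorithm a name, but the shape Jason describes (probe indexes 1, 2, 4, 8, ... and then binary search inside the resulting bracket) is commonly known as exponential or galloping search. A minimal sketch, assuming a sorted vector of ints:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Search a sorted vector, biased toward the front: probe indexes
// 1, 2, 4, 8, ... until we overshoot the key, then binary search
// only inside that bracket. Cost is O(log i) where i is the key's
// position, so hits near the front are very cheap.
bool biased_search(const std::vector<int>& v, int key) {
    if (v.empty()) return false;
    std::size_t hi = 1;
    while (hi < v.size() && v[hi] < key) hi *= 2;   // gallop forward
    std::size_t lo = hi / 2;                        // last probe below the key
    hi = std::min(hi + 1, v.size());                // bracket is [lo, hi)
    return std::binary_search(v.begin() + lo, v.begin() + hi, key);
}
```

The two-phase structure is the whole trick: the galloping phase pays for itself only when lookups really do cluster near the front, which is exactly the access pattern the article assumes.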
Right.
Thanks.
Okay.
Well, since you were just talking a little bit about, you know, studying computer science, what classes are you currently teaching, John?
Well, right now I have just one.
It's a software engineering course,
and it's a little bit different than our usual kind of courses
because this is for a new professional master's track we have.
Students sort of come in and do a kind of accelerated master's degree in 18 months.
Anyway, and I'm doing the software engineering component.
It's really been a lot of fun.
Right now they're writing an image compression assignment in C++,
and I was just thinking while we were talking about Catch2 that I should have them maybe use that for some of their unit tests.
I hadn't.
We've kind of been doing homegrown unit test stuff,
and I think one of these nice solutions is the next thing I should do.
So I think you kind of indirectly brought up a question
that I was considering asking you about. Often when we have people who are fresh out of university or
whatever on the show, this discussion comes up about whether or not computer science curriculum
should be more of a theoretical thing or more of an applied thing. And it sounds like you're
talking software engineering is more like an
applied computer science kind of basis for this master's degree. I don't know, would you mind
going into that? Yeah, well, to sort of to start off the, you know, the question you raised,
of course, is one we sort of kind of fight with every day where, you know, we want to be relevant,
we want the students to kind of have seen things that they're going to see out in the world.
But on the other hand, we don't want to teach them things that are going to become obsolete in two or
three or five years. And so we try to walk this balance, walk this line
sort of just like you say. And this kind of professional master's track does fall more on
the applied side. But on the other hand, we try to teach, testing is this incredibly general thing
that requires a lot of creativity, a lot of hard work, a lot of hard thought.
And, you know, I try to emphasize that aspect of it
while also making sure that they see the kind of tools that I,
you know, some of the best of tools that they're likely to use out in the world.
Okay.
So you said this is the only class you're currently teaching,
but I'm guessing you've taught other courses in the past.
What are some of your favorite classes to teach?
Well, my current favorite ones I think right now are, I've been switching off every year
between advanced compilers and an advanced operating system.
And both of those we sort of, I make those quite applied as well, where we do some sort
of theoretical readings, but then we just spend a lot of time reading.
And in the operating systems course, we usually can read the entire source code for this operating system kernel called XV6.
It's kind of a reimplementation of Unix version 6, but for modern x86s.
And it supports multi-cores.
And, you know, this thing boots up and it's a real OS.
But in all other respects, it's kind of as simple as possible.
It sort of barely has a file system, barely has a shell.
Everything is really, really simple.
But you can read the entire kernel in a semester. and that's really fun for people, I think. It's really nice to see the
kind of diversity of things that live in an operating system, and I really kind of enjoy
walking around. And then the assignments in a course like that are, you know, adding system
calls, adding features to the file system, maybe tweaking the scheduler, these kind of things. And
unlike when you use Linux or something as a basis for an operating system course, you can kind of rewrite kind of major parts of this operating system because
it's so tiny. Whereas in Linux, we're really confined to kind of playing around on the edges
because it's this sort of monster monolith that's, you know, very, very kind of hard to operate.
So when your students modify the operating system and are playing with it,
do you use virtual machines for the testing? Yeah, absolutely. Absolutely.
So XV6.
And so I'll send you guys a link after we're done here.
But it's this open source.
It's written by some people at MIT.
And it basically comes with an environment where it boots up and it just basically works.
And it even, you know, you can bring it up in multi-core.
It all sort of is really seamless.
This kind of thing where I remember when I took an operating system course first, you
know, a number of years ago, you know, this is developing, we're hacking on
Minix on 286s. And, you know, you had to have dedicated machines, and you had to, you know,
we had to boot them off of floppies and stuff. And this was, this was, you know, kind of fun,
I think, at some level, but, you know, just dramatically more work than it is just to bring
something up now. Now, you also said advanced compilers is your other favorite course,
I believe that's what you said.
Yeah.
I was wondering if there's an intro to compilers and this is the second level,
or if that is like the main class is just called advanced compilers.
No, no, you're exactly right.
There's an intro to compilers course, which I, for whatever reason, haven't ever taught.
I think we just happen to have other faculty who cover it pretty well.
But yeah, in advanced compilers, what I do is base it on LLVM, which sort of, so it's a little
bit the opposite approach as I use an advanced operating system. So working with LLVM, now we're
stuck with a big monolith that we can't really change much of the core parts of. But what I do
is I have them work on new optimizations, work on static analysis, and sort of the theoretical
component of this course is static analysis,
which is sort of predicting the behavior of programs without running them.
This is very core to optimizing compilers.
You can, for example, predict that a pointer is never null,
the null checks that the programmer has put in,
they're just not ever going to fire.
And so we have lots of, so that's really a lot of fun.
And yeah, anyway, so that's sort of a neat course to teach.
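A hedged illustration of the null-check reasoning John mentions (the function and names here are made up for the example, not taken from his course):

```cpp
#include <cstdio>

// Because dereferencing a null pointer is undefined behavior, once *p
// executes the compiler may assume p != nullptr from that point on.
// A static analysis can therefore prove the check below never fires,
// and an optimizer is free to delete it as dead code.
int first_element(const int* p) {
    int value = *p;        // UB if p is null, so p is assumed non-null here
    if (p == nullptr) {    // provably never true; typically removed at -O2
        std::puts("null!");
        return -1;
    }
    return value;
}
```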
Has it ever happened that any of the projects from your students
ever end up becoming pull requests into LLVM or Clang?
You know, I always kind of push that.
I always tell them I'll just give them an A if they can make this happen.
And, you know, it's just there's always so much time pressure.
And a lot of the stuff we're doing is kind of, you know, it's kind of, you know, sort of wonky experimental stuff.
Like I'll tell them to do something like increase the precision of one of the static analyses in LLVM.
You know, and people can do this, but, you know, when you really look at it,
is the slowdown of the compiler introduced by this change worth the increase in precision?
And usually it's probably not, just because we're just sort of playing around.
And I have them aim for kind of low-hanging fruit.
And so a lot of times it's sort of,
you know, the stuff that actually needs to be done is kind of maybe less fun.
And so it hasn't happened yet,
but I keep waiting for it.
Now, I'm also curious,
going back to contracts,
like at least on a theoretical level,
what opportunity do you think there is
that contracts can lead to different optimizations
in the compiler? If it says, well, you've already promised me this thing's X or whatever.
Yeah, well, the straightforward sort of runtime interpretation of contracts is just to check them.
So, you know, at a surface level, they're only going to make things slower. And, you know,
I think this is a kind of slowness that we're happy to pay, right? Because the, you know,
we're trying to optimize for human time.
And we're sort of trying to document these important properties.
But, yeah, as you say, you know, you could turn these assertions into assumptions where the compiler is then allowed to assume that these properties hold.
And you could even push these things around in an interprocedural fashion.
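A minimal sketch of the assertion-versus-assumption distinction, assuming GCC or Clang for the `__builtin_unreachable` extension (the function names here are invented for the example; C++23 later added `std::unreachable` for the same purpose):

```cpp
#include <cassert>

// Turning a checked contract into an assumption: if the condition is
// false, behavior is undefined, so the optimizer may assume it holds.
inline void assume(bool cond) {
    if (!cond) __builtin_unreachable();  // a promise, not a check
}

// With the assumption that m == 16, the compiler is allowed to lower
// x % m to a cheap mask (x & 15) instead of a division.
unsigned mod16(unsigned x, unsigned m) {
    assume(m == 16);
    return x % m;
}

// The runtime-checked version of the same contract: slower in debug
// builds, but it catches violations instead of silently invoking UB.
unsigned mod16_checked(unsigned x, unsigned m) {
    assert(m == 16);
    return x % m;
}
```

This is the trade-off John describes: the checked form only costs time, while the assumed form can speed things up but makes a contract violation undefined behavior.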
And so there are, I'm sure, opportunities
to speed up programs, but my guess is that those haven't been seen in practice very often, if ever,
if you see what I mean. Sure. But, you know, there's... Okay. So in addition to your coursework,
do you do research of your own? Yeah, I have some research projects, and basically they're
in compilers these days. And the one that I'm kind of most excited about right now, even though I've been working on it for
several years, is a superoptimizer for LLVM called Souper, and what this does is kind of look through
the program being compiled at the IR level.
So for those people who aren't familiar, LLVM is very centered around an intermediate
representation, an IR, which expresses the program, and it sort of looks a lot like an
assembly language.
It's got explicit loads and stores, it's got adds and shifts, and these kind of, the control
flow is not structured, it's jumping.
So it's very much a neutral, sort of an architecture neutral assembly language, and the way LLVM
works is a front end converts your source programming language, like C++ or Rust or
whatever, into the IR.
Then we have a bunch of language agnostic or at least mostly language agnostic optimization passes which run to try to improve
the IR and try to make it as sort of as clean and fast as possible by cutting out, you know,
maybe dead code. And then finally there's a back end which translates the intermediate representation
into x86 or ARM or something. And so what the super optimizer does is it looks at IR, it kind
of runs as an optimization pass. The thing that makes it different from the existing optimization
passes is it doesn't kind of have a canned idea of what to do to the IR to make it better. It
rather asks an automatic theorem prover. It kind of phrases a question about the IR in such a way
that an automatic theorem prover can come up with, potentially at least, an improvement to the IR.
And so one example that I'm kind of proud of is some codes end up with pop counts in them.
And pop counts are, this is an operation that counts the number of set bits in a machine word.
And in some domains, pop count is kind of important, as you can imagine,
and most programs don't do this at all.
But things like chess programs and Vue.exe and things might end up doing pop counts.
So what we often do is we write an open-coded pop count.
That is to say we come up with some sort of an algorithm for a fast way to do it,
and we put that in our C++ code, and that sort of sits there.
And what that does is it misses an opportunity to use, for example, a better algorithm
or maybe an actual machine instruction for pop counts.
So, for example, a Core i7 will have a dedicated pop count instruction that's actually quite fast.
You know, on an Intel Atom or something, on a laptop chip, it might be microcoded and real slow,
but the high-end chips will have a very fast pop count.
So anyway, going back to Souper, what Souper can do is, using the solver,
it can actually, at least in some cases, not always, but depending on how it's coded,
but it can recognize the pop count from the open coded version.
So it can sort of see that this collection of instructions taken together does this operation
and then it's free to sort of change the implementation to some other one.
For example, LLVM happens to have an intrinsic for pop count and then the backend will translate
that into whatever is most suitable for your target.
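For concreteness, here is a hedged sketch of what an open-coded pop count typically looks like. This is the classic parallel-reduction bit trick, and it's exactly the kind of instruction sequence Souper (or an optimizer's pattern match) has to recognize as a pop count before it can substitute an intrinsic, which the back end can then lower to a `popcnt` instruction:

```cpp
#include <cstdint>

// The classic open-coded population count: a parallel reduction that
// sums bits in pairs, then nibbles, then adds the four byte counts.
int popcount_open_coded(std::uint32_t x) {
    x = x - ((x >> 1) & 0x55555555u);                  // 2-bit field sums
    x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u);  // 4-bit field sums
    x = (x + (x >> 4)) & 0x0F0F0F0Fu;                  // 8-bit field sums
    return static_cast<int>((x * 0x01010101u) >> 24);  // total in top byte
}
```

C++20's `std::popcount` expresses the same intent directly, which sidesteps the recognition problem entirely.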
So it's really the kind of, the thing that's sort of fun about doing something like the
super optimizer is it sort of, it's not creative, of course, but it kind of has a little
bit the feeling of being creative. It comes up with sort of interesting alternatives to the code
that's there. And it's sort of, that's, I think, the reason I like working on is that it's sort of
fun to look at its output. It comes up with sort of clever things that humans wouldn't have thought
of in most cases. So is that at all part of the LLVM project? Is it something we download and run separately? It is totally not part of LLVM. And so the way it sort of currently works
is it runs as an optimization pass and it interacts with a cache. So discovering new optimizations
turns out to be pretty slow and inefficient. And so it's not something you want to really be
waiting for. So what we can do though is we can sort of strip a lot of interesting code and put
it in the cache. And then we can have some sort of a multi-core grovel over that cache offline.
And then next time you compile using Souper, it would have access to the contents of that cache.
So that's kind of the short-term solution, the research solution.
And that's not necessarily a very good idea for a production scenario.
And so there's something that's not working at all yet, but that I would like to do in the future, which is generate rules for the compiler based on discovered optimizations that
sort of are very simplified versions of the optimizations that can run as part of the regular compiler.
So is this like, if you're working on a project where every nanosecond is important,
would you recommend running it through Souper in its current state today?
No, definitely not. Okay.
There are some real problems where, and one of the problems we've been fighting for multiple
years, and it's a very thorny problem, is LLVM has, at the IR level, at the internal
representation level, it has some concepts of undefined behavior.
And so most of us are sort of familiar with undefined behavior at the C or C++ level,
but these internal notions of undefined behavior are kind of even harder to think about,
because they're designed to sort of, they're designed to have sort of a partial effect on
the program meaning, which is to say when something is undefined, it's just that value
that's undefined, and it's not the sort of whole program which becomes undefined like in C++.
Okay.
And what that means is that sort of reasoning about these programs is actually pretty difficult,
and it turns out that a lot of different people contribute to LLVM, and not all of them have
sort of understood undefined behavior in exactly the same way.
The rules are not really clear anyway.
And so what that means is different parts of the compiler have made kind of conflicting
assumptions, at least at times.
And so there's a little bit of internal non-coherence surrounding this. And this tends to generally be latent.
This tends to generally not really affect users. But Souper, being based on an automatic
theorem prover, will go ahead and see them. It's a little bit like a lawyer.
It looks for a little loophole that it can exploit. And if something has been a little bit incautious about undefined behavior, Souper will then break it. Okay. And so
right now what we're doing is working to create a version of LLVM which doesn't have any of those
problems. And it's turning out to, you know, be a bit of an engineering effort. And the sort of overall
problem though is that since these problems aren't really causing most LLVM users much pain,
there's not a lot of priority in the
community to fix it. And so it's a problem mostly seen by us, and so it falls on us to fix it.
Right. That's interesting. So do you have a plan for moving forward?
Yeah. There's a number of us who are sort of interested in this kind of making the version
of LLVM that doesn't have any of these problems. And so we're collaborating on this. And then
hopefully these solutions will be, you know, once these solutions are proved to not affect code generation quality, hopefully we can
move these kind of changes into the core LLVM distribution, and then you won't have these problems.
Can you talk any more about what exactly an automated theorem prover is? I don't think
we've really talked about that much on the show before.
Sure. Well, so what it is, is a program that you can ask essentially math problems.
And so the theorem prover that we mostly use here is called Z3, and it's a tool created by
Microsoft Research, and it's open sourced. So it's very easy to include into a project. But what it
does is, like I say, it attempts to answer math questions. And so if you ask it, for example,
for a solution to the N-Queens problem, what you have to do is kind of
describe the rules for the N queens problem. And so, sorry, I should explain what that is. The N
queens problem is setting a number of queens on a chessboard so that none of them can attack any
other. So that's a, you know, it's sort of a common logic puzzle that you might ask a kid to solve.
You might hand them a chessboard and a pile of queens and see if they can do it. But if you
describe the rules to this kind of a thing, then the solver can sort of come up with a solution to it using its own sort of,
it has kind of an internal sort of sophisticated collection of heuristic searches and other kinds of searches that allow it to home in on these solutions.
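To make the N-queens constraints concrete: the snippet below is not an SMT solver, just a plain backtracking search, but the conditions it checks (one queen per column, distinct rows, distinct diagonals) are exactly the constraints you would hand to a solver like Z3 and let it find a satisfying assignment itself:

```cpp
#include <cstdlib>
#include <vector>

// cols[c] = row of the queen in column c. Placing one queen per column
// already encodes the "no two in a column" constraint; the loop checks
// the remaining two: no shared row, no shared diagonal.
bool place(std::vector<int>& cols, int c, int n) {
    if (c == n) return true;  // all columns filled: solution found
    for (int r = 0; r < n; ++r) {
        bool ok = true;
        for (int prev = 0; prev < c; ++prev) {
            if (cols[prev] == r ||                      // same row
                std::abs(cols[prev] - r) == c - prev) { // same diagonal
                ok = false;
                break;
            }
        }
        if (ok) {
            cols[c] = r;
            if (place(cols, c + 1, n)) return true;  // backtrack on failure
        }
    }
    return false;
}

bool solve_n_queens(int n) {
    std::vector<int> cols(n, -1);
    return place(cols, 0, n);
}
```

The difference John is pointing at is who does the searching: here we wrote the search strategy by hand, whereas an SMT solver is handed only the constraints and applies its own internal heuristics.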
And similarly, you can ask it, for example, you can tell it the rules of Sudoku and give it a partially solved Sudoku,
and it'll come up with a solution quite a lot faster than a brute force Sudoku solver. So it's using these kind of sophisticated internal
algorithms, and they tend to be pretty domain independent. They tend to be pretty independent
of what you ask it. And so of sort of a wide variety of kind of questions that you might want
to ask about mathematical situations, it turns out that these things are sort of pretty okay at
answering. And it turns out that in areas, for example,
like formal verification of software,
these used to be kind of prohibitively difficult.
It used to be incredibly hard to kind of actually conclude
that a sorting routine actually sorts its output
or that some sort of routine in C++ that's controlling a robot
doesn't ever access a null pointer and crash the robot.
So this used to be a whole lot of work,
and it turns out that these automatic theorem provers offer us a way out. If you describe the
meaning of the programming language to the automatic theorem prover, then you can often
just ask it the question, does the program ever dereference a null pointer? And it will try to
either come up with an example where it does, or it will try to sort of prove that it doesn't.
And that's been one of the main kind of use cases of these automatic theorem provers is
to try to find defects in things like chips.
So, for example, I have a friend at ARM.
So, you know, so ARM, of course, makes or designs at least the processors used in almost
all mobile devices.
And he works on formal verification of their processor designs using provers like Z3.
And so we can also apply these to software.
And like I say, this has been kind of the main use case of these sort of tools in recent
years.
And not as much attention has been paid to kind of trying to optimize software using
these tools.
And so that's why I thought it would be kind of fun.
I wanted to interrupt the discussion for just a moment to bring you a word from our sponsors.
PVS Studio Analyzer detects a wide range of bugs.
This is possible thanks to the combination of various techniques, such as data flow analysis,
symbolic execution, method annotations, and pattern-based matching analysis.
PVS Studio team invites listeners to get acquainted with the article
Technologies Used in the PVS Studio Code Analyzer for Finding Bugs and Potential Vulnerabilities,
a link to which will be given in the podcast description.
The article describes the analyzer's internal design principles.
So are there, I mean, maybe kind of, I don't know, taking this back a step, but are there any existing free tools out there that can prove things about our C++ programs, to, like, prove the correctness of them?
Well, this is, yeah, this is a really great question.
And it's a little bit of a thorny issue
because the problem is most of these kind of tools
are made by people like me,
that is to say by researchers
who have a pretty low engineering budget,
if you see what I mean.
And the low engineering budget
combined with the size of C++ is really not a good match.
And so almost all of the people like me work on C.
And that may also be a good match because a lot of safety-critical software is in C,
and a lot of avionics software.
I don't think much of that is in C++ yet.
And despite the fact that we see, I think there's going to be increasing uptake of C++ in embedded systems,
a lot of the existing stuff is C.
And so this is a bit of a
thorny problem, and the kind of tools that reason about C++ end up being produced by organizations.
And so, for example, the Clang Static Analyzer, which is, I'm not sure if it's part of LLVM or
if it's a side project. Anyway, this can reason about your C++ code and look for likely bugs in it.
And some versions of this,
at least some of the time,
will call out to an automatic theorem prover
to answer sort of questions
about whether something can happen.
Okay, that's interesting.
Yeah, so it's really sort of a problem.
I mean, it's a little bit depressing,
but like I said, the size of the language
really, really does not work in favor
of the small players like most of the academics. So another project you're working on
is Csmith. Is that right? Yeah. Well, I'm not really working on it anymore, but I spent a number
of years working very hard on this project. Yeah. Could you tell us about it? Sure. So maybe I'll
just sort of start with a little bit of a story. So a number of years ago, I was teaching an embedded systems class,
and I was showing the students in a slide the translation of some little C and C++ snippets to assembly language,
and I was showing them sort of how the volatile qualifier works,
how if you don't have it, then things will be optimized a certain way.
And if you make some variable volatile, then the compiler has to emit these loads and stores to do memory operations.
And of course, when you're writing embedded software or operating systems and things like
that, where you're talking to memory-mapped peripherals, the presence or absence of a
load or store, this really has to be right.
So I was lecturing on this and one of the students pointed out an error in my slide
and said, hey, that translation is wrong.
And so I said, oh, well, I probably cut and pasted wrong.
But when I went and looked at it later, it turned out the compiler had mistranslated the code.
It got the volatile wrong. And so I kind of shelved it, put this on the back burner, and then
somehow came back to it later. And it turned out that the harder I looked, kind of the more bugs I
saw with volatile. It was really, compilers were really not doing a good job following the rules.
And you can sort of see the problem as a compiler writer. One problem is that these loads and stores really aren't visible normally. If your program does or
doesn't have them and you're running a test suite or something, it just runs, right? You don't really
notice this. So the compiler developers were having a hard time noticing that they were wrong.
And then the other thing that was happening, I think, is that compiler developers love to write
optimizations. They love to make code go fast. And it was really, really easy when adding a new optimization
to forget to check if some variable was volatile before doing some optimization on it.
Interesting.
It really flies.
This requirement to not do these optimizations really flies in the face
of what optimization implementers want to do with their time.
And it just turned out to be that people miss this a lot.
So anyway, so what happened is we started to, I ended up with a file full of test cases that lots of compilers
got wrong and started reporting some bugs. And then one of the things a colleague and I realized
was that, well, maybe we should start to look for these bugs more systematically. And so my
colleague, Eric Eide, took an existing C program test case generator.
So it was a program called Randprog that generated random C programs.
And he took that and he started retooling it and reengineering it to be about looking for errors in volatile.
And then what we did is we would run these programs.
We'd generate a random program.
We'd compile it with and without optimization.
And we'd run it in kind of a hacked Valgrind, which would report the number of loads and stores to every variable.
You know, Valgrind provides a sort of very nice instrumentation interface. So we had a hacked Valgrind
that would report the number of loads and stores, and if that ever changed when you cranked up the
optimization level, then the compiler was wrong, at least for a memory location that
was only accessible through volatile-qualified accesses.
Okay.
So we started doing that, and this was a lot of fun.
We reported a bunch of bugs, and we were having fun, and then eventually what we realized
was that the volatile bugs weren't the only ones we were seeing.
They were, you know, compilers were just sort of wrong.
In general, they were just wrong.
That's great.
You know, they're not very wrong, right?
Because, you know, they work most of the time; they're incredibly, amazingly reliable. But almost anybody who writes
weird code has stories about the compiler being wrong. Almost anybody who writes something that just
looks funny, like maybe you're writing an operating system, and the compiler hasn't really
been used very much for that, you tend to run across compiler errors, and people sort of learn
to avoid, you know, bit fields or whatever. You know, bit fields are fine now, but 20 years ago
they were often a source of bugs. So we started looking for regular compiler bugs. And then the project
kind of snowballed a little bit because we just kept, everywhere we looked, we found them. We
found them, you know, not only in GCC and LLVM, but also in, you know, compilers used to compile
source code that ran on airplanes and source code that ran on spacecraft.
And so all of the heavy-hitting, expensive compilers for embedded systems of the time,
all of these contained these kind of bugs where we could sort of pretty easily trip up the compiler with a randomly generated program.
And so anyway, I ended up over the course of some number of years reporting like 500 compiler bugs.
I kind of, this is kind of like my
hobby and nighttime job for years. And I eventually got sort of tired of it. I eventually burned out
a little bit, which is why I stopped. But, you know, I spent a lot of time reporting compiler
bugs. And I like to think that at least in the case of LLVM, that we did the project kind of
at a good time where it was in a bit of an early stage, and especially Clang was in an early stage.
And I think it came, the project came at a good time to sort of help that code base kind of reach a level of maturity that it might not have
gotten maybe until a little bit later if we hadn't done this work.
Do you have any idea for how many of those 500 some odd bugs were actually fixed?
Almost all of them. So the LLVM community and the GCC community were really, really receptive. They were
really, you know, probably more gracious, you know, because at some level, you know, I'm helping them.
But on the other hand, some number of these bugs probably weren't ever going to affect anyone.
And the distribution between those two things, you know, which bugs were, was I really saving
someone's bacon and which bugs just didn't matter, that's really almost never clear, except in a few
cases where I would report a bug
and then other people reported the same bug as well,
having encountered it in their own code.
So that kind of thing did happen,
but it really wasn't the majority experience.
So anyway, yeah, so we ended up reporting, yeah, so, yeah, sorry,
I guess I lost track of your question.
I got sidetracked myself there.
That's fine.
We're talking about the tool that generated the random code, I believe,
Csmith.
Yeah, anyway, so, yeah, I ended up finding these kind of errors in almost all the compilers.
No, I guess I would say in all compilers we tested, and it was one of these things that
ended up being a fun project, and happily we could open source it.
And one time we got a Christmas card from a compiler development team at an embedded
software company.
They sent Csmith a Christmas card because I think
the implication was that it
saved them a lot of work by finding
lots of bugs. So this is really
sort of a gratifying project in the sense that it's gotten
a lot of industrial uptake.
I think it kind of solved a problem
that people had, but they hadn't really thought that there
was an automated solution to.
So this might actually be
derailing us slightly here, but I think JF Bastien
in particular, and I know there's been some discussion about, like, trying to deprecate
volatile and replace it with something that is more well-defined semantics. And I feel like
the average C++ programmer doesn't really know what volatile is for. Can you speak about volatile
for a minute? And if you have any thoughts on that? Sure. Yeah, it plays a really indispensable role in kind of low-level code
like operating systems and embedded systems for accessing, you know, things like page tables,
for accessing things where the actual presence of a load or store operation matters. And of course,
you know, this is a pretty small fraction of programmers. And C++, my take is that C++
probably incorporated volatile in the wrong
way. It incorporated it as something that's very deeply tied into the
type system. And if you read the C++ standard, you can see places where it affected the language
negatively. You know, it does things like it affects overload resolution and things that it just
shouldn't ever probably have gotten tied into. And so JF's proposal, I think, is an extremely good idea,
which is basically to try to strip out some of these things
that Volatile shouldn't have ever been doing in the first place,
you know, while actually retaining the parts of Volatile that benefit systems.
And so what you really want to do is mark a particular load or store
as needing to happen or not needing to happen.
But this has nothing to do with your big class hierarchy
and really should not be playing into
overload resolution and stuff. So this is one of these
things where I think it's a cleanup
that really kind of should happen
and needs to happen. And my
take is that it will take some of the C++ header
files and knock some pretty
big chunks of junk out.
If it were removed as a possibility
for a member function
qualifier,
that would halve the number of overloads for many utility functions.
That's right. That's exactly right.
And that's, so it was a while ago that I looked at JF's proposal,
but my understanding is that's exactly what he wants to do.
Okay. Okay. You worked on another project called C-Reduce. Is that right?
Yeah. So C-Reduce was kind of Csmith's brother or sister.
And so the issue here was that you would generate a program, and, well, generating a random program isn't that hard. You just kind of spit out syntactic elements and try
to connect them in a way that they'll parse, and that's not really that hard. But the problem is,
if all we want to do is try to crash the compiler, then we just sort of blurt out a lot of, you know, syntactic C or C++, and we see if it crashes the compiler.
But that really wasn't the goal. The goal was to find wrong code bugs, where the compiler
translates a program into an executable that doesn't respect the intent of the source code.
And to do that, you have to generate programs that don't have any undefined behavior. And so
the real crux of Csmith was generating code while sort of on the
fly analyzing the code for undefined behavior and kind of changing the trajectory of the generation
so that nothing it emits is ever undefined, if that makes sense. And so that's the job that the
tool does, and it did a pretty good job at it. But the problem was that what would happen is you
would compile a Csmith-generated program with GCC at -O0, and it would print a checksum of its global variables at the end. And then you would
compile the same program at -O1, and it would print a different checksum. And so now the question
is where in this mountain of random junk is the problem? You know, what in this mountain of random
junk triggered the miscompilation? And this is really not easy.
And so the way people deal with this problem is they basically delete some of the code and see if the code still triggers the bug. If it does, then you can delete more code. And if it doesn't,
you have to undelete and delete something different. And so as the kind of first major
user of Csmith, I spent a lot of time kind of on this exact task. And, you know, it got old.
And C-Reduce is a program which basically automates this. It deletes code, but it also
does a lot of other things. It inlines functions. It tries to instantiate templates. So it started
off basically solving a problem created by C-Smith. But then later on, it expanded to try to
be a pretty far-reaching test case reducer,
which basically tries to make a C or C++ program smaller
while still preserving some property that you care about.
And you teach it about that property by basically just giving it a shell script.
And the shell script can look for a compiler crash or for a miscompilation.
It can do whatever it wants.
But CReduce will change the program to make it as small as possible
while still preserving the property that the shell script returns true when it runs on that program.
And so what it turns out, though, is, for whatever reason, most compiler bugs that get triggered, you know, maybe by like a three-megabyte C++ file.
So you know how it is when a C++ file has been preprocessed.
They tend to be
not small. Right. Not small. Most compiler crashes that can be triggered by three or four megabytes
of stuff can also be triggered by maybe 50 bytes or 80 bytes or even 10 or 20 bytes. Wow. And so
C-Reduce kind of tries to find that smallest program. And it's an intractable search problem,
so we can't ever hope to find the smallest program that triggers a bug.
But if we can get something that's a couple hundred bytes,
then what we do is we've handed the compiler developers something that they can use to,
handed them sort of a pretty good weapon they can use to actually track down the bug.
And so, like I say, C-Reduce has gotten a pretty decent amount of uptake among compiler developers
just because it kind of solves a problem that's sort of, you know, it's really not really very fun to solve by hand. And it wasn't the first
tool of its kind. There were some previous tools that did kind of similar things, but at risk of
not being very modest about this, it does it much, much better than other tools.
So if I find a compiler bug, do you suggest that I run it through C-Reduce before handing it off,
or are you more expecting that the compiler writers themselves are using C-Reduce on the repos that they're given?
Yeah, that's a really interesting question.
They, no doubt, do use it, because I think they can't expect a customer, a paying customer maybe, or a busy user to run C-Reduce.
But my experience as a random tester was that if I didn't do this, they probably would.
Just because the test cases I was coming up with were so ridiculous to look at.
But, you know, I would encourage you to do this.
If you ever do see a compiler crash, I would encourage you to run C-Reduce,
just not so much because you have to, but because it's really pretty fun to see what comes out at the end.
So if you do that, I hope you'll drop me a line and maybe share the snippet of code that it ends up with, because it really is sort of fun to, it really is kind of satisfying
to watch it strip away a lot of stuff that's irrelevant and leave you kind of with this core
program, which ideally at least is pretty small. And C-Reduce can be tripped up by
templates and things. Templates are pretty hard to destroy, and class hierarchies and things like this.
And there's, we did a lot of work. We spent like, you know, me and a student spent a couple of years working on this.
And we did an okay job, but there's a lot of work left to do.
We're really not sort of, we really don't have the engineering power to do this work.
So we're mostly in maintenance mode on C-Reduce, but, you know, that's fine because it works
pretty well.
Well, so last year at C++ Now, you gave the closing keynote on undefined behavior and optimizations.
And I think this is a good time to discuss that.
Undefined behavior is a large topic, it sounds like.
You mentioned it already with C-Reduce, and that really stood out to me.
You said that part of the goal is to remove undefined behavior from the program
when it was generating your random code snippets.
Did I hear you correctly on that?
Well, let's see.
Let me try to restate what you said.
So Csmith has to generate programs that aren't undefined.
That is to say, they don't trigger any undefined behavior.
And C-Reduce, then, will go ahead and introduce undefined behavior.
Oh.
Because it just makes kind of randomish changes to the code.
It just sort of destroys syntactic elements. And so if you're reducing a compiler crash, then that's usually not a problem
because most of the time the compiler developers will fix a crash even if the program that triggers
the crash contains undefined behavior, because it's basically a quality-of-implementation issue.
They don't like to see seg faults in there. Yeah, that's good. Yes.
But if you're talking about a miscompilation,
now the program has to remain free of undefined behavior, so you need a separate undefined
behavior checker running in the loop with C-reduce, because otherwise, with 100% probability,
it will end up with an undefined program. And the reason for that is that every step,
every reduction step that C-reduce tries has some small percentage of introducing undefined behavior.
But once the program becomes undefined,
now it sort of changes its meaning across optimization levels in the way that
undefined programs do.
And now it'll stay undefined all the way until it's minimized.
And so you have to stop it from that.
And so basically the strategy is this little shell script that you write,
we call it the interestingness test.
The question is, is the partially reduced program interesting?
And C-Reduce's job is to always generate interesting programs.
And any time it generates an uninteresting one, it just has to backtrack.
So in this interestingness test, you have to not only look for the miscompilation, but also check for undefined behavior.
So as long as you have an undefined behavior detection tool, like the undefined behavior sanitizer for LLVM,
then you can more or less reliably avoid undefined behavior during the
test case reduction. And at the end, you'll end up with something that you can actually report.
Okay, so you've just made a statement, and I'm going to attempt to repeat it. You said,
when there's undefined behavior, it changes the nature of the program across optimization levels
as undefined behavior does, something like that. Which I'm guessing to many people listening,
they said, how does undefined behavior affect the
code generation across optimization levels? Sure. Yeah, no, that's really good. That's really good.
And so what this kind of comes down to is, so when I teach people programming, what somebody
will often want to say to me is, hey, the compiler's broke. And the reason it's broken is because I
have this program that did something, and I turned on the optimizer, and now the program crashes,
or the program prints something.
And what I always tell them is, you know, it's possible the compiler's broken, right?
I mean, compilers do contain bugs.
We know this.
But it's 99.9% likely that you made a mistake.
For example, maybe you have a printf that specifies three or four.
Let's say you've put three format specifiers in a string,
but then there are only two arguments
passed to the printf subsequent to the string.
And so I'll just use the C example,
because C++, of course, makes this mistake much harder to make.
So of course that's great.
So of course that's great.
But anyway, so in the C program,
what the generated program is going to do
is look at stack locations
that don't correspond to anything in particular. And so it'll just print garbage for those third and fourth format specifiers.
And now when you optimize the code, the stack layout tends to change because, for example,
certain variables are allocated in registers instead of going onto the stack. So the stack
frame changes its shape. And now what gets printed in those garbage slots of the printf
almost certainly changes.
And so that's just a sort of a simple example of the kind of thing that happens is once you break
the rules in an unsafe language like C and C++, then the optimizer doesn't have to make your
program do the same thing. Whereas ideally, at least, if your program does obey the rules,
cranking up the optimization level should never change your program behavior, except in ways that the language refuses to acknowledge, like timing and things like that.
Okay. So I don't know, did that kind of make sense? Yeah, well, yes, it did. And so do you
think that this is a good thing? Or do we think, do you think we should be reducing the places that
are undefined in this standard? Yeah, it's an interesting question. The reason undefined behavior kind of came into existence in the first place was
because the language is a little bit loosely specified. And here we're talking about originally
in C. The language is a bit loosely specified, and so different implementations ended up doing
different things. And part of this is by design, because the language was intended to take
advantage of platform efficiencies.
For example, adjusting the size of an int to be the native size.
And, of course, this is very good, right?
This is why it's a good systems language.
And when the standards committee went to codify the language,
then they kind of ended up having a bit of a reverse engineering problem
where they didn't want to say that one of the C implementations was wrong.
They kind of wanted to encompass most of the existing one.
And so, of course, in some cases, they did say that the existing implementations were
wrong, but in a lot of cases, they didn't want to force too much reengineering of compilers
because they were afraid, and I think legitimately afraid, that it would then sort of just ignore
the standard.
So they said, okay, well, something like overflowing a signed integer, that's undefined because
the ones complement people do one thing in this case, the twos complement people do another thing, and the sign magnitude people do a third thing.
And so we're not going to try to put an interpretation on overflowing a signed integer because there's no single interpretation.
So that's kind of how that kind of came about.
And then more and more undefined behaviors sort of kind of crept into the language as the
standards evolved. And so, you know, even in the most recent C++ standards, more undefined
behaviors get added. You know, not very many, and I don't think they're particularly pernicious ones.
But, you know, there's this kind of a way out that the standards committee has here, which is to
just not try to enforce implementations doing something in particular
when something crazy happens in the process.
And so what it does is it makes the compiler implementer's jobs quite a lot easier because
there are these cases where they just don't have to care about, if that sort of makes
sense.
And this is all well and good, but it is a bit programmer hostile.
Some of this stuff is a bit programmer hostile, where some of the undefined behaviors are
so hard to avoid and so harmful in practice
that, you know, it's really a problem.
And so, you know, the canonical example of that is things like use-after-free, where it's
just very, very tough to avoid use-after-free in a big codebase, you know, and so real
programs do this.
And, you know, things like use-after-free and accessing past the end of an array end
up at the root of a lot of security problems that we have.
And so some of these undefined behaviors have ended up, you know, in
ways that the original standards authors couldn't have necessarily foreseen, they've ended up, you know,
causing some problems in the world. So do you see a practical, like, way out for that? Like,
it seems that having checks for things like that, would that add too much overhead or not in C++, for use-after-free?
Yeah, so that's a really good question.
And I think it's one of the real central questions
for a future of C and C++ is what to do about these things
because otherwise it's very hard to write reliable software.
And so one answer is that we use tools,
basically hacked compilers to insert these checks, as you say.
And this is a really, really good idea.
And so, for example...
Like the sanitizer. Oh, sorry.
Yeah, exactly. The sanitizers are the example.
So address sanitizer, for example, causes the compiler to insert code that traps kind of a large variety of really pernicious undefined behaviors relating to memory.
And this is 100% these tools just needed to exist.
And the fact that they do now has really changed the game for these languages.
When I teach C and C++, I basically tell the students to leave these sanitizers turned on
throughout their entire development.
And I see, looking over their shoulders, I see all the time it catches, you know,
just even little things like shifting a number by a negative shift exponent. You know, this kind of thing really could have unpredictable results using compilers
that we use from day to day. And these tools are just incredibly valuable. And they really,
like I say, I think they really have changed the nature of developing in these languages. And then the
case that we have left is that the tools often have too much overhead to put into deployment. And one thing that's kind of interesting, though,
is that some parts of, for example, the Android system
are compiled with some of the undefined behavior sanitizer's checks
left in in deployment.
So some of the video codecs, I believe,
they were really worried about buffer overflows
and basically errors in parsing a video file that came from a potentially hostile source.
Right.
And so what they've done is just left some of the checks in,
and they don't add that much overhead.
And so address sanitizer is kind of a pretty heavy-hitting collection of checks
that can significantly affect runtime,
and that doesn't tend to be left on.
But some of the lighter-weight checks, like for integer overflow, can be left on
as long as you're willing to accept that every now and then the program's going to trap
when it hits an integer overflow and exit the program instead of continuing on,
possibly producing erroneous results.
So it really just depends on what you want to happen when
the program does something bad.
And of course, this is common to all programming languages, right?
This isn't at all unique to C or C++.
Now, this particular tradeoff is a tricky one.
But the thing that's really different about C++ is there's this default decision to not trap any of this,
just to let it happen and let the execution go on, possibly making no sense whatsoever
and possibly now being controlled by an adversary.
So in your role doing research with code correctness and undefined behavior and these kinds of things,
what would you leave our listeners with? What would you say this is like the most important thing you should
be thinking about, I guess, if that might be a big question. No, I think it's pretty easy. I
think programmers should just basically turn on all the tools. Basically, these tools are there
for a reason. They produce really, these days, really pretty good diagnostics
when you overflow a signed integer or something.
It just tells you what you did at what line with what value,
leaving these things turned on during development and during testing.
And then, you know, then at deployment time,
we have a decision to make like we were discussing.
But, you know, that's sort of a domain-specific tradeoff that has to be made.
But during development and during testing, it's absolutely clear that you just use all
of these tools.
And what I find is that a lot of people are aware of these tools or aware of some of them,
but not everybody knows about them.
And, you know, and so I think it's really, I'm really glad you asked me this because
I really like this opportunity to try to help spread awareness.
And so basically what we're talking about, just to be concrete here, is if you're compiling a C++ program with Clang++, you add the command line option -fsanitize=address,undefined.
You can use those together.
Yeah.
Those two sanitizers cooperate nicely, and between them they add a really broad suite of checks to your program, which catches almost all of the really, really heavy
hitting undefined behaviors. And so this one small change, you know, it makes the program slow down.
So now maybe you're 20%, 30% slower, but you know, this doesn't matter too much during development.
And everybody should basically who's using C++ should do this all the time.
Okay. Okay. Sounds like good advice. Well, thanks for coming on the show today, John. Where can people find you online?
Well, I'm on Twitter mostly.
Email is sort of, I guess, always bad.
Just because everybody gets so much.
Twitter's a place to find me online.
But, of course, I do read email.
Yeah, one of those places would be great.
Okay.
Thanks so much for coming on the show.
Yeah, well, I really appreciate the invitation, guys.
Thanks a ton.
Thank you.
Thanks so much for listening in as we chat about C++.
We'd love to hear what you think of the podcast.
Please let us know if we're discussing the stuff you're interested in,
or if you have a suggestion for a topic, we'd love to hear about that too.
You can email all your thoughts to feedback at cppcast.com.
We'd also appreciate if you can like CppCast on Facebook and follow CppCast on Twitter.
You can also follow me at Rob W. Irving and Jason at Lefticus on Twitter.
We'd also like to thank all our patrons who help support the show through Patreon.
If you'd like to support us on Patreon, you can do so at patreon.com slash cppcast.
And of course, you can find all that info and the show notes on the podcast website
at cppcast.com.
Theme music for this episode
is provided by podcastthemes.com.