CppCast - Fuzz Testing on the GPU

Episode Date: November 27, 2020

Rob and Jason are joined by Artem Dinaburg and Ryan Eberhardt. They first talk about a new version of CMake that was just released, an interview with Bjarne Stroustrup, and another month of new ISO papers. Then they talk to Artem and Ryan about fuzz testing, including a new fuzz testing project being worked on at Trail of Bits to enable fuzz testing on the GPU.

News
- CMake 3.19 available for download
- How C++ became the invisible foundation for everything, and what's next
- November Monthly Mailing

Links
- Let's build a high-performance fuzzer with GPUs!
- The Relevance of Classic Fuzz Testing: Have We Solved This One?
- Vectorized Emulation: Hardware accelerated taint tracking at 2 trillion instructions per second
- DeepState - Parameterized Unit Testing Framework
- McSema - Tool to translate binaries to LLVM bitcode
- Remill - Library of CPU instruction semantics
- Anvill - Tool to make translated bitcode look closer to what a compiler would emit
- Rellic - Translates LLVM bitcode to C using Clang's AST library

Sponsors
- PVS-Studio. Write #cppcast in the message field on the download page and get one month license
- PVS-Studio: analyzing pull requests in Azure DevOps using self-hosted agents
- Why it is important to apply static analysis for open libraries that you add to your project
- Use code JetBrainsForCppCast during checkout at JetBrains.com for a 25% discount

Transcript
Starting point is 00:00:00 And by JetBrains, the maker of smart IDEs and tools like IntelliJ, PyCharm, and ReSharper. To help you become a C++ guru, they've got CLion, an intelligent IDE, and ReSharper C++, a smart extension for Visual Studio. Exclusively for CppCast, JetBrains is offering a 25% discount on yearly individual licenses on both of these C++ tools, which applies to new purchases and renewals alike. Use the coupon code JetBrainsForCppCast during checkout at JetBrains.com to take advantage of this deal. In this episode, we talk about a new version of CMake and an interview with Bjarne Stroustrup. Then we talk to Artem Dinaburg and Ryan Eberhardt from Trail of Bits. Artem and Ryan talk to us about fuzz testing
Starting point is 00:01:16 and a new tool to fuzz test on the GPU. Welcome to episode 275 of CppCast, the first podcast for C++ developers by C++ developers. I'm your host, Rob Irving, joined by my co-host, Jason Turner. Jason, how are you doing today? All right, Rob, how are you doing? Doing okay. You know, we talked last week with Conor about different ways of constraining templates, and then how other languages handle it. And I find myself fighting the way C# does generics this morning, which has been very frustrating.
Starting point is 00:02:13 They, they also make you have to forcibly constrain to, in order to do anything in your generics. And there's no way to limit a generic to numeric type. So you actually can't write a generic and do like an add operation with your so it has to be like classes only not seems that way types yeah very frustrating it's power of c++ is that you can make your own built in types that behave like provided types right i don't know of any other language that really lets you do that. Yeah.
Starting point is 00:02:45 Unfortunately, I do have to do this part in C sharp, unfortunately. Well, uh, at the top of episode, I'd like to read a piece of feedback. Uh, this week we got a tweet,
Starting point is 00:02:55 uh, about Connor's episode last week, uh, from Antonio saying, I love listening to algorithm intuition. The first time listening to this podcast, maybe you want to go and watch algorithm intuition again. Fantastic episode of funny anecdotes learning new stuff about algorithms and thinking about different takes on concepts and similar features and yeah it was a great
Starting point is 00:03:14 episode it was great talking to connor again well it was a fun episode and uh i know we mentioned it during the episode but uh i believe the podcast that connor and bry Bryce are doing has officially launched now. Oh, did it? I believe so. I saw people tweeting about it. I think the first episode's out. Well, we should try to put a link to that in the show notes if, in fact, it has. Yeah.
Starting point is 00:03:35 I haven't had a chance to listen to it myself yet, but I'll have to do that. Okay. Well, we'd love to hear your thoughts about the show. You can always reach out to us on Facebook, Twitter, or email us at feedback at cpcast.com. And don't forget to leave us a review on iTunes or subscribe on YouTube. Joining us today is Artem Dynaberg. Artem is a
Starting point is 00:03:53 principal security engineer at Trail of Bits. He helps set technical direction for research and engineering projects and ensures that projects surpass customer expectations. Artem's research interests include automated vulnerability identification, program analysis, and usable security tools. Prior to joining TrailBits,
Starting point is 00:04:09 Artem worked as a security researcher in academia and companies both large and small. He holds an MS in computer science from Georgia Tech and a BS in computer science from Penn State. Artem, welcome to the show. Thanks for having me, Rob and Jason. We're also joined by Ryan Eberhardt. Ryan is a master's student at Stanford University and was a research intern at Trail of Bits this past summer.
Starting point is 00:04:29 He's interested in promoting software security through education, and he builds core code playgrounds and visualization tools that make learning systems more accessible. In his free time, Ryan enjoys making pottery and apartment gardening. Ryan, welcome to the show. Thanks for having me. Apartment gardening? Don't have too much real estate, so got to make it work. So do you have like corn stalks growing in your living room, in your apartment? I have a bunch of small avocado trees, primarily.
Starting point is 00:04:58 How big does an avocado tree have to get before it fruits? It has to be at least three years old, which most of them are like one to two years old. But also they need to grow outside in order to fruit because there's just not enough light inside. So you're growing them but you won't be able to actually get any fruit from them yet? Unfortunately, no. That is unfortunate. But someday we'll put them outside and then we'll get fruit. My wife's grandma has lived in the same house since forever. And apparently a few years ago, someone realized that the neighbors have a giant Florida avocado tree that hangs over grandma's yard. So you can just go out there and pick however many avocados you want. And since they're the Florida ones, they're like twice as twice as big as as what we get here anyhow um but anyhow yeah that's awesome
Starting point is 00:05:51 all right well artem and ryan we got a couple news articles to discuss uh feel free to comment on any of these and we'll start talking more about what you're up to a trail of bits okay of course let's do this okay so this first one, CMake 3.19 is now available for download. Anything exciting in this release, Jason? I don't know about exciting. There's lots of little details. The one thing I do want to look into personally is the new user presets, JSON that they support. I want to know more about what that's about, because I do find myself, it would be handy if it knew that I just, I personally prefer that it always start with a debug build or whatever, right? So I don't know if that's what that does or not, but I'm going to
Starting point is 00:06:36 check that out. Oh yeah, I was looking at this also. I'm actually very excited for presets. So a lot of our, we use CMake for all of our big C and C++ projects. And a lot of them have fairly complex configuration options, or we build existing stuff like LLVM with non-standard config, like we want RTTI on. And this is going to be super helpful since instead of like maintaining this in like readmes and shell scripts, we can have like a machine readable format that we can commit to version control. And people can just point to the right preset where we can point our build system with the right preset. And this will this I'm very much looking forward to this.
Starting point is 00:07:15 You just said something that could be its own entire rabbit hole. But you just said you compile LLVM with RTTI enabled. Yes, we needed to for a reason i can't even remember why but we needed to do it uh and so we have like we build it with rtti on because we needed to do it for something and i don't remember what it was but now everything depends on having it on because if you can't link something with rtti on and something with rtti off so we have like this yeah that's where we are oh it's unfortunate that you don't remember the background because You can't link something with RTTI on and something with RTTI off. So we have this... Yeah, that's where we are. It's unfortunate that you don't remember the background.
Starting point is 00:07:49 I do not, I'm sorry. This happened a while ago, and now I just know it has to be on. Even if the original reason has gone away now, it is still always on for our stuff, because all the other stuff it links against, it needs it on. This could use a review to go back to see if you actually still need it. But, yeah. I don't think I knew that LLVM's default
Starting point is 00:08:10 was to have it off. I think they compile the exceptions off too or something, don't they? I'm not sure about exceptions. I know RTTI is definitely off by default and we need it on for something that we were doing. That's something that I, well, again, complete rabbit hole, but something I've never fully understood because every project that I, well, again, complete rabbit hole, but something I've never fully
Starting point is 00:08:25 understood because every project that I know of that compiles with RTT off, RTTI off, ends up providing their own RTTI layer. Right. Like, why don't you just use the compiler? LLVM does effectively their own RTTI layer. And I've never looked at the history of the project to know why they do this, but I figure since they're providing their own, they don't need the native one. Right. And perhaps, I don't know. I just don't understand.
Starting point is 00:08:50 If the native one is too slow, then you would think that Clang perhaps could do something about that. Right. Don't know. I'm also excited for some of the small CUDA-related changes in the CMake. So we use CMake for the thing that we'll be talking about, and we have to rebuild CUDA code with Clang, and we actually use things like separable compilation, and we have to kind of have some hacks around it at the moment. And so I'm pretty much looking forward to having this
Starting point is 00:09:18 first-class support for this. Yeah, most of those CUDA support things just went over my head because I haven't messed with any of that. Sounds like it's a big release for your team then. Yeah, we're very excited about it. Alright, the next thing we have is an article on TechRepublic, and this is an interview with Bjarne Stroustrup,
Starting point is 00:09:36 and the headline is C++ Programming Language, How It Became the Invisible Foundation for Everything, and What's Next. Definitely an interesting article. It's maybe not too much, uh, new news if you listen to this show regularly, but it's kind of always fun to see a more mainstream, uh, news outlet do a big article about C++ and where the language has been going for the last few years. My favorite part of this is when the, there we go. When the author of the article asks
Starting point is 00:10:09 Bjarne what C++ is used for and where it's used. And he says, a first estimate to both questions is everywhere, which is what I found as an instructor too. Like, I mean, full disclosure, I have taught classes with Trail of B bits here who were discussing the team. And I had no idea that security researchers used C++. Right. Like I just it's things that I've learned along the way. language to extend and maintain because there are just so many people in so many places with wildly different use cases that need to use the language for wildly different goals. And he talks about in the article how difficult it is to choose what to accept in the language and to try not to sync the language with its own weight. And I thought that was a really interesting and challenging
Starting point is 00:11:03 aspect of designing and maintaining the language. Yeah. He says everyone wants, what, a simpler language with more features that doesn't break any of their existing code. Yeah. Yeah. I was very impressed that, as Jarni pointed out, that C++ has managed to ship on time every three years, like for 14, 17, and now 20. For something so big to takes so much international coordination, that's extremely impressive.
Starting point is 00:11:28 Yeah. Yeah, hopefully those releases keep going for years to come. And it's exciting to see it continue to evolve with influence from other languages. It's like the shortcomings of C++ have inspired new languages, which are now also inspiring C++. Very much, yeah. Yeah, absolutely.
Starting point is 00:11:47 I just did some contracting work with a client who's just getting up to speed on post-C++11 era and forward code and showed them in one example three or four new features they didn't know existed, basically. That happens to me every day okay and uh speaking of uh the language continuing to evolve uh we have another mailing from the iso committee for the month of november and yeah i mean each one of these monthly mailings uh i've been amazed at how many papers there they're still working on.
Starting point is 00:12:26 Pandemic aside, do you read into any of these ones, Jason? I didn't quite get to this, unfortunately, which is ironic since I'm the one who added it to the list of articles for us to discuss. No, I got distracted by the article with Bjarne and read all the way through it and then forgot to come back. Right, right. Ryan Artham, do you read through any of these? Yes, so I was looking into this and some of this was definitely a little over my head, but I was very excited that they want to fix the range-based for loop lifetime issue. There is a draft fix for that and that is something so super subtle that it takes you a while to understand that that is even a
Starting point is 00:13:08 problem where if you have a range based for loop and you have a temporary created object like you will have invalid access within the loop depending on how you specify the variable that loops over everything and they're going to fix it they're going to fix it so that you don't accidentally, at least you're going to, there is a proposal to fix it so that this does not happen and bite people because this is something so subtle and so easy to overlook. I didn't even realize about this until somebody at work
Starting point is 00:13:37 actually pointed this out, and I was like, I had no idea I could even do this, and I would have absolutely made this mistake. I think we discussed this specific issue with Nico when we last had had him on right rob i think so yeah he's the main author of this paper right okay so uh i guess artem maybe we can start with you and and you could tell us a little bit more about uh what you do at trail of bits and and tell us a little bit about fuzz testing which is a topic i think we've mentioned before on the show but we've certainly never done kind of an in-depth in-depth dive into fuzz testing in the cpp cast episode yeah okay i'll
Starting point is 00:14:17 try my best to give like a quick description of what we do and to give a rough and hopefully informative introduction to fuzz testing. Sure. So at Trail of Bits, we do basically three different things all focused on security. So one is we do research and development. This is work for customers like DARPA, like the Defense Advanced Research Projects Agency, famous for doing things like creating the internet. We do similar such groups like the Office of Naval Research and Army Research Office and things like that.
Starting point is 00:14:53 We have an engineering support group that does open source project maintenance to where we typically for large commercial customers maintain existing open source projects and also do custom security engineering solutions and we have an audit practice where we do security audits of software devices and less relevant to CppCast but also Ethereum smart contracts and these kind of work in a synergy where we try to develop new and interesting tools in the research side we use them on the audit side, and the audit side helps us improve our engineering and software development processes for the engineering side, which then helps us make better software for research to help go from idea to production faster. So we try to have this virtuous cycle going between the three groups.
Starting point is 00:15:42 One of the things that we do try to do is find new ways to eliminate software bugs, whether that's with better tooling, different development choices or video. And eventually, a lot of this comes down to testing. And one of the most successful ways to actually find real bugs and real software is something called fuzz testing. And fuzz testing is, at its core, is testing your program on randomized inputs. That is, like, you have a program that accepts some input, be it over a file, a network, or standard input and you have something that sends it randomized data and instruments the program to see if anything bad happens. By bad as in like a memory safety violation, some property becomes true that should never be true or something like that. And then it provides a
Starting point is 00:16:40 report that this certain input caused this certain failure, and it tries a new input. This sounds extremely naive, but even this extremely naive approach is surprisingly effective. Obviously, in the real world, this gets a lot more complicated, because if you think about the basics, if you have a program that takes even a 64-bit input, trying this via brute force for any large complicated program is going to take you more time than you have available, you know, possibly in your lifetime. So you're clearly not going to exhaustively test every single input possible. And there's a lot of variety in how you instrument your program, how you intelligently generate new inputs, and how do you instrument
Starting point is 00:17:25 this. And we should be clear that the inputs are semi-random and not completely random. Right. In practice, you have a lot more intelligence than just pure randomness. But the tooling for this has improved a lot since fuzzing was originally a thing. Like back in the late 80s, there was the original fuzzing paper by Dr. like back in the late 80s there was a the original fuzzing paper by dr bart miller at the university of wisconsin where they took random unix utilities and they fed them literally random input data and they saw that a whole bunch of them crashed and they had this test suite and then they repeat this experiment every few years and their latest one is a pre-print from 2020 i'm looking at it right now and they repeat something in this the same spirit that they had before and i'm going to
Starting point is 00:18:10 spoil the surprise for you but they still have a whole lot of crashes and hangs let me see if i can find you yes uh sorry i was trying to think if there's like an exact quote about how much uh whether things would get better or worse and things have stayed approximately the same for since the 80s to now on the basic test suite they were doing of just sending Unix utilities random input.
Starting point is 00:18:37 Okay. They've been doing this for 40 years. About 30, yes. 30, okay. And none of these bugs have gotten fixed? I think that this is not... So bugs have gotten fixed, but as people develop new software,
Starting point is 00:18:52 new bugs are introduced. Ah, okay. So you have to... It's not that none of the bugs are removed, it's that other bugs are added, because software also doesn't stay the same. You have to go for where the ball is going to be, not where the ball was. So even if you fix one set of bugs, you have a go for where the ball is going to be not where the ball
Starting point is 00:19:05 was uh so even if you fix one set of bugs you have a whole new set of bugs that are introduced and there's this interesting thought experiment is if you are have soft tools that find bugs and you fix them but you're also adding new features uh is the velocity of you fixing bugs greater than the velocity of you adding new bugs i don't even want to think about it. That sounds terrible. And so getting back to fuzzing a bit, it has proven surprisingly effective. That is that even though you wouldn't think that sending stuff for random input would really do much, or randomized input, or semi-random input with some intelligence,
Starting point is 00:19:44 but in practice it has been actually extremely effective at finding bugs it really started to take off somewhere around i would say the mid-2000s and the main reasoning for it was that they had these large software companies like microsoft adobe and so on shipping their products and to say the bad guys were using fuzz testing to find security vulnerabilities in their stuff, and with very severe consequences. And so the obvious answer to this is that you have to do this internally prior to release. And so they started adopting these practices and doing a lot of excellent research on how do you fuzz better and more intelligently. And there's a lot of great things coming out of Microsoft at this time, like their paper on Sage,
Starting point is 00:20:27 which combines fuzzing with symbolic execution to better direct which program path that you send your random input testing down on. So you can more intelligently pick next inputs. And they got better and better at this, and they started finding all of these bugs before shipping their software instead of afterwards, which led to a market security impact. And since then, since the large companies started doing it, the tooling for fuzzing started to improve since there
Starting point is 00:20:54 was now a need for it. And fuzzing tooling has continued to improve since then. To now where, for instance, with the LLVM and Clang suite of tools you have libfuzzer. Like libfuzzer comes standard with it and so you have fuzzing tools that ship with your compiler and so which makes it so that it becomes much easier to integrate fuzzing into your build and test system. But libfuzzer is considered part of the sanitizers at this point is that right? I believe that is correct do you know if there's any effort to then also integrate it in gcc just like some of the other sanitizers have been uh i am not sure uh i i don't know i'm not sure if there's any effort to integrate into
Starting point is 00:21:36 gcc specifically but you're correct there's a part of the sanitizer suite and this goes into the instrumentation problem to where, especially in a program written in C++, where you may have an extant memory lifetime issues or memory, spatial memory issues. Let's say if you only overwrite an object length by one byte, it might not cause any immediate failures without sanitizer support. And so you have sanitizers as a part of the fuzzing suite so that you know exactly when you have an actual fault. That's actually one of the harder problems to solve is discovering when something goes wrong, especially in languages like C and C++. So is it fair to say we needed something like address sanitizer before fuzzing could become efficient and accurate?
Starting point is 00:22:26 It's certainly before it could become accurate, for sure. Okay. Otherwise, you have lots and lots. Without it, you have lots of problems trying to figure out, you know, I see. Let's say you actually get to the point where you have some kind of, you know, memory safety error, like a segmentation fault or other issue, you never know what part of the program actually caused it unless you have something like sanitizer support, because the part that crashes may be, I don't know,
Starting point is 00:22:55 tens of thousands of lines of code downstream from the part that actually did the mistake and wrote past the length or used memory that has been previously freed. So sanitizer support made it so it's a lot, it reduced the amount of effort required to find out where the problem is in your code, making the outputs a lot more actionable during software development and testing. For our listeners who haven't used the sanitizer,
Starting point is 00:23:20 sometimes the report that they can give back is almost magical. It's like, you allocated memory on line five, accessed it on line three, deleted it on line 10,000, whatever. Yep. And it says like, and or like, this is exactly how many bytes past the end of your allocation you wrote. And this is like the values you put in it. It's really great. I want to wrap the discussion for just a moment to bring a word from our sponsor, PVS Studio. The company develops the PVS Studio Static Code Analyzer, designed to detect errors in code of programs written in C, C++, C Sharp, and Java. The tool is a paid B2B solution, but there are various options for its free licensing for developers of open projects, Microsoft MVPs, students, and others. The analyzer is actively
Starting point is 00:24:00 developing. New diagnostics appear regularly, along with expanding integration opportunities. As an example, PVS Studio has recently posted an article on their site covering the analysis of pull requests in Azure DevOps using self-hosted agents. Check out the link to the article in the podcast description. So you talked about these old Unix tools being kind of the early fuzz testing. And I'm kind of curious what fuzz testing looks like with some of these more modern tools. You know, if you, whether you're
Starting point is 00:24:33 working on an application or a library, what does it really look like when you want to start using fuzz testing on that app or library? What are you doing? Thank you. That is a fantastic question. Typically, what you need, this goes into developing to make things easier to fuzz. And so you want to have, logically have decouplings of the different layers of your software, typically typically or your different libraries so that each can be tested independently of everything else. And for fuzzing, typically you
Starting point is 00:25:12 have a way that you provide input into your library or program and a way to exercise the target functionality that you're after. So I'll use libfuzzer as an example. So the entry point to libfuzzer is always something like LLVM test one input or libfuzzer test one input, which has takes as an argument as two arguments. One is a buffer and one that there'll be filled with random data and the length of the buffer. And then you are responsible as a developer for saying,
Starting point is 00:25:45 if I have random input, how do I give it to my program to actually test the parts that would be tested with user-facing inputs? And so you need to write the code that, let's say, initializes your library, that configures it, and then that says, like, this is where the input would come in. And in any reasonably complex piece of software, you probably want to have more than one of these fuzzing targets because you have different configuration options for your library, which enable different software, which enable different features and
Starting point is 00:26:19 different functionality. And ideally, you want to try and test all of the different functionality that you offer to your users. So you would want to have different inputs that say configure your library in different ways and then pass these random inputs to it and test them separately. And to actually run these, you'd have this ideally somewhere in your CI system or maybe as a separate fuzzing farm, depending on your hardware budget and how much resources you can throw at it. And that will actually do your testing for you. If you have an open source project, you can use something like Google's OSS fuzz, which essentially has Google scale resources they devote to providing security testing for open
Starting point is 00:27:00 source software, where you send them a, you give them a pull request to their GitHub repo of your open source project, a Docker file to build it, and some fuzzing inputs, and they actually run the jobs on Google's vast infrastructure and send you back reports. Continuously. It's continuous fuzzing. Continuously, yeah. So 24-7 it just keeps fuzzing your stuff, because one of the hard things about fuzzing is it never really ends.
Starting point is 00:27:28 The longer you fuzz, the more confident that you can be that you don't have anything lurking. But as Artem said, the space of possible inputs is far too large to be exhaustive. And so you just let it run for as long as you can and, uh, and hope that, uh, if it finds something, then, then, you know, you have a problem, but, uh, if it doesn't, then you're not so sure. So OSS fuzz is really nice in that it's just continuous fuzzing. You submit your project and it will keep fuzzing it 24 seven. Um, but if you're implementing it in your own project, then as Artem said, maybe you
Starting point is 00:28:03 can set up like a CI job to run it for an hour or two kind of just in the background or set up continuous fuzzing on some small cloud instance or something like that. Is there any way to have any level of confidence? Like if let's say the fuzzer runs for three months and it reaches 100% code coverage. Can I say, okay, I have exhausted all the possibilities or is that just false hope at that point? It's still false hope because even program code coverage does not equal covering all the possible states that your program can take.
Starting point is 00:28:38 I guess you can only prove, what's the saying is, you can only prove the presence of bugs and not the absence of them. So it is good at detecting bugs that are there but it is not good at proving that something is bug free uh that is not something that fuzzing can do for you if you want something like that you would have to go in like a whole another level of like working with program correctness and uh like and various proving systems. There is a lot of work on doing exactly that, just proving that a certain amount of code is bug-free,
Starting point is 00:29:11 typically used in aerospace applications. But there you would, number one, start with usually, not always, but a reduced dialect of C. For instance, something that doesn't allow dynamic memory allocation. I think the Airbus flight control software, I think is the best known example of something that has been proven to be bug free given certain definitions of bugs and certain constraints on the C input to use.
Starting point is 00:29:36 You said Airbus not 737 match. Yeah, Airbus. And like there's the there's a version of the L4 hypervisor, I think, that has also proven to be bug-free under certain conditions and certain assumptions. And by bug-free, I mean that it meets a certain specification, where if the specification
Starting point is 00:29:57 is wrong and has a bug in it, this goes into a philosophical definition of what is a bug. Obviously, in one way of saying it, if the programmer wrote it, that is the code, then there are no bugs because that's what the programmer wrote. But you have this other model of, you know, that is not what you had intended to write,
Starting point is 00:30:16 where there's different levels of abstraction in your software. There's like the thing that the programmer has in their head, then there's a thing that is actually represented in the language, and there's this thing actually executed by the machine. And all of those abstractions are slightly different than where you try to express one with the other. And where a lot of bugs come from is, you know, what the machine does is not quite what the language does. And what the language does is not actually what you had intended to do in your mind as a software developer. Yeah, I'm convinced that gamers actually want a certain level of bugs in their games so
Starting point is 00:30:49 that they have things to exploit and make speedrun videos about. That is true. I'm sure it adds a lot more joy to have these special things in there. In addition to the theorem provers that Artem's talking about, there's another technique called symbolic execution, which is a little closer to fuzzing. the theorem provers that Artem's talking about, there's another technique called symbolic execution, which is a little closer to fuzzing. So with fuzzing, what you're trying to do is you're just like sending some semi-random input, watching the program run, send a different semi-random input, watch it run. And you do this many, many times. The idea with symbolic execution is you
Starting point is 00:31:18 only run the program one time, but you never actually specify a concrete input. For example, let's say that this program takes a number as an input. At the beginning of the program, you know that this program just takes a number, and then you trace through the different branching paths. So say it takes, there's an if else, you kind of trace this program going through both paths simultaneously. Okay. And in one side, you say, okay, now x is some number that is greater than 100. And on the other path, you say x is some number that is less than 100, if the condition was like if x greater than 100. And as you go through these branching paths, you apply more and more constraints until you reach some some abort or some memory safety violation or something bad.
Starting point is 00:32:07 And at that point, you know a set of conditions on all of your variables where you say, okay, x is some number between 100 and 111. And you can kind of trace your way back through the program to say, if we had an input that was specified with these conditions that were applied with all these branching paths that would get us that would cause the program to the the control flow to end up crashing or going to whatever bad place we didn't want so that is an answer to the goal of like can can we be confident? Can we be sure that there is, there are no bugs? I mean, there's still the problem of how do you define a bug, but that would give you the confidence that fuzzing does not give you. But the problem is that it's
Starting point is 00:32:57 not practically usable on, on any production grade, large system because of path explosion. If you have like every conditional every conditional is a branch, and if you have a lot of conditionals, then the number of possible paths goes up exponentially, and you can't investigate all possible paths if there are just so many. So there's a lot of really great people working on this problem and making quite a lot of progress,
Starting point is 00:33:23 but for now, fuzzing seems to be the easiest, most accessible way to at least get something. And I'd also say that fuzzing is quite effective. It sounds like something that you wouldn't like because it's just too simple and it doesn't give you any guarantees, and yet it still finds so many bugs that we can't keep up. We haven't even talked about the problem of once you know which bugs you have, how do you prioritize which ones to fix? And fuzzing produces so many bugs that we don't even have time to fix all of them. That's another reason that some of these bugs are still around from Unix utilities
Starting point is 00:34:00 from the 70s and 80s is, oh, we just don't have time to fix everything. We can try to identify the bugs that might have the most impact. And a lot of fuzzing research has focused on that is once we have bugs produced by a fuzzer, can we build automated tools to evaluate them and tell us which ones to fix first? But even if you don't find all the bugs, we still find enough that we don't need all of them. If you don't mind, I'll share a quick anecdote that when I was working on ChaiScript years ago and first learned about fuzz testing, I ran it through, I think at the time, AFL.
Starting point is 00:34:39 Yeah, I was running it through AFL. And I was playing whack-a-mole, like you were saying. Like, okay, I fixed this bug, I fixed this bug, I fixed this bug. Until I realized I just basically had to scrap my parser infrastructure and pretty much start from scratch and write parsing code that wasn't susceptible to these bugs that I had to keep playing whack-a-mole on. Mostly because I kept parsing past the end of my input string. Because I would be looking for a closed symbol of some sort
Starting point is 00:35:05 or something like that. So yeah, it took me, I wasted a while playing whack-a-mole on the bugs before realizing I just needed to do an architectural change. Yeah, which is a reason why we're so excited about new language security or language safety features. Fuzzing developments are awesome for finding bugs in existing code,
Starting point is 00:35:23 but it would be really awesome if we just didn't have the bugs in the first place. And you mentioned symbolic execution is one of the things that you do work with there at 12bits also, right? I think one of you mentioned that earlier. Yep, that is absolutely correct. We have a symbolic execution tool called Manticore. It works on binaries x86, ARM, ARM64, and also as an Ethereum smart contract. And it tries its best to explore all of your program state that you can given, you know,
Starting point is 00:35:56 available resources and a big enough symbolic input. And as Ryan said, it's a great technique, but you eventually run out of resources because for any real program, you just simply have too many possibilities of all of the set of branches that it could possibly take. So you quickly exhaust any possible search space. And there has been a lot of great work in finding ways to prune this and to make it more practically usable in larger applications. And it's fairly effective. And one great thing you can do is you can often combine it with fuzzing and so use symbolic execution to generate an initial set of possible inputs or you say you know you have a fuzzing input that reaches so far into your program and then you start symbolic execution not at the very beginning but somewhere
Starting point is 00:36:40 already fairly deep inside your program code and you let it explore a bit further because there are certain things that fuzzing just isn't very good at seeing past like the trivial example is if you have like a checksum like if you have some kind of file with a checksum in it if you do the checksum check first then if you try to fuzz it you will never really get past the checksum check you're just going to sit there trying to brute force it until the heat death of the universe. So this goes back to fuzzer-friendly development is for your development workflow, you have to have a mode where you ignore checksums. And then you give it randomized input under the assumption that a sufficiently reasonable attacker, if they are also testing your software for security vulnerabilities,
Starting point is 00:37:25 would just disable the checksum check and then test your software with it. So you should do the same thing. And then they can, you know, fill the checksum back in once they find the right vulnerable input. And we've mentioned AFL and libfuzzer. But we need to at some point get to the point of this interview. Oh, yeah, as far as fuzzers go but you all at trail of bits have worked with several other fuzzers have your own own fuzzers as well right yeah uh we typically uh so we whenever we do things like assessments we use the normal fuzzing speed test speed like lib fuzzer afl and so on uh but sometimes we have to do work in unique or custom domains where the traditional tools just don't provide
Starting point is 00:38:11 the kind of either access or information that we want. So, for instance, one of the places to work is like Ethereum smart contracts. And traditional tools are designed for C++ binary software. So they wouldn't work there. So they have our own fuzzer for those called Echidna. We've had to do work testing embedded software running VxWorks. That was a research project.
Starting point is 00:38:40 And we essentially said, like, here is some software. It runs runs on the X works for Intel and the X works for power PC we want you to test it as much as possible and we're like how do we start this and they're like well no well it's complicated and so it was up to us to figure out how to like get this thing running and how to test it and so we ended up writing a custom buzzer which combined AFL and QMU in a single process that would do cooperative handoff between the two where you know you'd emulate the target in QMU we do a whole bunch of introspection into the program using a VxWorks device model this they have actually very great device model that
Starting point is 00:39:23 makes it very easy to introspect into it The export is very nicely done. And then you can say, this is where it's trying to read from this thing. And so we could provide input that would have been given to a certain device, but provided from AFL. And then since we have an emulator in QMU, we could then see all of the code that that certain input touched, and then provide that as feedback back to AFL's code coverage bitmap, which is how it determines whether your input did something new in the target program,
Starting point is 00:39:55 meaning that it's something interesting to mutate further. And so we had that work, and that worked fairly slowly. That was its big downside. And that was actually one of the several inspirations for what we ended up doing for this project. And we've also have something called DeepState, which is property testing for C and C++ programs. And property testing is a bit stronger than, I don't know if I'm going to say stronger it's a bit it's not it says uh this property of the code always holds or this property the code should always hold and it is up to the property tester to figure out a way to if there is anything that can break that invariant uh and
Starting point is 00:40:37 the goal for this to be kind of like unit testing to where you design different tests saying you know with this uh you know, this function should always return zero for any reasonable input. And then you give it the property test and the property tester tries to find a counterexample to it using whatever means available. And we have several backend plugins for this that tries to use AFL, that tries to use libfuzz, that tries to use Manticore symbolic execution tool to try to find ways to disprove that a certain property exists uh it's open source you have it on our github uh we've had reports from people who say that they use it but i nobody i can mention officially and i'm not actually sure
Starting point is 00:41:16 without with any open source project you don't actually know who uses it until they tell you yeah but supposedly people do use it and it has in fact done a good job at preventing bugs from occurring, which always makes us happy. So I guess that, yeah, go ahead, Rob. I was just going to say, do you then want to tell us a little bit more about the fuzzer that you're working on? Yeah, of course. I would love to. So the fuzzer we're working on is called Mullet. Mullet is the Russian word for hammer. It's named that way since all of our LLVM-based tools, which we have a lot of, have two L's in the name.
Starting point is 00:41:48 So this was a way to put two L's in it, and you kind of think of it as hammering out bugs from software. And so I guess I want to preface this. The whole idea was kind of crazy when we first thought about it. And I'll give you a little bit of the background on how we decided to do this. So there is a blog post by a guy named Brendan Falk of Gomoza Labs. And it's really interesting to where he says that normal fuzzing is really slow. If you're fuzzing embedded targets
Starting point is 00:42:22 to which you need to write an emulator for, what you can do is you can write something that takes the software and runs it massively in parallel using SIMD instructions, like the SIMD instructions on AVX, AVX 512 or something like that, which are all non-modern Intel CPUs. And he did this using a custom jitter and a custom hypervisor and a lot of really amazing custom stuff that I could never hope to write in my entire lifetime. And he ran it on a set of Xeon PHYs, which is like a board with a lot of Intel Xeon chips
Starting point is 00:43:01 that are, I think, not as powerful as normal Xeon, but you get a lot more cores. And he showed something ridiculous amount of throughput, like a trillion instructions per second or something absolutely amazing. And I was extremely impressed by this. And the more I thought about it, I was like, you know, I could never write something like this. This is much too hard for me but we do have the same problem of where we want to fuzz test different embedded software or other software that you need to write an emulator for so you know we talk about fuzzing a lot but if something you need to actually execute the code and a lot of times
Starting point is 00:43:40 you want to execute code that's written for something that simply can't execute a lot of throughput like your raspberry pi like can't get a lot of execution throughput in it and there's other things uh which you know you may want to fuzz that have semi custom hardware they talk to or like there's like one of them or like 10 of them and like nobody's going to give you access to it to run like random, your own random code on it to see how many executions per second you can get. And you have to have a different solution where you emulate the target software
Starting point is 00:44:13 and you try to get as many executions per possible, as many executions per second as possible to look through the full, as much input space as you can per unit of time. And clearly you want to do this in parallel since this is an embarrassingly parallel problem you have one program that executes on multiple different data items and you want to see what happens when you run this program on this different amount of data and it just so happens that there is like a common piece of hardware that is really
Starting point is 00:44:43 great at you know single program multiple data problems and it's called the gpu and we're like well we have these gpus sitting completely unused in all of our computers and we have this massively parallel single program multiple data problem like instead of using simd can we possibly use all of this existing natively parallelized hardware to do this instead? You know, it sounds like the problem fundamentally seems like it would match. The other economics bonus of this is thanks to the big machine learning boom, cloud time for GPUs is relatively cheap.
Starting point is 00:45:21 Since, like, if you have to think of, like, from an economics perspective, if you're, you know, like, Azure or Google Cloud or AWS, you have to provision for peak usage of whenever everybody's training their machine learning models. But when people aren't training them, you still have that GPU plugged in
Starting point is 00:45:38 that you need to, you know, it's kind of expensive to get somebody to take it out. So it's still there using a power. So the off peak GPU prices are ridiculously cheap compared to the like the on peak GPU prices. And so you using this incredible hardware, you have the potential to get a really large amount of computing throughput per dollar. And we're like, this, this is too tempting, We have to see what we can do to try to do this. And the reason I said this is kind of crazy is if you Google up, like, you know,
Starting point is 00:46:11 run native code on a GPU, and you immediately get these long list results of why this will never, ever work. And those results are really reasonable. Those people aren't wrong. So there's a lot of problems with that is that number one uh at least i'll talk about nvidia gpu specifically here i don't i have nothing against amd i think they make wonderful hardware it's just that nvidia happens to be what we are most familiar with and i don't want to accidentally say something uh that might
Starting point is 00:46:41 not apply from one to the other but But NVIDIA GPUs execute, they don't execute like ARM or x86 instructions. They execute something called PTX, which is the GPU instruction set that I think gets compiled down somewhere in the GPU driver to the actual whatever native instructions that are different for each specific graphic card's generation. So number one, you can't just run a binary on them. You need to run PTX code.
Starting point is 00:47:06 Number two, the execution model is completely different than what you get with a traditional CPU. Since by design, a GPU expects to run the same program in parallel on different data inputs. And so normal programs for CPU do this something called divergence, like you have a branch condition, and like in one instance, you take, you know, the true branch once you take the false branch. And if you had this happen on a GPU, you very quickly have each GPU thread doing a different part of the program, which is terrible for it, because that up until recently, in hardware, it literally had to, they all had to have the same program counter up until recently in hardware it literally had to they all had to have the same program counter up until the relatively recent releases of it and so what it'd have to do is it
Starting point is 00:47:50 have to run one thread and mask all the other ones off and then run the other one around the third one around the fourth one and up until like you know thread 10 000 or whatever and so instead of executing everything in parallel you'd actually execute everything serially very slowly uh and you would do much worse than if you had just executed things in a normal processor to begin with. The memory hierarchy is completely different. So, like, you know how in a normal processor you have cache, and, like, you have, you know, like, your L1 cache,
Starting point is 00:48:16 you have your L2 cache, and then you have RAM. You don't really specify saying, like, you know, this thing is going to, like, there's no, like, C, C++ annotation saying this is going to go into L2 cache, and this is going to go into L1 cache, and this is going to go into RAM. But in the GPU, your memory hierarchy is explicit. And the latency numbers are a lot greater. Like, stuff that goes into, my mind is blanking on what all the different parts of memory are called. They're not immediately obvious there's like shared memory there's unified memory there's constant memory and there's another
Starting point is 00:48:50 kind of memory and memory global memory yes yes like global shared and unified all seem like they are mean the same thing but they're not uh but so like there's some memory that's extremely fast but extremely small and there's memory that's uh less, but extremely small. And there's memory that's less fast, but larger. And finally, you have your global system memory, which is the biggest you can get. And it's enormous. But fetching from RAM to a GPU takes even longer than fetching from your RAM to your CPU. But it's from my understanding. It has to go with the PCI bus.
Starting point is 00:49:22 Officially quoted on this. So your latency is is quite high instead of the direct memory bus you have to ask the pci bus for yes and you have to explicitly state you know where you're allocating your memory and where a certain structure is going to reside and so you you can explicitly say that like you know this is going to be in shared memory and this is going to be in global memory uh and so you have a more explicit direction over your memory hierarchy there's no operating system so there are no processes yeah there's no operating systems all of your codes like all of your operating system services are gone yeah you and like there's there's a lot of challenges in like getting this up and running but what it'll be like a, and we thought we could do this. Right. So how successful were you in making this GPU fuzzer?
Starting point is 00:50:07 Pretty good. So I'll, well, I guess I can let Ryan, like, give the details of some of the performance spoilers, and they can get, like, some more of, like, how we got from point A to point B afterwards. So, Ryan, as the intern, were you the one that was made to do all the hard work on this is that artem artem laid the groundwork actually artem probably did the the groundwork that nobody wanted to do and gave me the interesting stuff but um i i did um most of the
Starting point is 00:50:38 optimization work uh and okay and that was most of the work that i did this summer so yeah can you talk about some of the stuff you did and like how fast things got from where they were in the beginning? Well, at the beginning, things were tragically slow, like not even worth talking about. Um, I have a blog post up where I have some graphs and, uh, it was just sad, but I mean, we expected that, um, we expect things to not work. And in fact, things might still not work. We still have a long way to go. And we're comparing to libfuzzer right now on execution performance. How many executions per second per dollar can you get? And the per dollar
Starting point is 00:51:20 is important because libfuzzer is running on CPUs, which have a different billing rate on Google Cloud than for GPUs. But right now we're only focusing on execution performance because executing in this way is not something that's ever been done before. Mutation, like generating the inputs that you actually execute, is much better understood in fuzzing. We know how to do input generation for fuzzers. We don't know how to actually execute the for fuzzers. We don't know how to actually execute the programs on a GPU. That's never been done before. And so in our tests, we're basically just seeing how many times per second we can run these programs, these ARM64 programs on a GPU, which is not quite an apples to apples comparison.
Starting point is 00:52:06 LibFuzzer has a fair amount of overhead for input generation. But on the other hand, LibFuzzer is solving an easier problem in that it's instrumenting source code, whereas we are translating a binary with no source available. Are you actually running? You said running ARM64 on the GPU, but I have to ask, are you actually running ARM64 on the GPU, or are you running a completely different program
Starting point is 00:52:31 by the time it's been translated? It's a different program by the time it's been translated. But what we do is we take the ARM64 instructions, and then Trail of Bits has a library called Remill, which we call it lifting, where it takes the ARM64 instructions and lifts it to LLVM intermediate representation. So for those unfamiliar with LLVM, normally you have source code, and then there's an LLVM front end that compiles it to LLVM IR, and then there's a back end that compiles it to the target assembly. And we're going one step backwards.
Starting point is 00:53:06 So we're taking assembly and lifting it to LLVM IR. And then the LLVM project has a backend for PTX, which is the assembly instruction set that CUDA GPUs use, that NVIDIA GPUs use. And so we're using that backend to retarget that IR for the GPU so that it can actually run inside of the GPU. So effectively, we're translating ARM to PTX and running that inside of a GPU. And it's technically a different program, but it has the same semantics as the original program. But that's what we're evaluating right now. And currently we get six times the throughput of libfuzzer per dollar.
Starting point is 00:53:51 So if you normalize per dollar, we're getting about six times the number of executions per second, which is promising. Brandon Falk, who did the initial Xeon Phi research, wrote an excellent blog post kind of critiquing our work. And he brought up a lot of points, a lot of good points. And I think his primary concern is that once we add in the mutation engine, that our numbers won't be so great. We still think that it'll work out in the end because one, LibFuzzer is running the mutation engine and executing the inputs all on the same CPU core.
Starting point is 00:54:30 Whereas we have the benefit of having a coprocessor, really, where the GPU is doing the execution, but we would do the mutation shared across the CPU and the GPU. So we have more hardware resources that are available that we're currently underutilizing. Our CPU is not really doing anything right now. And number two, we are still really, really, really early in the optimization stages. We've applied three very, very basic optimizations that you would want to
Starting point is 00:54:57 apply to any CUDA GPU program, and they've given us a 100x speedup. And there's quite a lot more to go. So we're at 6x right now. Numbers could go down dramatically; numbers could go up dramatically. But we're very excited that we've gotten this to work at all, because, as Artem was talking about, it's kind of an insane concept, and we think it has a lot of promise.
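For a flavor of what a "basic CUDA optimization" looks like, here is one classic example: coalescing global-memory accesses by changing the data layout. This is illustrative of the category only; it is not necessarily one of the three optimizations the project actually applied.

```cuda
// Illustrative only: a classic CUDA optimization (coalesced global loads),
// shown as an example of the kind of basic tuning discussed above. It is
// NOT necessarily one of the three optimizations the project applied.
#include <cstdio>
#include <cuda_runtime.h>

// Row-major layout: thread tid walks its own contiguous row, so adjacent
// threads read addresses n elements apart and loads cannot be coalesced.
__global__ void sum_rows_uncoalesced(const int* in, int* out, int n) {
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    int acc = 0;
    for (int i = 0; i < n; ++i) acc += in[tid * n + i];
    out[tid] = acc;
}

// Column-major layout: adjacent threads read adjacent addresses, so each
// warp's loads combine into a few wide memory transactions.
__global__ void sum_rows_coalesced(const int* in, int* out, int n) {
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    int stride = gridDim.x * blockDim.x;  // total number of threads
    int acc = 0;
    for (int i = 0; i < n; ++i) acc += in[i * stride + tid];
    out[tid] = acc;
}

int main() {
    const int blocks = 64, threads = 256, n = 128;
    const int total = blocks * threads;
    int *in = nullptr, *out = nullptr;
    cudaMalloc(&in, sizeof(int) * total * n);
    cudaMalloc(&out, sizeof(int) * total);
    cudaMemset(in, 0, sizeof(int) * total * n);

    sum_rows_uncoalesced<<<blocks, threads>>>(in, out, n);
    sum_rows_coalesced<<<blocks, threads>>>(in, out, n);
    cudaDeviceSynchronize();
    std::printf("kernels finished: %s\n",
                cudaGetErrorString(cudaGetLastError()));
    cudaFree(in);
    cudaFree(out);
    return 0;
}
```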
Starting point is 00:55:25 I also find it funny that, if I understood you correctly, you referred to the CPU as the coprocessor to the GPU. We just have the CPU over here that's not doing anything; we can offload some work to it. Yeah, in that case, that is true. So yeah, we think we definitely still have some performance gains up our sleeve
Starting point is 00:55:45 once we do better profiling with some of the NVIDIA profiling tools, and once we optimize it and figure out exactly what's going on where. We also have a lot of performance gains to squeeze out during the translation phase. Right now our translation phase is very naive, and I think we can do a lot of optimizations in the LLVM pipeline that does the lifting portion, that translates the binary to LLVM. We could still do a lot of work there. All of those tools are actually open source, all of the lifting and translation tools.
Starting point is 00:56:11 We've been maintaining them for a while. This is kind of an instance where, you know, if you have a hammer, everything kind of looks like a nail. And so we've had these lifting and translation tools and we're like, why don't we apply lifting and translation tools and we're like why don't we apply lifting and translation to this problem lifting and translation by the way is the same technique that uh is being used for the new max with rosetta 2 um with the the apple processors yeah that's a wonderful analogy that is all not not quite the same but in spirit that is effectively yes that is totally that's a great way to think about it as to what's happening uh and speaking of that, another performance gain I hope to get is
Starting point is 00:56:46 GPUs are still effectively following Moore's law, where each generation is almost twice as good as the previous. And I'm hoping that by the time we get to a more usable state, there'll be another GPU generation. Hopefully we get it done sooner, but if there is another GPU generation by then, we'll just get a free hardware boost in terms of how much throughput we can get. So, since you're not running directly on the processor, what are the limitations? Where does this start to fall down? There are a lot of limitations right now just because we haven't implemented everything yet.
Starting point is 00:57:21 One of the biggest concerns will be system calls, because there is no operating system running on the GPU. So we'll need to emulate the system call or relay it back to the CPU so that the CPU can execute it there. Code that is very system-call-heavy will probably see a significant slowdown as it hits a bunch of syscall emulation or communication overhead, where we're sending the call back to the CPU and then maybe sending the result back to the GPU. But we also think that will be pretty feasible. Brandon Falk has had similar issues with Xeon Phi, and he doesn't believe it's a significant issue, but it is going to take a while to do. So for now we're testing syscall-less code, and once we do add that support, we'll probably be focusing on code that's relatively light on system calls.
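A common shape for that kind of relay is a mailbox in memory that both sides can see. The sketch below uses CUDA unified memory and assumes a system with concurrent managed access (Pascal or newer on Linux); it is a generic illustration of the pattern, not the project's actual design, which did not exist yet at the time of recording.

```cuda
// Hedged sketch: relaying a guest "system call" from GPU code back to the
// CPU through a unified-memory mailbox. Generic pattern only, NOT the
// project's implementation. Assumes concurrentManagedAccess support.
#include <atomic>
#include <cstdio>
#include <cuda_runtime.h>

struct SyscallRequest {
    volatile int state;  // 0 = idle, 1 = request pending, 2 = serviced
    int number;          // which syscall the guest wants
    long arg0;           // first argument (more would follow in real code)
    long result;         // written by the host
};

__global__ void guest_program(SyscallRequest* req) {
    // The translated program hits a syscall: publish the request, fence so
    // the host sees the payload before the flag, then spin until serviced.
    req->number = 39;  // e.g. getpid on Linux x86-64 (illustrative)
    req->arg0 = 0;
    __threadfence_system();
    req->state = 1;
    while (req->state != 2) { /* spin; real code needs a per-thread slot */ }
}

int main() {
    SyscallRequest* req = nullptr;
    cudaMallocManaged(&req, sizeof(SyscallRequest));
    req->state = 0;

    guest_program<<<1, 1>>>(req);

    // Host side: poll for a pending request, emulate it, publish the result.
    while (req->state != 1) { /* poll */ }
    req->result = 12345;  // pretend we executed the syscall
    std::atomic_thread_fence(std::memory_order_seq_cst);
    req->state = 2;

    cudaDeviceSynchronize();
    std::printf("guest syscall %d -> %ld\n", req->number, req->result);
    cudaFree(req);
    return 0;
}
```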
Starting point is 00:58:32 I think the ideal candidates here are parsing code, or some embedded software that doesn't have a lot of interaction with the host kernel. Can I use this lift-and-translate technique that you're using in a more generic sense? Say I've got this algorithm, this program, whatever, that is written for x86-64, and I just want to run it on the GPU in a massively parallel way. Can I use your tools to just do that? Yes, with the caveat that running it on the GPU is not going to fix inherent parallelization problems in, let's say, the algorithm and what it's trying to do. So this is only useful for you if your problem still fundamentally matches what a GPU is good at. It has to be a single-program, multiple-data problem with relatively low divergence. If, let's say, you have some encoding algorithm that you're running
Starting point is 00:59:22 that is relatively branch-light, and you're just not getting enough throughput running it on your CPU because you have these data streams you're translating, then this might really help you. But you're not going to, let's say, run something that's relatively branch-heavy; I'm trying to think of a good example right now, but it's not going to fundamentally solve your underlying problem characteristics. Right, so if it was already a good fit for GPGPU, then yes, this might indeed help you run things faster.
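To illustrate the "low divergence" requirement: within a warp of 32 threads executing in lockstep, a data-dependent branch forces the hardware to run both sides with part of the warp masked off. A kernel-only sketch, where the helper functions are hypothetical stand-ins and host launch code is omitted:

```cuda
// Generic illustration of warp divergence, the reason branch-light code is
// what fits a GPU. Not the project's code; the helpers are hypothetical.
#include <cuda_runtime.h>

__device__ float cheap_path(float x) { return x + 1.0f; }

__device__ float expensive_path(float x) {
    // Stand-in for a long computation.
    for (int i = 0; i < 64; ++i) x = x * 1.0001f + 0.5f;
    return x;
}

// Converged: every lane in a warp takes the same path, so all 32 lanes do
// useful work on every cycle -- the single-program-multiple-data sweet spot.
__global__ void low_divergence(const float* in, float* out) {
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    out[tid] = in[tid] * 2.0f + 1.0f;
}

// Divergent: lanes within one warp disagree on the branch, so the hardware
// serializes the two paths with the inactive lanes masked off, wasting a
// large fraction of the machine on branch-heavy workloads.
__global__ void high_divergence(const float* in, float* out) {
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (in[tid] > 0.5f)
        out[tid] = expensive_path(in[tid]);
    else
        out[tid] = cheap_path(in[tid]);
}
```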
Starting point is 01:00:08 The caveat also is that the translation process currently is not error-free and requires a little bit of manual fixing. It's getting better and better; we're continuing to improve what we can handle. And another limitation I wanted to mention is that your GPU is fundamentally limited in memory,
Starting point is 01:00:32 and so is your overall system. So this is never going to, say, take something like Chromium and run it on your GPU. That is never going to happen, because that consumes all available memory. Yes, exactly; it would run just one instance, and it wouldn't be any faster. In fact, it would be much slower. We have real program size limitations, both in terms of code size, probably, and in terms of how much memory the target program allocates
Starting point is 01:00:55 and works with, that we probably could never really overcome. So this is never going to work for everything, but our hope is that it'll work for its target of embedded software, which you need to write an emulator for anyway. I feel like we could keep digging into this. It's very interesting work you're doing, but we are
Starting point is 01:01:13 running out of time. Yes. Are there blog posts or anything that listeners could look at for more information? Yeah, Ryan has a blog post up on the Trail of Bits blog. If you search for
Starting point is 01:01:28 Let's Build a High-Performance GPU Fuzzer on the internet, I think that's the title. But you'll find it. Sounds right. I'll put that in the show notes. And you are continuing development on the project? Yes, we are. We hope to fix some of the performance issues
Starting point is 01:01:45 and add features such as input mutation. And we want to do a real apples-to-apples comparison versus something like libFuzzer, and hopefully publish some kind of peer-reviewed paper on our technique. And assuming everything actually works, at the end we're going to try to open source it, since at Trail of Bits we try to open source everything we can feasibly open source. For instance, all of the actual binary translation tools
Starting point is 01:02:10 and a lot of the fuzzing tools we've discussed, and their symbolic execution tools, are all open source and on GitHub right now, so you can find and use all of them. So we hope to add this to the list once we know it actually works and is not a big jumbled, embarrassing mess. It's an experiment. Yeah, it's an experiment; it's very research quality. The build system for it is CMake-based, but we have a lot of translating one architecture to another to a third, combining them all into a big LLVM blob, and running what is effectively our own link-time optimization on it, but not quite.
Starting point is 01:02:50 It looks like, have you seen the Pepe Silvia memes from It's Always Sunny where he's looking at the board with the chart on it? It looks like that is how our build system looks like currently for this. Oh, okay, right. Okay, well, it's been great having you both on the show today. Thank you so much for inviting us.
Starting point is 01:03:08 Yep, thank you. Thanks for coming on. Oh, I just wanted to add one thing: Trail of Bits is currently hiring C++ developers. So if you are interested in some of the stuff you've heard, we're actively hiring. And if you would like to apply fuzzing to your own project and you are having a lot of trouble, we can certainly help you out.
Starting point is 01:03:30 Please contact us. Okay. Thanks, Artem. Thank you so much. Have a good one. Bye. Thanks so much for listening in as we chat about C++. We'd love to hear what you think of the podcast. Please let us know if we're discussing the stuff you're interested in, or if you have a suggestion for a topic, we'd love to hear about that too. You can email all your thoughts to
Starting point is 01:03:48 feedback at cppcast.com. We'd also appreciate it if you can like CppCast on Facebook and follow CppCast on Twitter. You can also follow me at Rob W. Irving and Jason at Lefticus on Twitter. We'd also like to thank all our patrons who help support the show through Patreon. If you'd like to support us on Patreon, you can do so at patreon.com slash cppcast. And of course, you can find all that info and the show notes on the podcast website
Starting point is 01:04:13 at cppcast.com Theme music for this episode was provided by podcastthemes.com
