TLB Hit 💥 - Episode 6: ƑẍɄʑʑ҉⟆Ƒu𝔷𝔷⧫ᶳΩ𝓕
Episode Date: December 15, 2024In this episode, Chris and JF dive into the world of code coverage and fuzzing. They explore why coverage matters, the types of coverage metrics, and how fuzzing helps uncover unexpected software beha...vior. From the practical trade-offs of MC/DC testing to the randomness of AFL’s mutators, the hosts discuss techniques, tools, and the balance between robustness and resiliency. Tune in for insights, laughs, and maybe even a recommendation to "go fuzz yourself!"
 Transcript
 Discussion  (0)
    
                                         All right, welcome to our weekly episode of TLB Hit.
                                         
                                         So let's start with some levity.
                                         
                                         The software tester walks into a bar and orders a beer,
                                         
                                         then orders zero beers.
                                         
                                         Then the engineer orders 2,147,483,647 beers.
                                         
                                         Then they order a lizard, then minus one beers,
                                         
                                         then corny UIP beers.
                                         
                                         Finally, finally, the real customer walks in and asks where the bathroom is.
                                         
    
                                         The bar bursts into flames.
                                         
                                         Now, what's this got to do anything?
                                         
                                         Well, we want to talk about code coverage and fuzzing.
                                         
                                         And this joke is great because it illustrates what happens when you, presumably a human,
                                         
                                         test what you think needs
                                         
                                         to be tested. The problem we thinking meat creatures have is that we don't always know
                                         
                                         where failures might be. That is where metrics like coverage are useful. As a human, we might
                                         
                                         think that testing beer orders is where a bar might full bar. But as the saying goes, it's hard
                                         
    
                                         to make things foolproof because fools are so ingenious.
                                         
                                         Or in this case, the loo is where failure was.
                                         
                                         Anyhow, I heard that needing to explain jokes is what makes them good.
                                         
                                         So there you go.
                                         
                                         So what the engineer is poking at in the bar is in some ways the coverage of what the bar
                                         
                                         can handle.
                                         
                                         So let's talk about coverage.
                                         
                                         What kind of coverage is there?
                                         
    
                                         We think about a program that we're trying to test and we ask, did this thing run
                                         
                                         for a bunch of different things?
                                         
                                         There's lines, whether a particular line of code executed,
                                         
                                         there's branches, which can be taken or not taken.
                                         
                                         We can ask whether loops were executed at least once.
                                         
                                         These are all similar
                                         
                                         to kind of basic block execution counts.
                                         
                                         For functions, we can ask whether functions were called.
                                         
    
                                         Did we get to the function
                                         
                                         entry point? This can also be more interesting for inline functions. Ultimately, the basic blocks
                                         
                                         executing in a sequence form a path. What was the path taken through the basic blocks? The sequence
                                         
                                         of program counters that we were jumping to. Ultimately, there's a big state space in memory.
                                         
                                         For some piece of state, what values did we see that take on?
                                         
                                         And this is similar for value coverage.
                                         
                                         For any given value in the program, what are all of the concrete values we observed for it?
                                         
                                         There's also a categorization called MCDC, modified condition slash decision coverage.
                                         
    
                                         Not to be confused with the iconic rock group ACDC.
                                         
                                         And then there's human injectedinjected conditions of interest,
                                         
                                         which are like cover points.
                                         
                                         Those can explicitly help guide what we're trying to hit
                                         
                                         or look for when we're running the program.
                                         
                                         All right, so why is coverage a zygote?
                                         
                                         Well, ultimately, it's all about quality assurance.
                                         
                                         Is the thing we're making good the software?
                                         
    
                                         Does it do what we think it does?
                                         
                                         I don't know about you,
                                         
                                         but I don't like it when software just doesn't work. And so although that sounds easy and makes sense,
                                         
                                         the details are really tricky. What it actually means to cover code, so for example, with template
                                         
                                         instantiations, is one instance enough? What does it mean to cover a template? Do I have to cover
                                         
                                         all the types or just one type? And that's fine. For type traits, does it even make sense to have coverage
                                         
                                         since it's kind of a meta code or something like that?
                                         
                                         And with constexpr, constant expressions,
                                         
    
                                         if it all executes at compile time,
                                         
                                         can you say the code was really covered?
                                         
                                         And this is getting to the theoretical
                                         
                                         what are we computing and how aspect.
                                         
                                         So there's a way to slice this here
                                         
                                         where we have pure functions,
                                         
                                         which just take values in and produce values out.
                                         
                                         And then there's functions that execute with some state
                                         
    
                                         or some context in the background.
                                         
                                         And things like constexpr
                                         
                                         are really pushing on partial evaluation.
                                         
                                         If this program was partially evaluated at compile time
                                         
                                         versus if it ran at runtime,
                                         
                                         what does it mean in terms of what we ultimately covered?
                                         
                                         The constexpr case is more similar to pure functions that take values in and spit values out.
                                         
                                         There's limited side effects that are possible, and the state is available all in the constexpr evaluation time. In C++, constexpr used to be totally pure, but now we have these things
                                         
    
                                         like transient allocations in C++, so it got more interesting. The full set
                                         
                                         of all possibilities is combinatorial for general programs. So just for perspective, even exhaustively
                                         
                                         verifying a function with a 64-bit input space by executing it is pretty intractable. 32 bits you
                                         
                                         can do in a few minutes, but 64 bits, now you're really in trouble. Some programs like safety related ones might try to have less branchiness by construction,
                                         
                                         but sometimes it's hard to have less branchiness.
                                         
                                         Sometimes the business logic just is what it is.
                                         
                                         Yeah.
                                         
                                         And so we talked earlier about MCDC, modified condition slash decision coverage.
                                         
    
                                         What is that?
                                         
                                         Well, it's a type of coverage that's often used in functional safety software, as you
                                         
                                         were mentioning. So for example, in aviation, they have DO-178, or in automotive, there's ISO-26262.
                                         
                                         And the quick and dirty explanation is that you make sure all independent Boolean combinations
                                         
                                         for each functions have been executed. And you can have tools that look for MCDC coverage and
                                         
                                         give you metrics on that. And we talked about
                                         
                                         combinatorial conditions earlier. Doesn't MCDC just run into this combinatorial problem? Well,
                                         
                                         basically, yeah. And the argument is that for functional safety programs, you shouldn't have
                                         
    
                                         combinatorial space at all, because that's difficult to ensure the safety of.
                                         
                                         And this also gets to, we've talked about what coverage is and what we'd like to know about it,
                                         
                                         but how is it implemented? How do we figure out what was covered? So you effectively do a static
                                         
                                         instrumentation of the program that you're interested in. And then when that dynamically
                                         
                                         executes, you gather up the statistics that your instrumentation creates. So we're going to insert
                                         
                                         things like probe points into the program, and we're going to observe what gets triggered and
                                         
                                         what happens. Or you can also do dynamic sampling, where you run the program in its normal flow,
                                         
                                         but you just interrupt it every so often to see where it was and what it must have run in order
                                         
    
                                         to get there. And that's recording your data as you interrupt it. Yeah. And so you set coverage
                                         
                                         goals of how much coverage you want to meet. Can you even do 100% coverage?
                                         
                                         And how do you do it?
                                         
                                         Do you want to do it with, say, unit tests?
                                         
                                         Or should you just do coverage in system tests?
                                         
                                         That's kind of difficult to choose.
                                         
                                         And there's really good writing online by the SQLite authors.
                                         
                                         We'll put a link in the show notes.
                                         
    
                                         And they have this to say on coverage.
                                         
                                         They have an article, and you look for tensions between fuzz testing and 100% MCDC testing
                                         
                                         in their article.
                                         
                                         And the short of it is
                                         
                                         they basically say that
                                         
                                         they started with 100% MCDC coverage
                                         
                                         to ensure robustness of SQLite.
                                         
                                         But they found that doing that,
                                         
    
                                         having 100% MCDC coverage,
                                         
                                         discouraged defensive programming
                                         
                                         and discouraged having unreachable code,
                                         
                                         right, that you usually write
                                         
                                         in defensive programming. So programming that allows you to defend against the changes of code over time.
                                         
                                         And for a while, they had been doing 100% MCDC coverage, but then they started fuzzing when
                                         
                                         fuzzing became a thing. And lo and behold, they found vulnerabilities. How come? Well,
                                         
                                         they discovered that MCDC and fuzzing are kind of at odds with each
                                         
    
                                         other. They're a balance between ensuring robustness in normal code versus resiliency
                                         
                                         against attackers. But they're kind of also complementary because having high MCDC helps
                                         
                                         ensure that when a bug fix comes in due to a fuzzing report, that bug fix doesn't itself add another bug.
                                         
                                         And so this comparison is a good segue to fuzzing. So fuzzing is basically a million mecha monkeys all trying to put inputs into your program
                                         
                                         so that they can observe good coverage that comes out and see what happens when they put
                                         
                                         those inputs in.
                                         
                                         So there's two kinds of fuzzing that are most commonly used.
                                         
                                         There's generative fuzzing and there's mutation-based coverage guided fuzzing.
                                         
    
                                         So generative fuzzing is generating things that you know fit a particular pattern to
                                         
                                         pass them into this program.
                                         
                                         So it's very similar to defining a parser grammar like we talked about in a previous
                                         
                                         episode, but we run it in reverse.
                                         
                                         Instead of building a tree out of tokens, we take a tree
                                         
                                         definition and expand it until at the leaves we have all the tokens that we want to feed into the
                                         
                                         program. So these nodes expand in their tree form until you hit the leaves, which are also known as
                                         
                                         terminals, and then you spit out what was created and feed it into the program. So generative systems
                                         
    
                                         are similar. You may have heard of Haskell's QuickCheck or the Hypothesis library in Python.
                                         
                                         And these are some of the key examples
                                         
                                         of generative-based fuzz tools.
                                         
                                         Mutation-based coverage-guided fuzzing
                                         
                                         is through libraries and tools
                                         
                                         that you may have heard of like AFL and LibFuzzer.
                                         
                                         So these tools actually keep a corpus,
                                         
                                         a whole bunch of set of examples of stimulus
                                         
    
                                         that they feed to a program,
                                         
                                         and it mutates those to randomly try to increase the coverage metric.
                                         
                                         So they prune out redundant entries, things that are covering the same thing,
                                         
                                         take the interesting ones and perform a genetic style algorithm to cross them over and mutate them.
                                         
                                         And this is tapping into that coverage instrumentation we mentioned.
                                         
                                         They want to know, did I do a good job when I changed this?
                                         
                                         So as it goes down a new path, we find problems
                                         
                                         and it adds it to the set of things
                                         
    
                                         that created interesting input space.
                                         
                                         And this works very scalably
                                         
                                         because it's random and embarrassingly parallel.
                                         
                                         So we just need to exchange coverage findings
                                         
                                         between different things that are exploring parallel paths
                                         
                                         periodically to avoid too much redundant
                                         
                                         work from happening. Yeah, and you mentioned libfuzzer. It has an interesting mode tuned for
                                         
                                         people using protobuffs in particular. It takes an arbitrary proto with fields and can do some
                                         
    
                                         generative expansion, so defining an action space for the mutation. And protobuff contents are the
                                         
                                         thing that get fuzzed based on the proto schema. And it's nice because then
                                         
                                         you don't have to teach the fuzzer about your APIs, because sometimes half growth buff has their API
                                         
                                         for everything. And that's convenient. And related to that, Google has cluster level fuzzing for
                                         
                                         things like open source projects. And there's an industry talks on YouTube about OSS fuzz,
                                         
                                         which we'll link to in the show notes. And it's really fascinating that you can take this interface,
                                         
                                         like just give me stuff,
                                         
                                         and coverage-guided fuzzers can find the interesting things from nothing,
                                         
    
                                         from just the interface,
                                         
                                         and find traps and signal and address certain types of violations.
                                         
                                         Yeah, and again, back to how they do that,
                                         
                                         they're getting the information from instrumentation
                                         
                                         that they put into the program.
                                         
                                         So they want to know what branches get taken.
                                         
                                         They want to know what values might have caused you to go down a particular direction.
                                         
                                         And they use that to change the inputs to the program, trying to figure out what changes
                                         
    
                                         that they make on the inputs will cause them to get further into the program execution
                                         
                                         through the branches to the really interesting paths where those violations and traps and
                                         
                                         signals might occur.
                                         
                                         So what is fuzzing even looking for here?
                                         
                                         When we turn on sanitizers,
                                         
                                         they can look for things like undefined behavior,
                                         
                                         use after freeze, crashes, terminations,
                                         
                                         and all these kinds of things that indicate
                                         
    
                                         something didn't go quite right in your program
                                         
                                         and probably you didn't want that to happen.
                                         
                                         So fuzzing goes great with sanitizers
                                         
                                         because it increases the set of things you can find
                                         
                                         instead of just kind of silently trucking on
                                         
                                         in your program execution.
                                         
                                         Sometimes they also find things
                                         
                                         you don't care about as much,
                                         
    
                                         like your stack might grow overly large
                                         
                                         in recursive programs,
                                         
                                         or integers might overflow innocuously,
                                         
                                         and you have to kind of develop techniques
                                         
                                         to prune out those things
                                         
                                         that you don't care about as much.
                                         
                                         Helpfully, these toolkits also provide tools like minimizers, where if it finds a stimulus that identifies a failure, it can try to prune down that input space and still take a similar path to that similar failure.
                                         
                                         This is similar to kind of delta debugging techniques we've talked about before, where it can kind of minimize test cases with something being held constant like the failure that occurred.
                                         
    
                                         Yeah, and so you described fuzzers kind of as mecha monkeys earlier, and some fuzzers can be pretty dumb overall in how they mutate inputs versus, you know, some of them can end up being pretty smart.
                                         
                                         You can go all kind of generational genetic algorithm on them with a bunch of inputs saved and mixed over time. But really, fuzzers mostly mutate previous inputs based on branch coverage information
                                         
                                         to try to explore new branches that they haven't covered before.
                                         
                                         And they're surprisingly good at this, exploring branches and switches,
                                         
                                         even for things like image formats, which have structured formats with headers and then more data.
                                         
                                         So in a way, they're smart, and in a way, they're not.
                                         
                                         And in fact, if you look at EFL's logo, we'll put a link in the show notes,
                                         
                                         it's pretty cute. It's originally a JPEG from Alice in Wonderland, so kind of a bunny rabbit or something very fuzzy. And the author of AFL turned that JPEG into a GIF, or is it a GIF? I
                                         
    
                                         don't know, showing each step of the fuzzer changing the image, one change per frame.
                                         
                                         So each frame of the GIF is one mutation, basically.
                                         
                                         And the fuzzer can, through its branch coverage, understand the structure of the image format
                                         
                                         and perform mutation based on that structure.
                                         
                                         And if you look at that picture, it's oddly relaxing to watch that logo being fuzzed over
                                         
                                         time.
                                         
                                         I just have to point out that choosy programmers choose GIF.
                                         
                                         So fuzzers do particularly well, as we're saying, when you feed them a corpus of information that
                                         
    
                                         they can then mutate. So say you scraped all the unit tests from your software program that caused
                                         
                                         errors in the past, like say every GitHub issue turned into some unit test that checked for
                                         
                                         regression and came from some resolved issue. That could be used as your seed corpus to help them figure out how to reach
                                         
                                         interesting places in the program easily.
                                         
                                         It's a nice map of where to start.
                                         
                                         And then they do a genetic algorithm style thing where they can cross over
                                         
                                         the interesting parts of examples with each other or mutate them similar to how
                                         
                                         genetic algorithms take bit patterns and do crossovers and mutations
                                         
    
                                         in addition to trying to sample new things from the underlying space. similar to how genetic algorithms take bit patterns and do crossovers and mutations,
                                         
                                         in addition to trying to sample new things from the underlying space. Separately, you can also provide suggestions on which mutators might be most useful for your program. Things like strings
                                         
                                         of repeated text maybe would be very useful for a generalized parser, but maybe not so much for
                                         
                                         the internals of some other system. That's just an example.
                                         
                                         The problem with corpus is ultimately that you have to remember how to pluralize it.
                                         
                                         Corpi, corpora, corpoxin.
                                         
                                         I never remember what the right way to do that is. I think it's from Greek, and so it's corpots, technically.
                                         
                                         So anyways, we talked about using sanitizers to find undefined behavior.
                                         
    
                                         So why are we looking for undefined behavior?
                                         
                                         Well, oftentimes, UB is where unexpected things live.
                                         
                                         Now, I want to say not all UB is really actually bad,
                                         
                                         but if you're trying to fix potential issues,
                                         
                                         then removing all UB can tend to be valuable.
                                         
                                         But it doesn't help to find problems
                                         
                                         in your program's higher level behavior.
                                         
                                         Beyond the language level UB,
                                         
    
                                         your program has behavior as well.
                                         
                                         The programmer knows what their requirements are and what is invalid state for their program.
                                         
                                         And that's why defensive programming is useful when the programmer adds checks for states
                                         
                                         that's invalid in their own program and calls abort when such a state is reached.
                                         
                                         This type of check is useful because the programmer tells the fuzzer what to look for.
                                         
                                         Basically, the fuzzer is told, hey, like try to hit this abort and the fuzzer is going look for. Basically, the fuzzers told, hey, like try to hit the support
                                         
                                         and the fuzz is going to try to find path to the support being called.
                                         
                                         Right.
                                         
    
                                         And you can also add some rejection sampling before you even feed your program.
                                         
                                         You could say, no, this didn't look like an interesting case sample to feed,
                                         
                                         which can also help with that.
                                         
                                         Really, there's two styles of fuzzing that we are touching on here.
                                         
                                         There's binary modes where imagine you've got a
                                         
                                         binary and you don't have the sources and you want to observe what its behaviors are, or you've lost
                                         
                                         the sources or something. You can actually rewrite or observe the binary as a binary instrumentation
                                         
                                         versus in full source available instrumentation mode, you can actually recompile and emit extra
                                         
    
                                         code and extra pieces of assembly into the output artifact.
                                         
                                         So in a way, binary mode ties back into the history of dynamic binary instrumentation tools
                                         
                                         like Valgrind and even static instrumentation tools like Atom. Whereas static and code available
                                         
                                         instrumentation is how the modern sanitizers work like Address Sanitizer, UBSan, MSan.
                                         
                                         So fuzzing uses randomness to explore a format and branches,
                                         
                                         but there's diminishing returns at some point.
                                         
                                         How do we maximize the exploration of the program state
                                         
                                         while minimizing the types of inputs that have the same outcome?
                                         
    
                                         That's one of the key things that this fuzzing infrastructure tries to answer
                                         
                                         as it handles its corpus of examples.
                                         
                                         On the topic of fuzzing binaries,
                                         
                                         it's kind of cool that you can fuzz an implementation
                                         
                                         you can't see at all,
                                         
                                         similar to when you have a binary,
                                         
                                         but you don't know what the source is.
                                         
                                         There's actually fuzzing of ISAs as well.
                                         
    
                                         An ISA fuzzing can find things like unpublished instructions.
                                         
                                         There's a cool Black Hat talk
                                         
                                         called Breaking the x86 Instruction Set
                                         
                                         that found undocumented instructions
                                         
                                         that we'll link to in the show notes.
                                         
                                         Yeah, and we're really scratching the history of fuzzing a bit in these last few
                                         
                                         points that we made. And on that topic, Thomas Doolian had a keynote on fuzzing success recently,
                                         
                                         and we'll put a link in the show notes. It's really interesting to watch.
                                         
    
                                         So let's change the topic a bit and segue to hardware, because you talked about hardware
                                         
                                         just now with ISAs. There are strong ties between fuzzing software and hardware design verification. So you don't want to have bugs in hardware, even
                                         
                                         less in software, because in hardware, that'll cost you millions of dollars to fix. You'd better
                                         
                                         make sure it's correct before going to a fab and making a chip. And the problem with hardware is
                                         
                                         that you have deep and giant stink space with all the flip-flops and stateful elements on the chip.
                                         
                                         And what's been done for a long time in hardware is to use formal methods and tools like SAT solvers to mathematically prove properties of the hardware. And in those circumstances,
                                         
                                         when you can prove correctness of some properties, you don't actually need fuzzing because you have
                                         
                                         a proof of the entire state space. So what that means is that fuzzing is basically trying
                                         
    
                                         to get a dynamic execution to something unexpected,
                                         
                                         whereas formal methods are trying to mathematically prove
                                         
                                         a property from all possible inputs.
                                         
                                         So what's been done with hardware for a long time
                                         
                                         can also apply to software.
                                         
                                         If you can use formal methods,
                                         
                                         even on a subset of your program,
                                         
                                         then fuzzing isn't needed for that subset.
                                         
    
                                         You can use a SAT or SMT solver and prove whether conditions are reachable at all in any possible execution.
                                         
                                         Now, fully inductive proofs are quite difficult to close,
                                         
                                         and I don't know enough about how inductive proofs to really understand when they don't work,
                                         
                                         so it would be fun to learn more. Now, one thing that you can
                                         
                                         is turn subsets of a compiler's IR into SMT inputs. So you can't do that easily for all
                                         
                                         programs and all programming languages, because there's often things left as explicitly undefined
                                         
                                         in there. But it's also a problem when you try to create hardware from a language that has
                                         
                                         undefined behavior, which many or most of them do. So imagine a shift operation that's undefined when overshifting.
                                         
    
                                         What kind of hardware gates do you create?
                                         
                                         What comes out and what gate patterns are acceptable if you overshift?
                                         
                                         If you end up statically or dynamically overshifting?
                                         
                                         Well, clearly, you know, you should care about this.
                                         
                                         Yeah.
                                         
                                         And so I think on the topic of formal methods, they make proving properties of very large spaces more tractable. So think about the difference between a 32-, it's like 90 years to test all those values,
                                         
                                         even for a very, very small, simple function. But we can prove properties of even functions
                                         
                                         with large input spaces because we find symbolically what the computation is trying to do
                                         
    
                                         and use the properties of what we're doing inside of the computation. That's how these formal
                                         
                                         methods work at proving things that we couldn't do exhaustively by testing all the possible inputs. This also relates to a system called TLA+, which
                                         
                                         was originally by Leslie Lamport, which is a formal modeling language and runtime that people
                                         
                                         use for proving properties of important algorithms, including in hardware construction and distributed
                                         
                                         and consensus-like algorithms like Paxos you may have heard of.
                                         
                                         Oftentimes people will translate finite state machines, which are little automata,
                                         
                                         into a TLA plus model and prove properties on that in abstract form. So they won't necessarily do it directly on Verilog, say, but they'll create a model and then prove properties of the model.
                                         
                                         We're going to link to Hello Wayne's talk on getting started with TLA plus,
                                         
    
                                         which helped me get started writing some models in that way.
                                         
                                         All right. So now that you've mentioned Leslie Landport, friend of the show, let's try to jump into quickfire topics on fuzzy because I think we're not information dense enough yet.
                                         
                                         Yeah, we need more.
                                         
                                         So one thing that really blew my mind was also this paper called the Silly Fuzz paper, and it introduced this notion of fuzzing by proxy.
                                         
                                         So the idea here was that we could fuzz something like QEMU
                                         
                                         using coverage-guided fuzzing and build up a corpus.
                                         
                                         And a lot of the complexity and coverage
                                         
                                         that we observe in the simulator
                                         
    
                                         inside the QEMU implementation,
                                         
                                         which is effectively an emulator, right,
                                         
                                         for the hardware architecture,
                                         
                                         that corpus
                                         
                                         is still very useful as a proxy for the complexity and the coverage that would happen inside of a
                                         
                                         silicon implementation, even if it's implemented totally differently. One is software, one is
                                         
                                         hardware, but the corpus is very interesting and reusable even across those domains.
                                         
                                         All right, so different topic. Let's talk about compiler instrumentation.
                                         
    
                                         So part of how coverage-guided fuzzing works
                                         
                                         as well as it does
                                         
                                         is that the compiler and native code generation
                                         
                                         are in on the game.
                                         
                                         And instrumentation makes callbacks to the fuzzing runtime
                                         
                                         and that tracks information in a data structure.
                                         
                                         And doing this in the process
                                         
                                         allows the fuzzing process to run a lot faster
                                         
    
                                         than if it was going on outside the process.
                                         
                                         And in that, maximizing the frequency with which you run the fuzzer program is also a big part of the game, being able to iterate linearly.
                                         
                                         Basically, coverage recording flags can control where and how to add pseudo-branches.
                                         
                                         So consider if you're branching on an arithmetic expression that says something like, you know, x is greater than 32. Well, maybe you still want to know what bits are set to x beyond
                                         
                                         the fifth one, and include those in the things that you try to cover. Yeah, another interesting
                                         
                                         thing in this space is fault injection. So sometimes to hit rare cases, you have to inject
                                         
                                         faults that get you down some path that you would never expect to happen in practice.
                                         
                                         It might be very difficult to craft some input stimulus to get you exactly in that mode.
                                         
    
                                         So in a way, this goes back to our not taking branches podcast from a few weeks ago.
                                         
                                         How do we force our way towards those exceptional circumstances that are being handled in the cold paths of the code?
                                         
                                         So you need enough seams.
                                         
                                         Seams is a really nice word.
                                         
                                         It's some way to reach in and grab a piece of your program and control what it needs to be able to do.
                                         
                                         You need enough seams in your program to enable fault injection-based testing. Sometimes people
                                         
                                         will do this with mocks or fakes. You have to design the software ultimately so you can easily
                                         
                                         do that reaching in and causing of something atypical to happen.
                                         
    
                                         Michael Feathers has a great book on testing patterns that really hammered home this word seems for me. I did really enjoy his book working effectively with legacy code.
                                         
                                         Yeah. So I have to chime in and mention my coworker and friend of the show, Robert Secord.
                                         
                                         He has a book called Modernizing Legacy Systems. It's from 2003 and I haven't read it, but Robert
                                         
                                         will be super happy that I mentioned his book.
                                         
                                         Oh, thanks.
                                         
                                         I won't read it either.
                                         
                                         Nice.
                                         
                                         So another quick fire topic here,
                                         
    
                                         classic software question around assertions and check failure.
                                         
                                         So the question here is,
                                         
                                         is it better to crash or to keep on trucking when something bad happens?
                                         
                                         So it typically depends on how unrecoverable and impactful your invariant
                                         
                                         failure is and what the potential impact of crashing versus trucking
                                         
                                         on in an inconsistent state might be.
                                         
                                         So what we've seen is that different teams and different industries define different
                                         
                                         value functions around this question.
                                         
    
                                         So if you have a big website, you don't want someone to be able to send you a
                                         
                                         packet of doom or something like they just send a packet and your entire website goes
                                         
                                         down. And even with a supervisor process that crashes and recovers, it's possible
                                         
                                         to cause a crashing when there's something like a packet of doom.
                                         
                                         And in industries like functional safety, there's this thing called a hazard analysis
                                         
                                         and risk assessment, AHARA.
                                         
                                         And that's something that's written to see what program failures could be
                                         
                                         and how to react to failure.
                                         
    
                                         And often what you do with that
                                         
                                         is you end up designing a supervisor
                                         
                                         or something that can handle the failures
                                         
                                         and it forces the design of your software
                                         
                                         to handle the problem,
                                         
                                         including if something causes a crash
                                         
                                         and if you have a crash loop and so on.
                                         
                                         So reaching up towards the top shelf of some of
                                         
    
                                         the interesting techniques and research in this space, there's something called concolic execution,
                                         
                                         concolic testing. And the word there is a mix of the word concrete and the word symbolic.
                                         
                                         So the idea is that you throw concrete input stimulus at your device under test and you
                                         
                                         observe what happens. But then you say, hey, I saw that
                                         
                                         this condition never triggered. Can it ever trigger? And I'm just not finding the right stimulus.
                                         
                                         So when you ask yourself this question, you actually go and try to back solve symbolically.
                                         
                                         This is where the symbolic comes in. You try to find if that condition is even possible to reach,
                                         
                                         or if it is possible to reach, what is the cone of inputs in the input
                                         
    
                                         space that will cause that condition to be true? So you can see you're flipping between concrete
                                         
                                         stimulus and then back-solving symbolically for interesting things in the input space.
                                         
                                         So if you do find something, now you've found a great thing to add to your corpus.
                                         
                                         And if you find that it's impossible to actually make this condition true,
                                         
                                         you know that that condition is effectively dead and you can flag that and not care about reaching it anymore.
                                         
                                         Yeah.
                                         
                                         And so like you described, you know, can I even see this condition type of thing, this type of testing?
                                         
                                         It's also related to bounded model checking.
                                         
    
                                         So imagine your program has like a top level loop somewhere, like a wild one.
                                         
                                         And you can ask successively, can the bad thing happen if I
                                         
                                         run the loop just one time, right? You can try to prove like if I run the loop one time, can I make
                                         
                                         the program fall flat on its face? No. Okay, well, what if I run this loop two times? And you say,
                                         
                                         okay, like maybe that can't happen two times. And as you brute force unroll and try to prove,
                                         
                                         unroll once, twice, three times. You gain more confidence in whether
                                         
                                         the condition seems reachable with a reasonable amount of stimulus and time steps. So you just
                                         
                                         usually hard code a limit on unrolling and give up once you reach the limit. Or in some cases,
                                         
    
                                         you might try to prove the equivalence of the loop unrolled a few times or something like that.
                                         
                                         And so if you have a symbolic representation of the valid state of your program,
                                         
                                         you can kind of try to fast forward to some reachable state by changing the value and then walk forward a few time steps from there.
                                         
                                         Yeah, it's kind of like a fast forwarding technique.
                                         
                                         Yeah.
                                         
                                         So I'll point out that Joe Armstrong of Erlang fame, who is the inventor of Erlang, and may he rest in peace, has a great talk called the mess we're in. So it talks about how QA being so
                                         
                                         difficult really relates to the huge state space that we have in modern systems and that they're
                                         
                                         often not super well-defined, at least in a formal sense, and that we're trying to revise designs and
                                         
    
                                         rev them and increase and iterate on them with lots of features over time. And so it indeed does
                                         
                                         look like a mess in many ways,
                                         
                                         but it is our mess. This is the stuff that we work with. This is the medium we work in.
                                         
                                         And so with fuzzing, we're actually solving our carefully constructed mess of the software we
                                         
                                         wrote using these focused beams of random mess. Beautiful in its own way to use mess to try to
                                         
                                         solve mess as we try to find ways of doing things better. All right. Well, on that note, I'm glad that we got to record this episode.
                                         
                                         I really hope that folks enjoyed it.
                                         
                                         Please join us next week for our next podcast episode.
                                         
    
                                         We'd like to thank our sponsors, but we don't have any, and don't forget to like
                                         
                                         and subscribe, tell your friends and go fuzz something.
                                         
                                         Are you telling people to go fuzz themselves?
                                         
                                         Basically, I guess I did.
                                         
                                         All right.
                                         
                                         All right.
                                         
                                         See you next week.
                                         
