Computer Architecture Podcast - Ep 21: High-assurance Computer Architectures with Dr. Caroline Trippel, Stanford University

Episode Date: October 1, 2025

Dr. Caroline Trippel is an Assistant Professor in the Computer Science and Electrical Engineering Departments at Stanford University. Caroline's research operates at the critical intersection of hardware and software, focusing on developing high-assurance computer architectures. Her work tackles the challenge of ensuring that complex hardware designs are correct and secure. She has pioneered automated tools that bridge the gap between a processor's implementation (its RTL) and its formal specification, as well as frameworks and compilers that find and mitigate hardware-related security vulnerabilities in software.

Transcript
Starting point is 00:00:00 Hi, and welcome to the Computer Architecture podcast, a show that brings you closer to cutting-edge work in computer architecture and the remarkable people behind it. We are your hosts. I'm Suvinay Subramanian. And I'm Lisa Hsu. Our guest on this episode is Dr. Caroline Trippel, an assistant professor in the computer science and electrical engineering departments
Starting point is 00:00:18 at Stanford University. Caroline's research operates at the critical intersection of hardware and software, focusing on developing high-assurance computer architectures. Her work tackles the challenge of ensuring that complex hardware designs are both correct and secure. She has pioneered automated tools that bridge the gap between a processor's implementation and its formal specification, as well as frameworks and compilers that find and mitigate hardware-related security vulnerabilities in software.
Starting point is 00:00:45 This was another episode where we dug really deep into technical nerdery with our guest. But never fear, we also discussed her journey to where she is now and how we nearly lost her to medical school instead of computer science. Suvinay and I learned a lot, and I hope our listeners do too. And with that, let's get to the interview. A quick disclaimer that all views shared on the show are the opinions of individuals and do not reflect the views of the organizations they work for. Caroline, welcome to the podcast.
Starting point is 00:01:19 We're so happy to have you with us. Thanks a lot for having me. I'm really excited to be here. Well, we're excited to talk with you. So long-time listeners know we're going to start with what is getting you up in the morning these days. Thanks again, Suvinay and Lisa, for having me. So it's the summer quarter over here. We're in a quarter system.
Starting point is 00:01:38 And as I think both of you know from some of our recent email correspondence, it's been a deadline summer for me. So mostly, working on whatever paper has been due next has been getting me out of bed in the morning. But it's been pretty energizing. So I've been working with several of my students to write papers on what I think is some really exciting research we've been doing over the last months to year or more. And it's been really fun to help them package their work into these compelling narratives with nice motivating examples, and putting finishing touches on taxonomies and formal proofs, which a lot of my papers tend to have, and distilling down
Starting point is 00:02:19 comparisons of related work. And so it really makes the work feel fresh, I would say, despite maybe we've been working on some of these problems for a while now. Another thing over this summer, I've been working with one of my students to try out some of the hardware verification tools we've been building on industry designs. And the idea is for us to try and understand what's their current impact and how can we improve them. So in general, which I think we'll talk about today, my group's been trying to build an ecosystem of approaches and tools so that we can design and verify what I call high-assurance computer architectures. And it's been really exciting over the summer, I would say in particular, to see a lot of these pieces come together in a real way. But yeah, mostly, I just feel really lucky to be able to ride my bike to my office in the morning. And I get to work with amazing
Starting point is 00:03:10 students on hard technical problems in the sunshine. It's pretty exciting. Yeah, well, we can see the sunshine through your window right now. And so, yeah, it's been really nice. Lucky you, lucky you. So tell us a little bit more about this ecosystem that you're building. It sounds like it spans multiple fronts: methods, tools, techniques. So tell us a little bit more about that. Yeah, I would say I kind of divide the work we've been doing into roughly three parts. So one is designing new hardware software interfaces. So actually the overarching goal of my group's work is to try and design computer architectures that are high assurance, and by that I mean correct, secure, and reliable. And so one of the first observations of our work is that
Starting point is 00:04:01 our existing hardware software interfaces, like our traditional or canonical instruction set architectures, they don't contain all of the details that you would need in order to really rigorously reason about how hardware may undermine software assurance guarantees or desired software assurance guarantees. And so one of the things we work on is trying to build new hardware software contracts that expose a little more information about hardware to software in a way that's usable. Like maybe I'm going to expose information to software about how hardware might leak the data it processes via hardware side channels.
Starting point is 00:04:39 And so we want to come up with these new hardware software interfaces. And then we also want to come up with hardware software interfaces that are useful and usable by programmers or compiler writers. So you could imagine coming up with a hardware software contract that's basically giving programmers RTL, which would not be so easy to use. And so in order to try and demonstrate that the contracts we've developed are useful,
Starting point is 00:05:02 we'll develop compilers and software analyses that are parameterized by them that can find and repair hardware-related bugs in programs. So I kind of couple those two together: designing new contracts, and software analyses that use them. The second thing is we want to verify formally that the hardware actually upholds the new hardware software contracts that we design, plus existing ones. There's some pretty
Starting point is 00:05:28 staggering statistics about a recent study, it was actually a 2024 Siemens EDA study, that cited something like over 85% of hardware projects have critical bugs that escape to silicon, despite us spending over half of the design effort on verification. So we're coming up with new ways to democratize and automate formal verification of hardware with respect to new and existing hardware software interfaces or contracts. And then the third piece is, we might be able to design some fancy new interface and formally verify that the hardware implements it. But because the hardware wasn't designed with that sort of specification in mind, even if I can write my code to run in a way that's secure on this hardware, it might be very slow.
Starting point is 00:06:16 So then we start to get into, how can we co-design hardware and software with these sorts of high assurance guarantees in mind? So roughly those three areas, yeah. Yeah, that's a great overview of the space that you're working in. You sort of delineated them into like the specification challenge very broadly. And then we have the analysis, verification methods, and the frameworks. And that could eventually sort of lead to new hardware designs as well.
Starting point is 00:06:40 So you highlighted that this is not a new problem. Like verification has been an age-old challenge. Of course, there are new dimensions in which hardware and the entire systems industry is moving. Perhaps we can sort of explore this a little more, some of the age-old problems in computer architecture that deal with the specification challenges, memory consistency models. So those have been around for maybe decades now. And there are several aspects to sort of reasoning about, specifying memory consistency models, and also sort of verifying them. Could you sort of talk us through your work on how do you think about specifying these contracts
Starting point is 00:07:12 between hardware and software, and importantly, like, what are the techniques that you've developed, in terms of automating this workflow and getting, like, better, like, higher degrees of correctness in the software that you design? Yeah, definitely. Yeah, so maybe I can start with the point you made, which is verification. We've been doing it for a long time, and it's still a bottleneck in the hardware design
Starting point is 00:07:34 process, and why is that? So the way we do verification today of hardware designs is that we rely on experts to formally specify desired goodness properties of our hardware. So they handcraft these formal properties or formal specifications of what my hardware is supposed to do. And they do that by discussing with designers. So there's sort of like this natural language exchange with designers to understand what the hardware is supposed to do. The verification engineers will write some formal properties to express those goodness criteria. And then they're going to go and try and use, I focus on the formal verification side of the world, so then they'll go and try and use
Starting point is 00:08:17 these automated formal tools, especially tools called model checkers, which under the covers are basically generating a bunch of satisfiability queries and offloading them to a SAT solver. We're going to try and use these model checkers to assess the design's adherence, so some SystemVerilog RTL's adherence, to these formal properties. And what ends up happening is, because we're kind of taking this top-down approach where we have a requirement for our hardware, we translate it to a formal specification, and we map that specification down to a property or properties over RTL signals, the resulting properties end up being very complex. So they require these automated formal tools to essentially solve a really challenging problem. So because these automated formal
Starting point is 00:09:08 tools usually can't check those properties directly. Maybe they can't even elaborate the full design that the property is expressed over, or minimally they can arrive at, say, a bounded proof with some sort of shallow cycle depth that the property is upheld by the design. The verification engineers will spend a lot of time manually decomposing both the property and the design itself in order to scale verification. So to give you a memory consistency model specific example, in order for me to prove that some processor implements its memory model correctly, perhaps this is total store order, which is the memory consistency model implemented by Intel processors,
Starting point is 00:09:51 so eventually I need to prove that two stores in program order, two stores that are fetched in order, will update the memory in order. And that's the sort of property that, even for a simple three-stage RISC-V processor design, will take the commercial model checkers 10, 11-plus hours, and they'll still have bounded proofs of correctness. They won't be able to prove that the hardware actually upholds this property
Starting point is 00:10:18 that two writes in program order update the memory in order. But instead, if we can decompose that property into a bunch of simpler sub-properties, like: two writes in program order update the decode stage pipeline register in order; and two writes that update the decode stage pipeline register in order update the write-back stage
Starting point is 00:10:36 pipeline register in order; and so on. We can decompose into these smaller, finer-grained properties. Then we can actually get a model checker to verify that two writes in program order update the memory in order in a matter of seconds to minutes. And the issue is that for commercial scale designs, not like my three-stage toy processor, this gets much more difficult and much more intractable and ends up requiring a lot of manual expertise. So that's kind of the why is it hard today. I would say it's because of this top-down approach that requires a lot of manual expert effort, both to write properties and make them checkable. One thing that we've been doing along a few different axes is trying to eliminate the manual property writing effort, or at least significantly
Starting point is 00:11:22 reduce it, by automatically generating properties and automatically decomposing them. And the reason we're able to do this is not for any magical reason necessarily, but because we're trying to specialize the verification procedures by domain. So we're saying, okay, for memory consistency model verification in particular, we know how to design a procedure that can automatically generate properties from some handful of property templates and static analysis of the netlist for the design. We can offload those properties to a model checker and check them. And then based on the results of those queries, we can determine if the processor implements the memory model correctly.
Starting point is 00:12:06 And I can go into a little bit more detail there, but I would say that in general, it's hard because of all this manual effort involved, and the fact that most verification is sort of framed as this general-purpose problem. And we've been trying to make it a little bit more domain-specific and architecture-aware. And in doing that, we can automate a lot of this process.
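To make the decomposition idea above concrete, here is a minimal Python sketch, not the actual tooling (which emits temporal-logic properties that a commercial model checker discharges against RTL): the end-to-end property that two program-order stores update memory in order is established by checking one tiny in-order sub-property per pipeline stage. The stage names and the trace encoding are invented for illustration.

```python
# Toy sketch of bottom-up property decomposition (illustrative only).
# Real flows check SystemVerilog assertions on RTL with a model checker;
# here a "trace" just maps (instruction, stage) -> cycle of the update.

STAGES = ["fetch", "decode", "execute", "writeback", "memory"]

def in_order_at_stage(trace, stage):
    """Sub-property: st0 (first in program order) updates this stage's
    pipeline register before st1 does."""
    return trace[("st0", stage)] < trace[("st1", stage)]

def stores_update_memory_in_order(trace):
    """End-to-end TSO-style property, discharged as many cheap
    per-stage checks instead of one monolithic query."""
    return all(in_order_at_stage(trace, s) for s in STAGES)

# A well-behaved trace: st0 reaches every stage one cycle before st1.
trace = {("st0", s): 2 * i for i, s in enumerate(STAGES)}
trace.update({("st1", s): 2 * i + 1 for i, s in enumerate(STAGES)})
print(stores_update_memory_in_order(trace))  # True
```

Each sub-property is trivial on its own; the point of the decomposition described above is that the composition of the simple per-stage checks implies the expensive end-to-end property.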
Starting point is 00:12:23 Got it. Yeah, I think that was a good distillation. Maybe you talked about the traditional top-down approach, versus maybe in your style of work, there's a slightly more bottom-up philosophy to approaching the problem. And there is the philosophy, and that bleeds into the approach itself, which is you're able to take RTL netlists or Verilog designs along with the templates or recipes that check a bounded set of properties within a domain and sort of expound on that. And I can elaborate maybe on that a little bit more, this bottom-up idea, just to clarify that.
Starting point is 00:12:54 So, yeah, I think there's this idea of starting with some specification, automatically decomposing it. The way we've been phrasing this really is a bottom-up approach to verification, in that what we're doing is we're automatically generating a bunch of these really simple, low-level properties to evaluate on some design, kind of the flavor that I was mentioning to you. And then we compose together the model checker results to yield some abstract description of, like, what is the memory consistency model that the processor actually implements? We can basically use formal tools to reverse engineer from an implementation what is the memory consistency model specification that it implements, and we're targeting specifications that are axiomatic
Starting point is 00:13:39 memory model specifications, which essentially you can think of as a set of rules about the ordering and visibility of updates and accesses to shared memory. And so we reverse engineer, using formal tools, a description of what is the memory model specification that my processor actually implements. And then we can just syntactically compare that with what we expected to see if we have bugs in the implementation or not. Okay. So I feel like one thing you were saying before is that the difficulty with the traditional way of doing things is all this manual expertise. And the leap that I want to try and understand better is you're taking it from a slightly different direction and automating it. But still, somebody somewhere has to
Starting point is 00:14:26 have the expertise to be able to say this is correct or not. So how is that expertise getting sort of removed from the manual effort? Like, when you decompose into these smaller pieces, somebody needs to know what the logical smaller pieces are. And right now you were just saying, okay, then we have it do this thing; like, what is it actually doing? Compare it against what we're expecting; somebody has to know what they're expecting as well. So net-net, in the end, I guess what I'm wondering is: how does the expertise divvying up then reduce the total amount of expertise required? Let me mention one thing, and then maybe I can take this from a little bit different angle.
Starting point is 00:15:06 So one is, actually, no one needs to ever inspect these properties that we're generating or understand their outputs. So there's no human looking at the low level; they're written in temporal logic, or SystemVerilog Assertion properties. But maybe I can tell you how this approach was first inspired, and maybe that will make it a little bit more clear. So some of the work that I did during my PhD, which built on work from my PhD collaborators, was verifying hardware with respect to,
Starting point is 00:15:40 verifying that, say, hardware implements its memory consistency model with respect to an abstract specification of the hardware. So a very common technique to try and make any verification problem more scalable is: I'm going to take the system I'm verifying and throw away details that are not central to my verification problem. Like, for example, in my processor, I don't really care if the multiplier works correctly if I'm trying to verify that two stores in program order update the memory in order. So I want to throw away implementation details not central to the verification problem
Starting point is 00:16:14 that I'm trying to solve. And so this work that I did during my PhD that built on other work was verifying that this abstract model of a processor implemented its memory model correctly. And this abstract model of a processor was something we called a µspec model, which stands for microarchitecture specification, originally developed by my PhD lab mate, Dan Lustig. In some of our recent work, we've kind of taxonomized these µspec models as containing two types of things. So these µspec models are axiomatic models of a processor. What that means is they basically contain a bunch of first-order logic rules that describe the state update and ordering behavior of instructions
Starting point is 00:17:01 when they execute on a processor design, or a microarchitecture more generally. And so, okay, what do I mean by state update and ordering behavior? One thing these axioms describe is: what are the different microarchitectural execution paths that an instruction could take through a design? Generally we say an execution path, or a microarchitectural execution path, is a partial order on the microarchitectural events that are associated with the instruction's execution. And you can visualize this actually as a directed graph, where the events are nodes and the partial order is these directed happens-before edges,
Starting point is 00:17:40 Lamport-style happens-before edges. And so in a lot of our work recently, we define a microarchitectural event as a set of state updates in some cycle that an instruction performs. So you could think about: my instruction updates the fetch stage pipeline register, and then the decode stage pipeline register, and so on and so forth; it updates some state in a store buffer throughout its execution. I might have a load that writes to the L1 cache on a cache miss, versus it doesn't write to the L1 cache on a cache hit, and so on. So there's different execution paths instructions can take through the design. So that's one thing this µspec model contains: a set of rules for how to construct these
Starting point is 00:18:22 microarchitectural execution paths for different instructions. The other thing that they contain is information about the interactions between the execution paths of simultaneously executing instructions. And we generally say they can interact in a couple different ways. There are structural interactions, like they both update the same physical set of state elements, and so they have to do those updates to those same state elements in some order; maybe it's an order that agrees with program order, for example. Or there can be data flow relationships between instructions. For example, one instruction updates some set of state elements that influences the state
Starting point is 00:19:01 update of another instruction to some other set of state elements. So we have these µspec models. They describe a set of rules for modeling hardware-specific program executions as these directed happens-before graphs, where nodes are these hardware events and edges are ordering relationships between the events. And we know that, given one of those models, we can prove that some higher level memory consistency model is implemented correctly. This was the prior work that I was alluding to. And so some of the first work I started doing when I started my faculty position was
Starting point is 00:19:38 trying to understand, can we bridge the gap between these abstract models we know are so useful for doing memory model verification and RTL, which is ultimately what we want to prove is implemented correctly at the end of the day. And so what we did is we observed that this µspec model is still finer-grained than, like, two stores in program order update memory in order. But even though it's abstract enough to enable these top-level analyses, it's still too abstract to map down directly to these formal properties and try and check them on even simple designs and get scalable proof results. So we said, what if instead of trying to do
Starting point is 00:20:24 this top-down mapping, we tried to, like, synthesize or reverse engineer one of these µspec models from RTL? And what we ended up doing was saying, okay, if we want to do that, we can construct a finite set, like a handful, of these property templates, and use these property templates, instantiate them in a bunch of different ways, to uncover all the execution paths that each instruction can take through the design, and uncover all of the different types of structural
Starting point is 00:20:52 and data flow interactions between pairs of instructions once we've gotten all the execution paths. And so the experts, if you will, that are writing down these property templates are us, and we write them down once. And then we know they always work for this µspec model synthesis problem, and they're designed for that problem.
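The happens-before graphs described here can be sketched as a tiny directed-graph toy: nodes are microarchitectural events, edges are happens-before orderings, and, in the spirit of µhb-style analysis, a hypothesized execution is observable only if its graph is acyclic. The event names and edges below are invented for illustration, not the actual µspec machinery.

```python
# Toy microarchitectural happens-before (uhb) graph with a cycle check.
from collections import defaultdict

class UhbGraph:
    def __init__(self):
        self.edges = defaultdict(set)

    def happens_before(self, a, b):
        """Record that event a must happen before event b."""
        self.edges[a].add(b)

    def is_consistent(self):
        """True iff no event transitively happens before itself,
        i.e. the graph is acyclic (DFS three-coloring)."""
        WHITE, GRAY, BLACK = 0, 1, 2
        color = defaultdict(int)

        def dfs(n):
            color[n] = GRAY
            for m in list(self.edges[n]):
                if color[m] == GRAY:
                    return False            # back edge: a cycle
                if color[m] == WHITE and not dfs(m):
                    return False
            color[n] = BLACK
            return True

        for n in list(self.edges):
            if color[n] == WHITE and not dfs(n):
                return False
        return True

g = UhbGraph()
g.happens_before(("st0", "decode"), ("st1", "decode"))  # decoded in order
g.happens_before(("st0", "decode"), ("st0", "memory"))  # st0's own path order
g.happens_before(("st1", "decode"), ("st1", "memory"))  # st1's own path order
g.happens_before(("st0", "memory"), ("st1", "memory"))  # memory keeps order
g.happens_before(("st1", "memory"), ("st0", "memory"))  # but st1 observed first
print(g.is_consistent())  # False: the last two edges form a cycle
```

A forbidden execution shows up as a cycle: the hardware would need st0's memory update to occur both before and after st1's, which is impossible.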
Starting point is 00:21:08 So if you want to verify that the multiplier is correct, then what I'm describing now is not the flow for you. But if you want to verify that the memory consistency model is correct, you can essentially supply some inputs to our tools. So we need to know something like, where do instructions first materialize in the processor? Where can I find the instruction fetch buffer
Starting point is 00:21:31 or instruction fetch register that holds instruction encodings when they're first fetched from instruction memory, so that my formal tools can recognize a load versus a store and so on. So we need to know a little bit of this information about the input hardware, and an expert designer will tell us that. But then the idea is that we will alleviate
Starting point is 00:21:49 the burden of writing properties; we'll automatically generate them from templates in order to synthesize this abstract model that we know top-level memory model verification techniques can be built on top of. And I'll pause here, but essentially, we took that idea and ran with it for some other application domains besides memory model verification. Because I'll say, like, originally
Starting point is 00:22:09 when we started this, it was not a performance optimization. We just felt we wanted to alleviate more manual effort in deriving these µspec models from RTL, versus trying to map these µspec model axioms down to properties ourselves. But what we ended up finding was the way we were doing this synthesis was essentially generating very many properties. Like, maybe for a lowish-complexity, like 11,000-flip-flop-bit-ish, six-stage pipeline, we might generate like 120-plus thousand properties to synthesize execution paths for every instruction. But each one of those properties is very simple and cheap to check.
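As a rough illustration of how a small handful of templates can expand into a huge number of tiny queries, here is a toy Python sketch. The template strings, instruction list, and stage list are all invented; the real tools instantiate temporal-logic or SystemVerilog assertion templates against the design's netlist, not format strings.

```python
# Toy property-template instantiation (illustrative only).
from itertools import product

TEMPLATES = [
    "instr {i} can update the {s} pipeline register",
    "instrs {i} then {j} update the {s} pipeline register in program order",
]

INSTRS = ["load", "store", "add", "mul"]
STAGES = ["fetch", "decode", "execute", "memory", "writeback"]

def instantiate():
    """Expand each template over all instructions (and ordered pairs)
    and all stages, yielding many small, independently checkable props."""
    props = [TEMPLATES[0].format(i=i, s=s)
             for i, s in product(INSTRS, STAGES)]
    props += [TEMPLATES[1].format(i=i, j=j, s=s)
              for i, j, s in product(INSTRS, INSTRS, STAGES) if i != j]
    return props

props = instantiate()
print(len(props))  # 20 single-instruction + 60 pairwise = 80 tiny properties
```

Even this toy setup shows the blow-up: two templates become 80 properties. With realistic instruction sets, multiple execution paths per instruction, and pairwise interactions, six figures of cheap queries is plausible.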
Starting point is 00:22:49 And so while we might have some properties, and we can talk about those if we have time, that model checkers still struggle with because of certain structures that are difficult for them to deal with, and we can talk about what those are, we've kind of distilled the hardest pieces of a verification problem into the small fraction, maybe 10%, of properties that are trickier for a model checker, while 90% of them are very easy. So it turned out that it was a way
Starting point is 00:23:15 for us to decompose these really hard verification problems into many simple little baby steps. And actually, I think my mental model now is somewhat that, because, like, we're talking about expertise, right? Like, in a certain way, you can't get rid of it. Like, you're just kind of moving it somewhere else. So in a way, it sounded kind of like, in the high-level software work, rather than requiring ninja code warriors for everything, you essentially created, like, a middleware layer. Like, you put all your ninjas on the middleware. Correct. And now you can iterate faster on the higher level stuff without requiring that kind of expertise. Yep. Yeah, exactly. Like, we want to require the expertise once
Starting point is 00:23:54 and then be able to amortize the benefits of the expertise multiple times. Makes sense. Okay. Thanks. Yep. Yeah, I mean, I think that's a fascinating trajectory of how you arrived at this abstraction. Like you talked about how initially we're going from the top down, and then you had the set of µspec rules, but then you sort of reverse engineered the problem, saying, okay, fine, we can't translate the µspec down to the hardware level microarchitectural details,
Starting point is 00:24:19 but maybe we can synthesize them from the bottom. So a couple of very important principles. One is translating through different levels of the abstraction stack. That's number one. The second part is sort of sometimes reversing the problem, right? Like you maybe are attacking the problem from one direction, sometimes starting from the solution and working backwards, like what do I need to do in order to make it more tractable
Starting point is 00:24:40 sort of shines the light on a different path? And ultimately you end up with a solution that gets the best of both worlds, where you have an abstraction where the expertise is somewhat localized. You need the local expert to understand that layer of abstraction, but then it composes really well to help you translate it back to the top level abstraction. So I think very useful principles. Yeah, like the way you phrased it, yeah, exactly it.
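The synthesize-then-compare step discussed earlier, where the reverse-engineered memory model is checked syntactically against the expected one, can be sketched as a set difference. The axiom strings below are invented simplifications; real axiomatic models are first-order logic rules over ordering relations, not flat strings.

```python
# Toy syntactic comparison of a synthesized memory model vs. the expected one.

EXPECTED_TSO = {
    "store->store order preserved",
    "load->load order preserved",
    "load->store order preserved",
    # store->load may reorder under TSO (store buffering), so no such axiom
}

def diff_models(synthesized, expected):
    """Axioms the hardware fails to enforce (potential bugs) and axioms
    it enforces beyond the spec (stricter than required)."""
    return expected - synthesized, synthesized - expected

# Suppose synthesis from RTL recovered only two of the expected orderings.
synthesized = {"store->store order preserved", "load->load order preserved"}
missing, extra = diff_models(synthesized, EXPECTED_TSO)
print(sorted(missing))  # ['load->store order preserved']
```

A non-empty `missing` set is the signal described above: an ordering the specification demands but the implementation does not appear to uphold.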
Starting point is 00:25:04 Yeah, maybe we can talk about memory consistency models and hardware implementations. Another theme that your work has focused on, and maybe there are some parallels between these two, is security and security vulnerabilities in hardware. And this is, I think, traditionally people have always known about this, but maybe over the last five to ten years or so, there's been more of a spotlight with attacks like Spectre, Meltdown, and a variety of others as well. Yeah. Could you talk a little bit about how you think about security, how it sort of relates to the verification problems that you have
Starting point is 00:25:52 seen in other domains. What are the parallels either in the theme of problems or in the techniques and approaches that you have towards tackling these two? Yeah, good question. Yeah. So actually, how I got started on this link between security and memory consistency was, actually, it was around the end of 2016, I think, or so. There was a 2016 Oakland paper that was studying storage side channel attacks. And basically, they were exploiting what I would kind of call a coherence protocol implementation detail in order to infer cache occupancy. And then they were using the ability to infer cache occupancy to conduct more standard cache side channel attacks and leak cryptographic keys and so on. Essentially, they were exploiting the fact that on, I forget, I don't want to misquote which processor it was, but anyways, on certain processors, you were able to create two virtual aliases to the same physical address, one which is cacheable and the other which was not cacheable. And so by manipulating the cacheable and non-cacheable aliases
Starting point is 00:26:41 in a certain way, you could determine, based on the value you loaded for a memory location, whether or not it was in the cache or it had been evicted to memory. So essentially I observed, taking this a step further, you know, working on memory consistency model verification, that memory model bugs cause very subtle correctness issues, which perhaps more performance-oriented computer architects might look at and say, well, is it really that big of a deal? Like, is it a big problem? And I thought, well, I wonder if there's an instance where memory model bugs could cause security problems. That's how I ended up looking for this sort of research. And then I found this paper that happened
Starting point is 00:27:27 to be published around the same time. So I thought, okay, actually, the same techniques we're using to abstractly model memory consistency model implementations of processors, these µspec models, we could use them to model the underlying causes of certain security vulnerabilities, which really kind of boils down to ordering and visibility of hardware level events. And so that was towards, let's say, two-thirds-ish of the way through my PhD, and I ended up repurposing µspec models as an input to a security verification tool, and using these µspec models, we uncovered new Meltdown- and Spectre-style vulnerabilities. But why were we able to do that? Like, what is it about a µspec model that's really useful?
Starting point is 00:28:16 It's the fact that it encodes the variable hardware resource usage that can be exhibited by instructions when they execute. So it's the fact that they encode this multiple execution path behavior for instructions. The fact that one instruction can instantiate sort of like one happens-before graph versus another happens-before graph. And that variable hardware resource usage by instructions, if that variable hardware resource usage or execution path variability is data dependent, then your ability to distinguish between one path or the other allows you to infer that data. And so more recently, so that was an observation we made back in 2017, 2018. It was very timely with the release of the Meltdown and Spectre attack disclosures and so on.
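The heuristic just described, that operand-dependent execution-path variability signals a potential side channel, can be sketched in a few lines of Python. The path model (which events a load takes on a hit versus a miss) is invented for illustration; the real analysis derives the paths from RTL via generated properties rather than hardcoding them.

```python
# Toy detector for operand-dependent execution-path variability.

def execution_path(instr, operand, cache):
    """Return the sequence of microarchitectural events (the 'path')
    for one instruction; invented model of a simple pipeline."""
    if instr == "load":
        if operand in cache:  # cache hit: short path
            return ("fetch", "decode", "l1_read", "writeback")
        # cache miss: extra miss-handling events, an observably longer path
        return ("fetch", "decode", "l1_miss", "fill", "l1_read", "writeback")
    return ("fetch", "decode", "execute", "writeback")

def leaks_operand(instr, operands, cache):
    """Flag instr as unsafe for secrets if distinct operand values can
    induce distinct (observably different) execution paths."""
    return len({execution_path(instr, o, cache) for o in operands}) > 1

cache = {0x1000}
print(leaks_operand("load", [0x1000, 0x2000], cache))  # True: hit vs. miss
print(leaks_operand("add", [1, 2], cache))             # False: single path
```

As noted above, path variability alone doesn't guarantee a leak; the second analysis step attributes the variability to specific operands, which this toy bakes in directly by keying the path on the operand value.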
Starting point is 00:29:04 More recently, we used this observation to extend this approach for hardware memory consistency model verification to conduct hardware security verification, based on the same observation about execution path variability for an instruction. So I can take a hit path versus a miss path, or I can spend a variable number of cycles in my serial division unit depending on my operand values. This fact that one instruction can exhibit multiple paths is a very strong indicator that there is a hardware side channel in the input design. It doesn't guarantee there is; it could exhibit path variability that's not a function of some instruction's operand data
Starting point is 00:29:44 value. But based on this observation, we came up with this way to automatically uncover, also based on this property generation from templates, all of the different execution paths an instruction can take on some RTL processor implementation. And then we had the second level of analysis, also with automated property generation, where we attributed instances of path variability for instructions,
Starting point is 00:30:11 or attempted to attribute that path variability to the operands of one or more instructions. And if we were successful in doing so, the instruction whose operands we can blame for the path variability, that instruction is an instruction that's not safe for processing secrets, because it can leak its operands via some observably distinct execution behavior. And then we also can pinpoint the why. So we know that this instruction is not safe for processing secrets, and we know it's because it causes a load to either access the store buffer or access
Starting point is 00:30:44 the cache when it reaches this particular step of its execution. So we can do both: identifying the unsafe instructions and the side channels through which they leak their operands. Got it. Yeah. I think that's a good analogy between the architectural flows and the microarchitectural flows. They all have these happens-before kind of relationships that dictate the orderings. And that's sort of the core property that all of your different verification flows exploit and/or target. And so you can repurpose them for either memory consistency models or for security. So a related theme is, of course, you want to verify the correctness of a particular memory consistency model or understand whether you have a certain set of security guarantees
Starting point is 00:31:22 of hardware. Now, once you've done that, you need to write code that is still conformant to either a spec or the actual implemented behavior. And these seem like two sides of the same coin. So, of course, in the memory consistency world, if you're writing a compiler or, you know, the Java runtime, you need to understand the memory consistency spec to sort of insert the right fence instructions at appropriate points. Or if you're writing locks or other synchronization primitives, you need to sort of adhere to
Starting point is 00:31:47 the particular spec. What is the equivalent on the security side, especially because maybe these errors can percolate through multiple layers of the stack too? And is there a way to mitigate this, given that you're able to infer what the potential side channels or leakage surface areas in the underlying hardware are? So how do you sort of correlate it back to software that has already potentially been written and now needs to execute on hardware? Maybe future generations of hardware will fix certain side channels or have some mitigation approaches in hardware, but code has to run today and now. So how do we, every time there's a new vulnerability,
Starting point is 00:32:22 or if you discover a new vulnerability, you need some approach to sort of mitigate it. So how do you think about the security resolutions and mitigations and their relationship back to the software that you write? Yeah, good question. So let me maybe break it down into a couple pieces, and I can mention, like, two pieces of work that I think are most relevant to this. So one thing we did a few years ago now,
Starting point is 00:32:46 I feel like this was 2020, 2021-ish. It was also based off of this observation that there's a lot of similarities between how we can verify memory model implementations of processors and side-channel security of processors. We said, can we expose to software the leakage behavior of hardware in a similar way that we expose the ordering behavior of hardware? And so we came up with this way to extend these axiomatic memory model specifications, which again, really just state rules about the ordering and visibility of shared memory accesses. Can we sort of extend and repurpose the vocabulary that you use for defining those axiomatic memory model specifications to define these security
Starting point is 00:33:31 specifications? And I think we called them leakage containment models at the time. I think we probably referred to them as a security contract or a side-channel security contract in that original paper. Around the same time, maybe a little before, there was an Oakland paper called Hardware-Software Contracts for Secure Speculation by Marco Guarnieri and others. And they decomposed this idea of a side-channel security contract into a leakage contract piece and an execution contract piece. Our leakage containment models really merge both of those into one contract. We didn't differentiate between the pieces so much. But anyways, the interesting thing was, because we were able to use a very similar specification
Starting point is 00:34:18 language for encoding the different ways in which, like, instructions may execute speculatively and expose their operands via hardware side channels as they execute. Basically, we came up with a way to sort of formally encode the speculative control and data flow graph that a program could exhibit at runtime. And the way we did it very closely matched what people have done for memory models. We were able to repurpose a tool that had been previously designed to take as input an axiomatic memory model specification and a program, and it would rewrite the program with fences inserted, so now it exhibited sequentially consistent execution behavior when it ran on a processor implementing the memory
Starting point is 00:35:00 model that was supplied as input. And we took that tool and we repurposed it so it took as input one of our leakage containment models, which were of a very similar flavor, and a program, and it automatically found Spectre vulnerabilities in programs. And we had a fence insertion pass as well, although I don't think that it was the most optimized in that version of the work, and I can tell you why. But it wasn't really central to our claim. But we did find some new Spectre vulnerabilities doing this in the OpenSSL and libsodium crypto libraries. And the OpenSSL folks published a blog post about this, which was great. And one of the things they mentioned, this is not a direct quote because I don't remember exactly,
Starting point is 00:35:43 but something along the lines of we can't patch every single security, like they can't sort of one-off patch everything we found because people will find new things tomorrow, and that's not really the best way to go about repairing your code, which I fully agreed with. And so what we ended up doing next was trying to think a little bit more holistically, like I think you're alluding to, which is like how do we really, rather than trying to just go bug finding,
Starting point is 00:36:05 like how can we repair code so it can now execute correctly on these hardware implementations that leak stuff, but maybe don't leak everything that they process. And so one thing we had noticed in the leakage containment models work was that a lot of the Spectre vulnerabilities we were finding in programs, while legitimate, could be fixed by sort of modest rewriting of the code. So for example, a lot of Spectre vulnerabilities, potential Spectre vulnerabilities, arise in programs because we let different functions, with different stack layouts that write secrets to different memory offsets in their stacks, share the same physical memory, like, at different points in time.
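A minimal sketch of that stack-sharing hazard, using an invented store-buffer model (my own simplification, not the paper's formalism): a speculative load that mispredicts it need not forward from the store buffer reads the stale secret left behind by the previous function, and function-private stacks remove the hazard.

```python
# Toy model of speculative store bypass on a reused stack slot. The slot
# numbers, memory model, and misprediction flag are illustrative assumptions.

def speculative_load(slot, memory, store_buffer, predict_no_forward):
    """A load that (mis)predicts it need not forward from the store buffer
    reads stale memory instead of the buffered initializing store."""
    if predict_no_forward or slot not in store_buffer:
        return memory.get(slot, 0)        # stale value: possibly a secret
    return store_buffer[slot]

def run(private_stacks):
    memory = {8: "SECRET"}                # function A's dead stack write
    slot_b = 16 if private_stacks else 8  # does B reuse A's physical slot?
    store_buffer = {slot_b: 0}            # B's init store, not yet committed
    return speculative_load(slot_b, memory, store_buffer,
                            predict_no_forward=True)

print(run(private_stacks=False))  # SECRET: stale value reaches speculation
print(run(private_stacks=True))   # 0: B's private slot never held a secret
```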
Starting point is 00:36:48 And so one function could write a secret to a physical memory location on its stack, and then another function comes along and maybe writes some public data to initialize its stack, and then it does a load of that public data, but it actually gets forwarded the value that was stale and was previously there, so it gets to speculatively read a secret, and then it goes on to leak that secret via one of my leaky instructions. There's kind of two ingredients for hardware side-channel leakage. I have these instructions, these unsafe instructions, which can leak a function of their operands via side channels,
Starting point is 00:37:22 and I want to prevent those instructions from getting secrets supplied to them as input. And I could write my code, and this is the constant-time programming discipline, many cryptographic routines are written in this way, such that in the architectural control and data flow graph of the program, I never allow secret data to reach the unsafe operands of these leaky instructions. But then I also have speculative execution, which allows sort of this speculative control and data flow graph of my program to evolve at runtime.
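The constant-time discipline she mentions can be illustrated with a select operation: the branchy version routes the secret into a branch condition (an unsafe operand), while the constant-time version performs the same operations regardless of the secret. A Python sketch, with the usual caveat that real constant-time code is written and audited much closer to the hardware:

```python
def branchy_select(secret_bit, a, b):
    # NOT constant time: the branch direction depends on the secret, so a
    # branch predictor or timing channel can reveal secret_bit.
    return a if secret_bit else b

def ct_select(secret_bit, a, b):
    # Constant time at the algorithmic level: the same operations run
    # either way; the secret flows only through data, never control.
    mask = -secret_bit              # 0 -> ...000, 1 -> ...111 (two's complement)
    return (a & mask) | (b & ~mask)

assert ct_select(1, 0xAA, 0x55) == 0xAA
assert ct_select(0, 0xAA, 0x55) == 0x55
```

Her point is that speculation can undo this discipline: even if the architectural data flow keeps secrets away from unsafe operands, a mispredicted path can still steer them there at runtime.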
Starting point is 00:37:52 And now I can steer secrets towards the unsafe operands of these instructions at runtime. So what we did is we said, okay, we want to solve this problem at compile time. What are the ingredients we need from hardware to do that? And our first observation was we can't reason about a totally unconstrained speculative control and data flow graph at compile time. So if I wanted to, say, defend my program against sort of all hardware side-channel attacks,
Starting point is 00:38:25 including Spectre attacks, at compile time, I'd need to know, I'd have to conservatively assume, let's say I'm running on an Intel processor, just because they have variable-length instruction encoding, I'd have to conservatively assume that every instruction could speculatively redirect control flow to any other instruction in my program, even to the middle of an instruction because of this variable-length instruction encoding. And all of a sudden, I have this totally unconstrained speculative control flow graph, and there's a similar story for data flow, that I can't reason about. So we said, what if the programmer could, not disable speculation, but constrain it a little bit? And we essentially observed that there were existing architectural security features. So, just a quick digression on this, it was Intel's Control-flow Enforcement Technology we were using in this particular project, but Arm has something similar, which allows you to constrain forward and backward architectural control
Starting point is 00:39:21 flow to specific targets. So for example, for forward-edge control flow, the compiler can insert these landing-pad ENDBRANCH instructions, which basically tell hardware, if you ever jump architecturally to a non-ENDBRANCH instruction, crash the program. So they're designed to mitigate control-flow-integrity-style attacks, like jump-oriented programming, return-oriented programming, and so on. And so what we observed, though, was that on a couple recent Intel microarchitectures, they also have some speculative guarantees, which is, if you jump to a non-ENDBRANCH speculatively, the processor will stall speculation. And so given this observation, we said, okay, actually, there exist some tools already we can use to constrain speculative control flow.
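The landing-pad idea can be sketched as a target-set computation. The encoding below is a made-up simplification of CET's actual semantics; the point is only that the speculative target set collapses from "any byte in the program" to "the ENDBRANCH instructions":

```python
# Hypothetical program: a map from instruction address to opcode. "endbr"
# marks the compiler-inserted landing pads at function entries.
program = {
    0x00: "endbr",   # entry of function f
    0x04: "mov",
    0x08: "ret",
    0x10: "endbr",   # entry of function g
    0x14: "add",
}

def speculative_targets(program):
    """Under the landing-pad rule, an indirect call may still mispredict,
    but only onto ENDBRANCH instructions; anything else stalls speculation,
    so a compile-time analysis only has to consider these targets."""
    return {pc for pc, op in program.items() if op == "endbr"}

print(sorted(speculative_targets(program)))  # [0, 16]: function entries only
```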
Starting point is 00:40:05 So, for example, if I put an ENDBRANCH at every function entry point, I know that my call instructions could speculatively jump to the wrong function entry point, but they're going to at least go to some finite, like, reasonable set of locations. They can go to any function entry point, but not any instruction in my program. So we came up with this new hardware-software contract, essentially, that exposes to software the speculative control and data flow behaviors of a processor, and that allows you to do this sort of restriction of control and data flow speculation at compile time. And we came up with this compiler defense, which prevents secret data from reaching the leaky operands of these
Starting point is 00:40:48 unsafe instructions on all program executions that are possible on hardware that implements this execution contract, we called it. And the key things that we're doing are some clever program rewriting to say, give each function its own physical memory for its stack, this function-private-stack compiler pass, we called it, or do cleaning of registers before control flow transitions. So if we have dead registers that contain secret data, you could, you know, speculatively jump to an instruction that leaks that data. And so we do some register cleaning. And then we also do some fence insertion to basically say, okay, now in my speculative control and data flow graph, I can identify all of these source and sink
Starting point is 00:41:34 instructions between which I need to prevent speculative data flow. And I can essentially formulate this minimum directed multicut problem to insert the minimum number of fences between all of the source instructions and the sink instructions. And we can get pretty good performance overhead for protecting, we were targeting cryptographic code in particular. And this is specific to cryptographic code for reasons I can talk about; it's interesting. The performance guarantees are specific to cryptographic code. We can protect them from Spectre attacks on real hardware. But yeah, I guess in summary, it's like, how do we really protect programs at their source? It's like, first we need to understand what are the minimum guarantees we need from
Starting point is 00:42:15 hardware in order to do that. And that's sort of where we started. How can we minimally restrict, in this case, the speculative execution of software at runtime, and how can we restrict it at compile time? And once we had that minimal set of restrictions, we formalized it. So we know what's the contract we have to verify hardware against, which we're working on now. And then we can go ahead and build our software analyses on top of that. So we really take this, like, formal-specification-grounded approach. Like, we really crisply state what are the assumptions that we're making about hardware, and then we try and design something that's provably secure, robust with respect to that. So in this case, it sounded like you were leveraging
Starting point is 00:42:49 some tools that were in existence for Intel processors, right? And Arm has something, I think, similar as well. So they sort of have these features where they can limit the scope of what might happen. Yep. And you leverage that in order to do your compile-time stuff. So I guess what I wonder is, you talked in the beginning about trying to increase the amount of information in the interface, like, the traditional ISA is not sufficient. Is this the sort of thing that you're speaking of, or is there more? Because a lot of times when Spectre and Meltdown first started happening, and I'm not a security expert by any means, when they first started happening, the debate was extremely broad, right? It's just like,
Starting point is 00:43:35 well, performance versus security, and like, we don't want to tell you about our microarchitecture, or nobody wants to tell people about their microarchitecture because that's the secret sauce. And so I guess in this case, it sounds like the chip manufacturers have provided some amount of tooling. Is there more that you wish for? Yeah. Yeah, talk a little bit more about that. Yeah, yeah. So, yeah, all good questions. So one thing is, I'll say, like, even though those tools were available, they weren't advertised as such. So the Intel CET, which we used in this work, was advertised as having these architectural guarantees only. We kind of approached collaborators at Intel to better understand if we could make these assumptions during speculation. And they since updated their software security guidance to basically say, actually, these two microarchitectures that we assumed have these guarantees do, and they say something about it being the intended direction to continue this sort of thing. So I would say it wasn't actually, so yes, that's one example of, okay, now they've modified, you can say, the ISA or the
Starting point is 00:44:45 architecture to say, okay, we've committed to this sort of ability to constrain speculative control and data flow on Intel hardware, or these versions of Intel hardware. However, I do think we can do more interesting things than that. So one way in which you could change interfaces is to give programmers, or compilers, the ability to constrain some aspect of your program's execution at runtime, sort of the non-architectural aspect of its execution at runtime. Another thing you can do, this is on the security front, is communicate even finer-grained information to hardware. So, for example, we get good performance, I would say, like, you know, 10 to 20 percent performance overhead, which I'm calling
Starting point is 00:45:31 good, for defending cryptographic code from Spectre attacks at compile time. However, the reason we get good performance for cryptographic code, and I'm not willing to say that for other classes of code, is because we have to insert these heavyweight fence instructions. And these heavyweight fence instructions, the reason we're inserting them is because we want to block speculative data flow from, like, one instruction before the fence to one instruction after it. But the fence is going to have much stronger semantics than that.
Starting point is 00:46:01 It's going to prevent speculative data flow from all instructions before the fence to all instructions after it. And so in cryptographic code, there's a lot of looping behavior so we can sort of strategically direct our fence insertion algorithm to insert fences outside of loops.
Starting point is 00:46:17 But for more general-purpose code, we can't do the same thing. And so what we've been looking into since then, and have some prototypes of, is, can we communicate finer-grained information about the program to hardware and, let hardware sort of, like, can we program a hardware defense, essentially? Rather than sort of letting software do all the work or hardware do all the work, can we do a bit more collaboration between the two? And it's kind of inspired by this observation that the compiler knew a lot of information about the types of mitigations that
Starting point is 00:46:53 the program needs, and it's not able to adequately communicate those to hardware, and so we get more performance overhead. So the answer to your question is, yes, that is an example of the type. So, like, this execution contract, like a CET-style specification but extended to speculative execution, is an example of these types of ISA contract modifications. But, like, for the Spectre defense part of our work, I think we can do more interesting specifications besides that. And I think it's really a question as to what is the best. Like, what should we do, which we're trying to, yeah, figure out, which a lot of people, I think, are trying to figure out as well.
Starting point is 00:47:30 But it's an interesting question. That was a whirlwind tour of security verification, how you think about approaching these problems, but maybe also a sneak peek into how the hardware landscape is also evolving fairly rapidly. You talked about some of the security features that have been coming up in hardware processors, from Intel, Arm, and others as well. So I hope it gave our listeners a broad overview of not just the techniques and the complexity of this particular domain,
Starting point is 00:47:58 but also how quickly the hardware itself is evolving and maybe how research sort of impacts this landscape here and now as this work is being carried out. Maybe this is a good time to sort of wind the clocks back. If you can tell our listeners, like how did you get interested in computer architecture and formal verification, security and so on? And what was your trajectory that led you to Stanford where you're currently? Maybe I'll go back to, so I did my undergraduate degree at Purdue University in computer engineering. And there was a time when I thought that I might want to go to medical school. So I was filling a lot of the free space in my schedule with pre-med classes.
Starting point is 00:48:37 But then I realized that I was having to remove a lot of the computer-science-leaning courses, just, I guess, because of the nature of the curriculum, in order to do that. And those were the courses I really liked the most in my degree. And so I decided against the med school plan, and I took compilers and computer architecture, and I really enjoyed both of those classes. I actually remember at the time a friend of mine was going to take the AI class, and I was like, I'm going to take computer architecture. It's much more practical. That's what I thought. So that's what I did. And I interned that summer at Intel, actually, interestingly, on the formal equivalence verification and design-for-test team, which I
Starting point is 00:49:19 think is probably called something different now, but I didn't realize that I was going to become interested in verification at that time. And so that was the summer before my senior year of undergrad. I worked at Intel and I really loved it. I thought about going back to Intel for a job after undergrad, but I thought that surely I did not know enough about computers to really contribute. And so I decided I wanted to go to grad school. And my deciding to go to grad school was very not informed, in that I didn't really know the difference between a master's degree and a Ph.D. I knew one was longer than the other, but I didn't really know much besides that. And so I applied to some master's programs and some PhD programs.
Starting point is 00:50:01 I really didn't have much research experience. I mean, I had a little bit, but it was from the I-think-I-want-to-go-to-med-school days, where I was doing some research on cochlear implants, but very, very lightweight, I would say. So I applied to some master's programs, some PhD programs. I got very, very lucky, because at Princeton, where I applied for a master's, Margaret Martonosi, who became my PhD advisor, happened to look at the master's applicant pool for PhD applicants that year. And I ended up switching my application to a PhD application and getting admitted to the PhD program. And I was very excited that Margaret had taken so much interest in my application. So it was almost a no-brainer that I decided
Starting point is 00:50:47 to go to Princeton for my PhD. And I learned that it's more than just, like, five or six years of school. It's like a research job. And I absolutely loved it. And I did some internships during my PhD at NVIDIA, and I actually really liked those as well. I had really enjoyed academia. I liked the prospect of being able to continue to work on my interesting intersection of formal methods and architecture and security and memory models. And so I decided to apply for academic jobs. And I was very lucky again, because, at the time I was applying for academic jobs, the work that I was doing, the security verification, was very timely and relevant because of Meltdown and Spectre, which were announced
Starting point is 00:51:35 like around the time I was submitting my faculty applications. And so yeah, I went and did interviews and I had a lot of fun doing my interviews, and then I ended up here. And I think, like, the way that I got involved in security verification, and memory consistency verification in general, was sort of by accident. I really liked logic and math, and I happened to work with a more senior grad student, Dan Lustig, who I mentioned earlier as the sort of creator of μhb and μspec, these happens-before graphs that we used to represent program executions. And I worked with him closely on my first project during my PhD. And then I sort of ran with it. And I got interested in better motivating my
Starting point is 00:52:24 work with security applications to those maybe more skeptical about the importance of correctness. And yeah, it turned out to be a lot of really lucky but good decisions along the way. And so that's why I'm here. That's really great to hear, and I will say two things. Which is, when you said, surely I don't know enough about computers, actually, I have said that exact same thing. That is exactly why I went on to go to grad school too, because, like, right now I talk to a lot of young people and they'll ask me, like, how did you decide, why did you do it, and I mean, it's the exact same thought process, just like, why would anybody pay me to do anything, I don't know anything yet, you know. And so I think it's actually pretty interesting that, you know, clearly you have just dug and dug and dug and dug and
Starting point is 00:53:14 kept going and learned more and more and more about your subject of choice. And I think there's something to be said for that level of sort of humility when you start. Like, I don't know. I have to find out. I have to find out. I have to find out more and more and more and more. So, A, I thought that was really cool. And then I just talked so much that I forgot what my second thing was going to be. But I really love the whole story. Margaret is, of course, amazing. And I'm glad. Oh, I remember now. When you were applying for jobs, I was at Microsoft and we really wanted you. We really wanted you. I know. I had a hard time actually saying no to Microsoft also. That was the only, I think that was the only industry interview that
Starting point is 00:54:06 I did was at Microsoft, and I really liked it. Yeah. Yeah. Well, you've ended up on a wonderful path. You're obviously doing great. And I think the last, actually, so maybe it was three things I was going to say. The last thing we'll say is, I was once at a conference and there was a young grad student going around asking people about how they got to where they were. And she asked me, and I told her a bunch of stuff, and I said, oh, I was really lucky here, really lucky here, really lucky here. And that really came out with what you said too. You said you were lucky, A, B, and C, at various points in your life, lucky, lucky, lucky. And she looked at me and she said, Lisa, none of the men said they were lucky. Oh, yeah. And I was just like, oh, here's
Starting point is 00:54:46 another way where, like, we're kind of unified, where it's like a lot of gratitude for the luck. I mean, yeah, the luck that we had in our careers. But then she basically was like, hey, acknowledge the work that you put into it. And so I'm going to say to you, also acknowledge the work that you put into it, because clearly you've put a ton of work into it. And yeah, luck favors the prepared, and you're clearly prepared. Yeah, luck, but I think I agree with that last point, especially. Luck favors the prepared. So yeah, no, it's been really fun. And I think I do have, I am grateful, actually, to myself for having this surely-I-don't-know-enough attitude, I think, early on, because it did really, and it still does actually, force me to dig in
Starting point is 00:55:28 to things that I don't know about as well and understand them better. So, yeah, again, I'm very detail-oriented, maybe to a fault, but I think it's because of that mentality. That's great. So maybe one thing that we usually also ask our guests, well, Suvinay and I keep finding out more and more that, like, people are actually listening to this podcast, which is amazing. Thank you, listeners.
Starting point is 00:55:54 And it does seem like our audience, for a while I thought they skewed young, which I think it still does, but we do have listeners from a variety. But for our young listeners, you're an esteemed professor at Stanford. What kind of advice do you have for the people who are listening to this podcast? Yeah, that's a good point. So, like, grad school or earlier, do you think? Yeah, let's say grad students. Okay. Yeah. I mean, I think maybe one piece of advice that I received that I still think about a lot is, ask honest questions. I think that when I was a grad student, I was a little bit, I didn't have very high expectations of myself. Maybe that sounds like a weird thing to say, but, like, I was just trying to learn as much as I could and do the best I
Starting point is 00:56:41 could and work on interesting problems. And I remember, sort of as I was about to do this transition from grad school to outside, I was thinking, like, how much responsibility. Like, now I can submit papers that no one has checked, like, no one has signed off on me, like, putting things into the universe. And some advice I got about being a little apprehensive about that transition was just to keep asking honest questions. And so I think that has been a good piece of, at least sort of technical, but I mean, it's partially technical, partially sort of, yeah, like, social, I guess, advice, which is, yeah, if you ask honest questions, you'll keep trying to drill down and you'll do interesting work, and you'll make interesting connections also from talking to people and
Starting point is 00:57:27 so on. So that's at least one piece. I like that a lot. Go ahead, Suvinay. Yeah. So I was just going to say, through both these responses, I think those are golden words. Humility and curiosity are two essential attributes of a successful researcher. And I think it certainly shows over the course of your trajectory, and I hope to see more of this. Very valuable words of wisdom for our listeners as well. Thanks, thanks. Actually, I thought of one more, actually. So my one other one is that, maybe it's more of a belief, which is, I think all research is incremental, but it may be more or less interesting depending on, sort of, like, what pieces are combined in order to contribute that next increment. So I think, like, and I'm sure there's exceptions to this, of, like, really huge breakthroughs that are maybe seemingly less incremental. But even, like, my jump from memory consistency to security, which may have seemed like a strange leap, to me actually seemed very natural because of this lower-level connection that I was making. So I think if you sort of look in between disciplines, or gain as much experience as
Starting point is 00:58:38 you can, like, diving into the details in different topics, and/or just, like, your own topic but different aspects of it, I think you will notice interesting connections that seem incremental to you but seem more transformational to other people. So that's my second piece of advice. I don't know if it's wise or not, but. No, I like that too. I like that too, because one thing that I have realized the deeper into my career that I go is that there are a lot of interesting problems out there and there are a lot of smart people out there, but not all of them are, like, staring at the same thing. Right. Like, it is truly conceivable that if you really look at something, and look at something adjacent to it, like, you really might be one of the only people
Starting point is 00:59:24 who is really looking at it, like, really thinking about it. And if you're asking honest questions about it, as you say, and if you really want to know, and if you're really digging, there is definitely terrain out there to search. Particularly in industry, I don't know if you agree with this too, but I feel like in industry we have, and actually it's probably true in academia, we have so much going on, right? In academia, you have your service commitments. And in industry, you know, we have meetings. We have all sorts of stuff and churn that are not necessarily related to the work itself, right? So there's plenty of opportunity to spend a lot of time on things that are not the work itself. So if you do spend the time on the work itself and are
Starting point is 01:00:06 asking honest questions and are really drilling, there is absolutely room, in my opinion, for anything that interests people. Yeah, I totally agree. I love what you said. Oh, yeah. Thanks. Caroline, that was such an amazing conversation. I think we ran the gamut from, you know, really, really wonky, deep discussions of consistency models and security, as well as, you know, some lighter fare about your career trajectory
Starting point is 01:00:34 and words of wisdom. And it's been really, really awesome. Thank you so much for talking with us today. Yeah, thanks a lot for having me. It's been a lot of fun. Yeah, same here, Lisa. This has been an absolute delight speaking with you. And to our listeners,
Starting point is 01:00:45 thank you for being with us on the Computer Architecture Podcast. Until next time, it's goodbye from us.
