Microsoft Research Podcast - Abstracts: November 5, 2024
Researchers Chris Hawblitzel and Jay Lorch share how progress in programming languages and verification approaches is bringing bug-free software within reach. Their work on the Rust verification tool Verus won the Distinguished Artifact Award at SOSP '24.
Transcript
Welcome to Abstracts,
a Microsoft Research podcast that puts
the spotlight on world-class research in brief.
I'm Amber Tingle.
In this series, members of the research community at Microsoft give us
a quick snapshot or a podcast abstract of their new and noteworthy papers.
Our guests today are Chris Hawblitzel and Jay Lorch.
They are both senior principal researchers at Microsoft
and two of the co-authors on a paper called Verus:
A Practical Foundation for Systems Verification.
This work received the Distinguished Artifact Award
at the 30th Symposium on Operating
Systems Principles, also known as SOSP, which is happening right now in Austin,
Texas. Chris and Jay, thank you for joining us today for Abstracts and
congratulations. Thank you for having us. Chris, let's start with an
overview. What problem does this research address?
And why is Verus something that the broader research community should know about?
So what we're trying to address is a very simple problem where we're trying to help
developers write software that doesn't have bugs in it.
And we're trying to provide a tool with Verus that will help developers show that their code actually behaves the way it's supposed to.
It obeys some sort of specification for what the program is supposed to do.
How does this publication build on or differ from other research in this field, including your previous Verus-related work? So formal verification is a process where you
write down what it is that you want your program to do in mathematical terms. So if you're writing
an algorithm to sort a list, for example, you might say that the output of this algorithm should be a new list that is a rearrangement of the elements of the old list,
but now this rearrangement should be in sorted order.
So you can write that down using standard mathematics.
And now, given that mathematical specification, the challenge is to prove that your piece of software written in a particular language
like Java or C# or Rust actually generates an output that meets that mathematical specification.
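As an illustration (not the authors' actual code), the sorting specification Chris describes can be written down as two checkable properties. The sketch below, in plain Rust, only tests them at runtime on a single input; a verifier like Verus proves such properties hold for all inputs.

```rust
// Hypothetical sketch of the sorting spec: the output must be in sorted
// order and must be a rearrangement (permutation) of the input.

// "Sorted order": every adjacent pair is non-decreasing.
fn is_sorted(v: &[i32]) -> bool {
    v.windows(2).all(|w| w[0] <= w[1])
}

// "Rearrangement of the elements": both slices contain the same
// multiset of elements.
fn is_permutation(a: &[i32], b: &[i32]) -> bool {
    let (mut a, mut b) = (a.to_vec(), b.to_vec());
    a.sort();
    b.sort();
    a == b
}

fn main() {
    let input = vec![3, 1, 2];
    let mut output = input.clone();
    output.sort(); // the implementation under test
    assert!(is_sorted(&output));
    assert!(is_permutation(&input, &output));
}
```

In Verus itself, these properties would instead appear as postconditions (`ensures` clauses) on the sorting function and be proved once, at verification time, rather than checked on each run.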
So this idea of using verification to prove that your software obeys some sort of specification,
this has been around for a long time. So, you know, even Alan
Turing talked about ways of doing this many, many decades ago. The challenge has always been that
it's really hard to develop these proofs for any large piece of software. It simply takes a long
time for a human being to write down a proof of correctness of their software.
And so what we're trying to do is to build on earlier work in verification and recent
developments in programming languages to try to make this as easy as possible and to try to make
it as accessible to ordinary software developers as
possible. So we've been using existing tools. There are automated theorem provers, one of them
from Microsoft Research called Z3, where you give it a mathematical formula and ask it to prove that
the formula is valid. We're building on that. And we're also taking a lot of inspiration from tools developed at Microsoft
Research and elsewhere, like Dafny and F*, that we've used in the past for previous
verification projects. And we're trying to take ideas from those and make them accessible to
developers who are using common programming languages. In this case, the Rust programming language is what we're focusing on.
Jay, could you describe your methodology for us
and maybe share a bit about how you and your co-authors
tested the robustness of Verus?
So the question we really want to answer is,
is Verus suitable for systems programming?
So that means a variety of things. Is it
amenable to a variety of kinds of software that you want to build as part of a system?
Is it usable by developers? Can they produce compact proofs? And can they get timely feedback
about those proofs? Can the verifier tell you quickly that your proof is correct, or if it's wrong, that it's wrong, and guide you to fix it?
So the main two methodological techniques we used were millibenchmarks and full systems.
So the millibenchmarks are small pieces of programs that have been verified by other tools in the past,
and we built them in Verus and compared to what other tools
would do to find whether we could improve usability. And we found generally that we could
verify the same things, but with more compact proofs and proofs that would give much snappier
feedback. The difference between one second and 10 seconds might not seem a lot, but when you're
writing code and working with the verifier,
it's much nicer to get immediate feedback about what is wrong with your proof. So you can say,
oh, what about this? And it can say, oh, well, I still see a problem there. And you can say,
okay, let me fix that, as opposed to waiting 10, 20 seconds between each such cue to the verifier.
So the millibenchmarks helped us evaluate that. And the macrobenchmarks, the building of entire systems: we built a couple of distributed systems that had been verified before, a key-value store and a node-replication system, to show that you could do them more effectively and with less verification time.
We also built some new systems, a verified OS page table, a memory allocator, and a persistent memory append-only log.
Chris, the paper mentions that successfully verifying system software has required,
you actually used the word heroic to describe the developer effort. Thinking of those heroes in the developer community and perhaps others, what real-world impact do you expect Verus to
have? What kind of gains are we talking about here?
Yeah, so I think traditionally verification or this formal software verification that we're doing
has been considered a little bit of a pie-in-the-sky research agenda,
something that people have applied to small research problems
but has not necessarily had a real-world impact before.
And so I think it's just recently in the last 10 or 15 years that we started to see a change in this
and started to see verified software actually deployed in practice.
So on one of our previous projects,
we worked on verifying the cryptographic primitives
that people use when, say, they browse the web or something
and their data is encrypted.
So in these cryptographic primitives,
there's a very clear specification
for exactly what bytes you're
supposed to produce when you encrypt some data. And the challenge is just writing software that
actually performs those operations and does so efficiently. So in one of our previous projects
that we worked on called HACL* and EverCrypt, we verified some of the most commonly used and efficient
cryptographic primitives for things like encryption and hashing and so on.
And these are things that are actually used on a day-to-day basis.
So we kind of took from that experience that the tools that we're building are getting ready for prime time
here. We can actually verify software that is security critical, reliability critical, and is
in use. So some of the things that Jay just mentioned, like verifying persistent memory
storage systems and so on, those are the things that we're looking at next
for software that would really benefit from reliability
and where we can formally prove that your data that's written to disk
is read correctly back from disk and not lost during a crash, for example.
So that's the kind of software that we're looking to verify to try
to have a real-world impact. The way I see the real-world impact is going to enable Microsoft
to deal with a couple of challenges that are severe and increasing in scale. So the first
challenge is attackers, and the second challenge is the vast scale at which we operate. There's a
lot of hackers out there with
a lot of resources that are trying to get through our defenses, and every bug that we have offers
them purchase. And techniques like this that can get rid of bugs allow us to deal with that
increasing attacker capability. The other challenge we have is scale. We have billions of customers. We have vast amounts of data and
compute power. And when you have a bug that you've thoroughly tested, but then you run it
on millions of computers over decades, those rare bugs eventually crop up. So they become a problem
and traditional testing has a lot of difficulty finding those.
And this technology, which enables us to reason about the infinite possibilities in a finite
amount of time and observe all possible ways that the system can go wrong and make sure that it can
deal with them, that enables us to deal with the vast scale that Microsoft operates on today.
Yeah. And I think this is an important point that differentiates
us from testing. Traditionally, you find a bug when you
see that bug happen in running software.
With formal verification, we're catching the bugs before
you run the software at all. We're trying to prove that on all possible
inputs, on all possible executions of the software, these bugs will not happen. And it's much cheaper to fix bugs
before you've deployed the software, before attackers have tried to exploit those bugs.
So Jay, ideally, what would you like our listeners and your fellow SOSP conference
attendees to tell their colleagues about Verus?
What's the key takeaway here?
I think the key takeaway is that it is possible now to build software without bugs, to build systems code that is going to obey its specification on all possible inputs always.
We have that technology.
And this is possible now because a lot of technology has
advanced to the point where we can use it. So for one thing, there's advances in programming
languages. People are moving from C to Rust. They've discovered that you can get the high
performance that you want for systems code without having to sacrifice the ability to
reason about ownership, lifetimes, and concurrency. The other thing that we
build on is advances in computer-aided theorem proving. So we can really make compact and quick
to verify mathematical descriptions of all possible behaviors of a program and get fast
answers that allow us to rapidly turn around proof challenges from developers.
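The move from C to Rust that Jay mentions can be sketched with a minimal, illustrative example (not from the paper): Rust's ownership rules catch, at compile time, aliasing and lifetime mistakes that C would let through.

```rust
// Illustrative: `consume` takes ownership of the vector, so after the
// call the caller can no longer use it; the compiler enforces this.
fn consume(v: Vec<i32>) -> usize {
    v.len()
}

fn main() {
    let v = vec![1, 2, 3];
    let n = consume(v); // `v` is moved into `consume` here
    // println!("{:?}", v); // would not compile: borrow of moved value
    assert_eq!(n, 3);
}
```

Verification tools like Verus can build on this kind of static reasoning about ownership rather than having to re-establish it in every proof.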
Well, finally, Chris, what are some of the open questions or future opportunities for
formal software verification research? And what might you and your collaborators tackle next?
We heard a few of those things earlier.
Yes, I think despite the effort that we and many other researchers have put into trying
to make these tools more accessible, trying to make them easier to use, there still is
a lot of work to prove a piece of software correct, even with advanced state-of-the-art
tools. And so we're still going to keep trying to push to make that easier,
trying to figure out how to automate the process better.
There's a lot of interest right now in artificial intelligence
for trying to help with this,
especially if you think about artificial intelligence
actually writing software,
you ask it to write a piece of software
to do a particular task,
and it generates some C code or some Rust code
or some Java code.
And then you hope that that's correct
because it could have generated any sort of code
that does the right thing or total nonsense.
So it would be really great going forward
if when we ask AI to develop software,
we also expect it to create a proof
that the software is correct
and does what the user asked for.
We've started working on some projects
and we found that the AI is not quite there yet
for realistic code.
It can do small examples this way.
But I think this is still a very large challenge going forward
that could have a large payoff in the future
if we can get AI to develop software
and prove that the software is correct.
Yeah, I see a lot of potential synergy
between AI and verification.
Artificial intelligence can solve
one of the key challenges of verification,
namely making it easy for developers to write that code.
And verification can solve one of the key challenges of AI,
which is hallucinations,
synthesizing code that is not correct,
and Verus can verify that that code actually is correct.
Well, Chris Hawblitzel and Jay Lorch,
thank you so much for joining us today on
the Microsoft Research Podcast to discuss your work on Verus.
Thanks for having us.
Thank you.
To our listeners, we appreciate you too.
If you'd like to learn more about Verus,
you'll find a link to the paper at aka.ms/abstracts, or you can read it on the SOSP website.
Thanks for tuning in. I'm Amber Tingle, and we hope you'll join us again for Abstracts. Thank you.