Microsoft Research Podcast - Abstracts: November 5, 2024
Researchers Chris Hawblitzel and Jay Lorch share how progress in programming languages and verification approaches is bringing bug-free software within reach. Their work on the Rust verification tool Verus won the Distinguished Artifact Award at SOSP '24.
Transcript
Welcome to Abstracts,
a Microsoft Research podcast that puts
the spotlight on world-class research in brief.
I'm Amber Tingle.
In this series, members of the research community at Microsoft give us
a quick snapshot or a podcast abstract of their new and noteworthy papers.
Our guests today are Chris Hawblitzel and Jay Lorch.
They are both senior principal researchers at Microsoft
and two of the co-authors on a paper called Verus:
A Practical Foundation for Systems Verification.
This work received the Distinguished Artifact Award
at the 30th Symposium on Operating
Systems Principles, also known as SOSP, which is happening right now in Austin,
Texas. Chris and Jay, thank you for joining us today for Abstracts and
congratulations. Thank you for having us. Chris, let's start with an
overview. What problem does this research address?
And why is Verus something that the broader research community should know about?
So what we're trying to address is a very simple problem where we're trying to help
developers write software that doesn't have bugs in it.
And we're trying to provide a tool with Verus that will help developers show that their code actually behaves the way it's supposed to.
It obeys some sort of specification for what the program is supposed to do.
How does this publication build on or differ from other research in this field, including your previous Verus-related work? So formal verification is a process where you
write down what it is that you want your program to do in mathematical terms. So if you're writing
an algorithm to sort a list, for example, you might say that the output of this algorithm should be a new list that is a rearrangement of the elements of the old list,
but now this rearrangement should be in sorted order.
So you can write that down using standard mathematics.
And now, given that mathematical specification, the challenge is to prove that your piece of software written in a particular language
like Java or C# or Rust actually generates an output that meets that mathematical specification.
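As an illustration (not the authors' actual code), the sorting specification Chris describes can be written down as two checkable properties. The sketch below, in plain Rust, only tests them at runtime on a single input; a verifier like Verus proves such properties hold for all inputs.

```rust
// Hypothetical sketch of the sorting spec: the output must be in sorted
// order and must be a rearrangement (permutation) of the input.

// "Sorted order": every adjacent pair is non-decreasing.
fn is_sorted(v: &[i32]) -> bool {
    v.windows(2).all(|w| w[0] <= w[1])
}

// "Rearrangement of the elements": both slices contain the same
// multiset of elements.
fn is_permutation(a: &[i32], b: &[i32]) -> bool {
    let (mut a, mut b) = (a.to_vec(), b.to_vec());
    a.sort();
    b.sort();
    a == b
}

fn main() {
    let input = vec![3, 1, 2];
    let mut output = input.clone();
    output.sort(); // the implementation under test
    assert!(is_sorted(&output));
    assert!(is_permutation(&input, &output));
}
```

In Verus itself, these properties would instead appear as postconditions (`ensures` clauses) on the sorting function and be proved once, at verification time, rather than checked on each run.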
So this idea of using verification to prove that your software obeys some sort of specification,
this has been around for a long time. So, you know, even Alan
Turing talked about ways of doing this many, many decades ago. The challenge has always been that
it's really hard to develop these proofs for any large piece of software. It simply takes a long
time for a human being to write down a proof of correctness of their software.
And so what we're trying to do is to build on earlier work in verification and recent
developments in programming languages to try to make this as easy as possible and to try to make
it as accessible to ordinary software developers as
possible. So we've been using existing tools. There are automated theorem provers, one of them
from Microsoft Research called Z3, where you give it a mathematical formula and ask it to prove that
the formula is valid. We're building on that. And we're also taking a lot of inspiration from tools developed at Microsoft
Research and elsewhere, like Dafny and F*, that we've used in the past for previous
verification projects. And we're trying to take ideas from those and make them accessible to
developers who are using common programming languages. In this case, the Rust programming language is what we're focusing on.
Jay, could you describe your methodology for us
and maybe share a bit about how you and your co-authors
tested the robustness of Verus?
So the question we really want to answer is,
is Verus suitable for systems programming?
So that means a variety of things. Is it
amenable to a variety of kinds of software that you want to build as part of a system?
Is it usable by developers? Can they produce compact proofs? And can they get timely feedback
about those proofs? Can the verifier tell you quickly that your proof is correct, or if it's wrong, that it's wrong, and guide you to fix it?
So the main two methodological techniques we used were millibenchmarks and full systems.
So the millibenchmarks are small pieces of programs that have been verified by other tools in the past,
and we built them in Verus and compared to what other tools
would do to find whether we could improve usability. And we found generally that we could
verify the same things, but with more compact proofs and proofs that would give much snappier
feedback. The difference between one second and 10 seconds might not seem a lot, but when you're
writing code and working with the verifier,
it's much nicer to get immediate feedback about what is wrong with your proof. So you can say,
oh, what about this? And it can say, oh, well, I still see a problem there. And you can say,
okay, let me fix that, as opposed to waiting 10, 20 seconds between each such cue to the verifier.
So the millibenchmarks helped us evaluate that. And the macrobenchmarks, the building of entire systems: we built a couple of distributed systems that had been verified before, a key-value store and a node-replication system, to show that you could do them more effectively and with less verification time.
We also built some new systems, a verified OS page table, a memory allocator, and a persistent memory append-only log.
Chris, the paper mentions that successfully verifying system software has required,
you actually used the word heroic to describe the developer effort. Thinking of those heroes in the developer community and perhaps others, what real-world impact do you expect Verus to
have? What kind of gains are we talking about here?
Yeah, so I think traditionally verification or this formal software verification that we're doing
has been considered a little bit of a pie-in-the-sky research agenda,
something that people have applied to small research problems
but has not necessarily had a real-world impact before.
And so I think it's just recently in the last 10 or 15 years that we started to see a change in this
and started to see verified software actually deployed in practice.
So on one of our previous projects,
we worked on verifying the cryptographic primitives
that people use when, say, they browse the web or something
and their data is encrypted.
So in these cryptographic primitives,
there's a very clear specification
for exactly what bytes you're
supposed to produce when you encrypt some data. And the challenge is just writing software that
actually performs those operations and does so efficiently. So in one of our previous projects
that we worked on called HACL* and EverCrypt, we verified some of the most commonly used and efficient
cryptographic primitives for things like encryption and hashing and so on.
And these are things that are actually used on a day-to-day basis.
So we kind of took from that experience that the tools that we're building are getting ready for prime time
here. We can actually verify software that is security critical, reliability critical, and is
in use. So some of the things that Jay just mentioned, like verifying persistent memory
storage systems and so on, those are the things that we're looking at next
for software that would really benefit from reliability
and where we can formally prove that your data that's written to disk
is read correctly back from disk and not lost during a crash, for example.
So that's the kind of software that we're looking to verify to try
to have a real-world impact. The way I see the real-world impact is going to enable Microsoft
to deal with a couple of challenges that are severe and increasing in scale. So the first
challenge is attackers, and the second challenge is the vast scale at which we operate. There's a
lot of hackers out there with
a lot of resources that are trying to get through our defenses, and every bug that we have offers
them purchase. And techniques like this that can get rid of bugs allow us to deal with that
increasing attacker capability. The other challenge we have is scale. We have billions of customers. We have vast amounts of data and
compute power. And when you have a bug that you've thoroughly tested, but then you run it
on millions of computers over decades, those rare bugs eventually crop up. So they become a problem
and traditional testing has a lot of difficulty finding those.
And this technology, which enables us to reason about the infinite possibilities in a finite
amount of time and observe all possible ways that the system can go wrong and make sure that it can
deal with them, that enables us to deal with the vast scale that Microsoft operates on today.
Yeah. And I think this is an important point that differentiates
us from testing. Traditionally, you find a bug when you
see that bug happen in running software.
With formal verification, we're catching the bugs before
you run the software at all. We're trying to prove that on all possible
inputs, on all possible executions of the software, these bugs will not happen. And it's much cheaper to fix bugs
before you've deployed the software, before attackers have tried to exploit those bugs.
So Jay, ideally, what would you like our listeners and your fellow SOSP conference
attendees to tell their colleagues about Verus?
What's the key takeaway here?
I think the key takeaway is that it is possible now to build software without bugs, to build systems code that is going to obey its specification on all possible inputs always.
We have that technology.
And this is possible now because a lot of technology has
advanced to the point where we can use it. So for one thing, there's advances in programming
languages. People are moving from C to Rust. They've discovered that you can get the high
performance that you want for systems code without having to sacrifice the ability to
reason about ownership, lifetimes, and concurrency. The other thing that we
build on is advances in computer-aided theorem proving. So we can really make compact and quick
to verify mathematical descriptions of all possible behaviors of a program and get fast
answers that allow us to rapidly turn around proof challenges from developers.
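The move from C to Rust that Jay mentions can be sketched with a minimal, illustrative example (not from the paper): Rust's ownership rules catch, at compile time, aliasing and lifetime mistakes that C would let through.

```rust
// Illustrative: `consume` takes ownership of the vector, so after the
// call the caller can no longer use it; the compiler enforces this.
fn consume(v: Vec<i32>) -> usize {
    v.len()
}

fn main() {
    let v = vec![1, 2, 3];
    let n = consume(v); // `v` is moved into `consume` here
    // println!("{:?}", v); // would not compile: borrow of moved value
    assert_eq!(n, 3);
}
```

Verification tools like Verus can build on this kind of static reasoning about ownership rather than having to re-establish it in every proof.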
Well, finally, Chris, what are some of the open questions or future opportunities for
formal software verification research? And what might you and your collaborators tackle next?
We heard a few of those things earlier.
Yes, I think despite the effort that we and many other researchers have put into trying
to make these tools more accessible, trying to make them easier to use, there still is
a lot of work to prove a piece of software correct, even with advanced state-of-the-art
tools. And so we're still going to keep trying to push to make that easier,
trying to figure out how to automate the process better.
There's a lot of interest right now in artificial intelligence
for trying to help with this,
especially if you think about artificial intelligence
actually writing software,
you ask it to write a piece of software
to do a particular task,
and it generates some C code or some Rust code
or some Java code.
And then you hope that that's correct
because it could have generated any sort of code
that does the right thing or total nonsense.
So it would be really great going forward
if when we ask AI to develop software,
we also expect it to create a proof
that the software is correct
and does what the user asked for.
We've started working on some projects
and we found that the AI is not quite there yet
for realistic code.
It can do small examples this way.
But I think this is still a very large challenge going forward
that could have a large payoff in the future
if we can get AI to develop software
and prove that the software is correct.
Yeah, I see a lot of potential synergy
between AI and verification.
Artificial intelligence can solve
one of the key challenges of verification,
namely making it easy for developers to write that code.
And verification can solve one of the key challenges of AI,
which is hallucinations,
synthesizing code that is not correct,
and Verus can verify that that code actually is correct.
Well, Chris Hawblitzel and Jay Lorch,
thank you so much for joining us today on
the Microsoft Research Podcast to discuss your work on Verus.
Thanks for having us.
Thank you.
To our listeners, we appreciate you too.
If you'd like to learn more about Verus,
you'll find a link to the paper at aka.ms/abstracts, or you can read it on the SOSP website.
Thanks for tuning in. I'm Amber Tingle, and we hope you'll join us again for Abstracts. Thank you.