Disseminate: The Computer Science Research Podcast - Dominik Winterer | Validating SMT Solvers for Correctness and Performance via Grammar-based Enumeration | #63

Episode Date: July 25, 2025

In this episode of the Disseminate podcast, Dominik Winterer discusses his research on SMT (Satisfiability Modulo Theories) solvers and his recent OOPSLA paper titled "Validating SMT Solvers for Correctness and Performance via Grammar-Based Enumeration". Dominik shares his academic journey from the University of Freiburg to ETH Zurich, and now to a lectureship at the University of Manchester. He introduces ET, a tool he developed for exhaustive grammar-based testing of SMT solvers. Unlike traditional fuzzers that use random input generation, ET systematically enumerates small, syntactically valid inputs using context-free grammars to expose bugs more effectively. This approach simplifies bug triage and has revealed over 100 bugs—many of them soundness and performance-related—with a striking number having already been fixed. Dominik emphasizes the tool’s surprising ability to identify deep bugs using minimal inputs and to track solver evolution over time, highlighting ET's potential for integration into CI pipelines. The conversation then expands into broader reflections on formal methods and the future of software reliability. Dominik advocates for a new discipline—Formal Methods Engineering—to bridge the gap between software engineering and formal verification tools. He stresses the importance of building trustworthy verification tools since the reliability of software increasingly depends on them. Dominik also discusses adapting ET to other domains, such as JavaScript engines, and suggests that grammar-based enumeration can be applied widely to any system with a context-free grammar. Addressing the rise of AI, he envisions validation portfolios that integrate formal methods into LLM-based tooling, offering certified assessments of model outputs.
He closes with a call for the community to embrace pragmatic, systematic, and scalable approaches to formal methods to ensure these tools can live up to their promises in real-world development settings.

Links:
- Dominik's Homepage
- Validating SMT Solvers for Correctness and Performance via Grammar-Based Enumeration

Hosted on Acast. See acast.com/privacy for more information.

Transcript
Starting point is 00:00:00 Disseminate the Computer Science Research Podcast. Hey folks and welcome to another episode of Disseminate the Computer Science Research Podcast. Today, we're going to be talking about a paper from OOPSLA, which is my favourite acronym of all the conferences, by the way. For those who don't know, it stands for Object-Oriented Programming, Systems, Languages and Applications. And the specific paper we're going to be talking about is validating
Starting point is 00:00:27 SMT solvers for correctness and performance via grammar-based enumeration. Now this paper was written by Dominik Winterer and his colleague, Zhendong Su. And I'm really happy to say I've got Dominik with me today on the podcast. So welcome Dominik. Oh, thank you, Jack, for the nice introduction. Cool. So yeah, I guess we'll go into your journey in a second, Dominik, but you finished your PhD and your postdoc and you're going to be an incoming lecturer at the University of
Starting point is 00:00:54 Manchester in September. And you're hiring, right? So we can get this plug in straight away as well. So to any listeners who are interested in formal methods and whatnot, do reach out to Dominik. But yeah, cool. So welcome to the show. And it's always customary on the podcast for the guests to start off telling us about their story. So yeah, tell us about your background and how you became interested in formal methods and your research area. Yeah, basically, I grew up around computers in the early 2000s. That sparked a great interest when my father was building a computer inside our living
Starting point is 00:01:34 room. And then I started doing my studies at the University of Freiburg in Germany, which is a beautiful and enjoyable place. And there, logic and math and these things were really the focus, and in order to pass a computer science degree, you had to master them. During this time, I also learned a lot from several amazing mentors who taught me research and got me interested in their research on symbolic AI, where Freiburg at the time was still a big player and a great place to be. And so the pivotal moment, essentially, which tied me to a research career, was when I
Starting point is 00:02:29 saw a research lab come together in front of a deadline. The kind of energy that was there was really amazing and that really started everything. I did a research internship at IBM in the US. And then, as you said, I did my PhD at ETH Zurich and a postdoc. And now I'm happy to join the University of Manchester in September. That's fantastic. Yeah. So you said that Freiburg is a beautiful place. And that's the one right on the edge of the Black Forest, right?
Starting point is 00:03:00 All around that part of the world, right? That's right. Yeah. I remember going there as a kid and yeah, I really loved it. It was a beautiful part of the world. Cool. So yeah, and obviously Manchester next, that's going to be fantastic as well.
Starting point is 00:03:11 And sort of building your own lab, right? And sort of fleshing that out. And maybe, I guess, some of the experience you mentioned there that was inspiring, you're going to try and put that into practice again on your own path, right? So that's really cool. And I hope that goes well for you.
Starting point is 00:03:25 So next, I guess, let's ground ourselves for this chat today and set some background, some context and some of the key terms. So yeah, for the uninitiated, what are SMT solvers and what's a grammar? Tell us about all these key words we're going to need to understand for this chat today.
Starting point is 00:03:46 Yeah, I may add to the previous question that I got interested in this formal methods and software engineering world by combining what I had in Freiburg, which was formal methods (symbolic AI research was kind of similar), with the very practical software engineering I found when I got to ETH. I merged those two, and that's how I got interested in SMT solvers, essentially. SMT solvers, you can think of them as powerful theorem provers that take formulas and decide whether these formulas are satisfiable, that means whether there's an assignment to the variables that makes the formulas true, or whether they are unsatisfiable.
Starting point is 00:04:34 And so these SMT solvers are nowadays used as critical infrastructure in software research and industry, specifically in testing and verification, for example in symbolic execution, or for verifying whatever is necessary to verify. So, these are SMT solvers. They are complex pieces of software, with a lot of logic and intricate algorithms inside of them. They solve NP-hard problems and they are very important for the software industry and for research.
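To make the satisfiable/unsatisfiable distinction concrete, here is a minimal brute-force sketch in Python. This is purely illustrative; real SMT solvers use far more sophisticated decision procedures over rich theories (arithmetic, strings, and so on), not a truth-table search over Booleans.

```python
from itertools import product

def is_satisfiable(formula, variables):
    """Brute-force check: try every truth assignment of the variables.
    `formula` maps an assignment dict to a bool."""
    for values in product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        if formula(assignment):
            return True, assignment  # found a satisfying assignment
    return False, None  # unsatisfiable: no assignment makes it true

# (x or y) and (not x or not y): satisfiable, e.g. x=False, y=True
sat, model = is_satisfiable(
    lambda a: (a["x"] or a["y"]) and (not a["x"] or not a["y"]), ["x", "y"])
print(sat)  # True

# x and not x: unsatisfiable, no assignment works
sat2, _ = is_satisfiable(lambda a: a["x"] and not a["x"], ["x"])
print(sat2)  # False
```

The exponential loop over assignments is exactly the NP-hard core that real solvers attack with clever algorithms.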
Starting point is 00:05:18 And so, yeah, maybe a bit of background on grammars. All you need to know is pretty much computer science 101. A context-free grammar is represented in a file very much as you would write it down on paper. What I should also give some background on is black-box testing, where the goal is to create as many test inputs as possible to fire at a particular piece of software, and using these inputs, we want to find bugs. And grammar-based testing means we use a grammar to describe the inputs that we want to generate and throw at the software under test, essentially. Okay, cool. Sorry, Dominik. Can you continue, please?
Starting point is 00:06:17 No, I just wanted to add that we aim to test SMT solvers in that case. Cool. Yeah. So you mentioned that fuzz testing is the traditional way of testing these solvers. So what are the limitations of those approaches that make the grammar-based approach more advantageous? So there are several ways of testing. You can do mutations on existing inputs, or you can describe the inputs from the ground up, which you can do with a grammar.
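As a rough sketch of what grammar-based input generation looks like, here is a toy random generator over a hypothetical miniature grammar of Boolean terms. Random sampling like this is the traditional grammar-fuzzing style; it is an illustration only, not the grammars ET actually uses.

```python
import random

# A toy context-free grammar for a fragment of Boolean SMT-LIB terms.
# Nonterminals map to productions; a symbol not in the grammar is a terminal.
GRAMMAR = {
    "<term>": [
        ["true"], ["false"], ["x"],            # non-recursive leaf productions
        ["(not ", "<term>", ")"],
        ["(and ", "<term>", " ", "<term>", ")"],
    ],
}

def derive(symbol, depth=0, max_depth=4):
    """Randomly expand a nonterminal into a syntactically valid string."""
    if symbol not in GRAMMAR:
        return symbol  # terminal: emit as-is
    productions = GRAMMAR[symbol]
    if depth >= max_depth:                     # bound recursion: leaves only
        productions = productions[:3]
    chosen = random.choice(productions)
    return "".join(derive(s, depth + 1, max_depth) for s in chosen)

random.seed(0)
for _ in range(3):
    print(f"(assert {derive('<term>')})")  # valid inputs to fire at a solver
```

Every generated string parses by construction, so the solver under test is never rejected at the parser; the interesting logic deeper inside gets exercised.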
Starting point is 00:06:57 So with grammar-based, I wouldn't say that one is superior to the other. It's more that they are orthogonal. We use grammar-based testers in this case because we wanted to enumerate inputs from the ground up, from smallest to larger inputs. Okay, cool. That's a nice segue then into the elevator pitch and the motivation of the paper. So yeah, what was the overarching goal of the paper? Maybe there are some sub-goals in there as well. So, the overarching goal of the paper... Because at the time, SMT solvers were tested quite a lot, several fuzzing campaigns had been done, many bugs had been found
Starting point is 00:07:46 and many fixes had been made by the developers. But all these SMT solver testers were unsystematic, or essentially random-based. And with random-based fuzzers, it's unclear when to stop. Usually the bug triggers were very sizable, such that you need a lot of time to reduce them to a size where developers can even think about triaging these bugs. And so with all these tests and all these bugs, it was also not quite clear whether SMT solvers even become better over time, although there was a big academic effort to find more and more bugs in SMT solvers. And so the motivation for us was twofold. Make it very simple. Try to find the
Starting point is 00:08:50 simplest bugs that don't cause too much trouble on the developers' side. So we want super small formulas, and with these super small formulas, we also want to judge the evolution of SMT solvers. And we can do that because we had done an evaluation before and found that 85% of all bug triggers in SMT solvers indeed reduce to small triggers. So we were quite confident that we would find a lot of important bugs with this method. Yeah, that's cool. I really liked the goal of trying to evaluate the
Starting point is 00:09:32 correctness and performance over time. That was a really nice aspect, a really nice goal to have. So let's get into how you went about achieving these goals then. Walk us through the design of the key components. And the thing you designed is called ET, right? So we need to know why it's called ET at some point as well. And yeah, tell us about how you went about designing ET and what the key components were. Yeah.
Starting point is 00:09:58 So maybe first on the name. ET is short for enumerative or exhaustive tester, or also extraterrestrial. Because when we were initially finding the first bugs with this tool, it was quite magical that they were super small and we did not have to do anything to reduce them further. We could just hand them straight to developers. How ET works is pretty simple. We formulate context-free grammars for the different SMT theories. We have grammars for the core theory, essentially Boolean logic, then for arithmetic, strings, and so on. And then we compile these grammars into algebraic data types in Haskell. And then we use an off-the-shelf enumerator called FEAT that can enumerate the inputs. So it's more about the insight of finding small bugs rather than the technology or the key idea. So in a sense, the real contribution here is really to
Starting point is 00:11:17 keep the bugs small and investigate evolution, rather than making a technical or conceptual contribution. Yeah, just for the uninitiated here, Dominik, when you say a small bug versus a large bug, could you give us a concrete example of what you actually mean there by the difference between small and large? Yeah, so essentially what I was referring to as a bug is not quite a bug. It's a bug trigger: an input that can manifest a bug in a piece of software. If that input is very large in size, which in SMT means something like a nested formula with lots of variables, lots of nested components.
Starting point is 00:12:13 That's difficult for the developers: to find out why this formula triggered a bug and what the root cause of it is. If, on the contrary, you have a very small bug trigger, minimal even, then you can at least think about which of the components you see in the input could cause this bug. That makes the difference in whether developers can even look at bugs or not; they might just straight-out reject too-large bug reports. Okay, cool. I get it. So I guess when you were building the ET pipeline and the various different steps in there, you said that you grabbed kind of an off-the-shelf tool for doing the enumeration.
Starting point is 00:13:14 And what made you choose that specific tool? I think it's called FEAT, right? It is. What were the reasons for choosing that versus something else? Or is that not really too important in the overall pipeline? You just basically needed something to enumerate. Yeah, I needed something to enumerate for sure. And initially I was building this tool myself and essentially used a search procedure. But then I discovered FEAT, which is just faster, and I replaced the component which I had previously designed myself.
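As a loose illustration of the enumeration idea (generating every term of a grammar in order of size, rather than sampling), here is a self-contained Python sketch over a toy Boolean grammar. The real pipeline compiles grammars to Haskell algebraic data types and enumerates them with an off-the-shelf library; this sketch is only an analogy to show why "exhaustive, smallest first" is attractive.

```python
from functools import lru_cache

ATOMS = ("true", "false", "x")  # leaf terms of a toy Boolean grammar

@lru_cache(maxsize=None)
def terms_of_size(n):
    """Every term whose syntax tree has exactly n nodes."""
    if n == 1:
        return ATOMS
    out = []
    # unary operator: 1 node plus a subterm of size n-1
    out.extend(f"(not {t})" for t in terms_of_size(n - 1))
    # binary operator: 1 node plus subterms of sizes k and n-1-k
    for k in range(1, n - 1):
        out.extend(f"(and {a} {b})"
                   for a in terms_of_size(k)
                   for b in terms_of_size(n - 1 - k))
    return tuple(out)

# Exhaustive, size-ordered test inputs: any failing input is already
# minimal-ish, because everything smaller was tried first.
print([len(terms_of_size(n)) for n in (1, 2, 3, 4)])  # [3, 3, 12, 30]
```

Because every smaller input was already tried, a failing input needs little or no reduction, which matches the "we could just hand them to developers" observation.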
Starting point is 00:13:50 But overall, I think the key thing that I wanted to discover is whether I can judge the evolution of SMT solvers with this approach. Cool. So let's talk about the results and what you found then. So first of all, walk us through how you actually go about evaluating something like ET. What's the setup? How did you actually go about proving your research questions? Yeah. So there are essentially two aspects here. ET is a powerful testing tool, and it's a way to judge the evolution of SMT solvers. For the first goal, we essentially did it the traditional way: we ran a bug-hunting campaign and found bugs in the SMT solvers, essentially bugs that had not been found before. For the...
Starting point is 00:14:45 Yeah, and we found a hundred bugs, and most of them got fixed. And for the second part, which I was very excited about, or am still super excited about, the evolution: we essentially looked at all SMT solver releases from the last six years and stacked them up. And then we ran these different grammars on each of them, each grammar representing a different SMT theory.
Starting point is 00:15:16 And because we cover all eight SMT theories, at least the official ones, this gave us a way of approximating the evolution over time, essentially. For example, we found that in old versions of SMT solvers, many bugs were triggered, while the number of bugs and unique bugs went down as the versions got more recent. It was quite stunning to see that the fuzzing campaigns conducted with other tools have basically caused part of this downward curve towards the new releases. So indeed we could, at least in part, validate that the fuzzing has benefited the SMT solvers.
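The evolution measurement described here can be sketched as follows. The `evolution_curve` helper, the release names, and the `check` callback are hypothetical stand-ins for running real solver binaries on the enumerated suite; only the shape of the analysis is real.

```python
# Run one fixed, enumerated test suite against every stacked-up solver
# release and count how many inputs trigger a bug in each release.

def evolution_curve(releases, suite, check):
    """check(release, test) -> True if `test` triggers a bug in `release`."""
    return {r: sum(1 for t in suite if check(r, t)) for r in releases}

# Mocked data illustrating the downward curve across releases.
known_bugs = {"v1.0": {"t1", "t2", "t3"}, "v2.0": {"t2", "t3"}, "v3.0": {"t3"}}
suite = ["t1", "t2", "t3", "t4"]
curve = evolution_curve(known_bugs, suite, lambda r, t: t in known_bugs[r])
print(curve)  # {'v1.0': 3, 'v2.0': 2, 'v3.0': 1}
```

Holding the suite fixed across releases is what makes the counts comparable and lets a downward trend be read as genuine improvement.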
Starting point is 00:16:16 Yeah, that's really cool. So let's focus on the results some more then. You kind of headlined it there that you found over 100 bugs. What was the breakdown there between correctness and performance? So we essentially found quite a few correctness bugs, among them many soundness bugs, and also performance bugs. It was pretty much half and half: half of the bugs that we found are correctness bugs and the other half are performance bugs. That said, this tool is the first one that has enabled us to even file these performance bugs. Because as I mentioned earlier, it's very hard to triage
Starting point is 00:17:08 bugs in such complicated systems when the triggers are large. But it's almost impossible to do that for performance bugs, because then you don't have a way to actually differentiate. When you have a crash or something, you can easily reduce the trigger to something smaller, because there are tools for that. But if you have a timeout, each time you shrink down your initial bug trigger, you again have to wait to see whether the timeout happens. So it's not easy to shrink down triggers, which makes it even more important that these performance bugs have small triggers from the start. And ET is, in my opinion, the first tool that can even do this. And that's why we could use it to file performance bugs in SMT solvers. Yeah, it's really cool.
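The cost described here, that each shrinking step of a performance-bug trigger must wait out the timeout again, can be sketched like this. `fake_solve` and `one_char_removals` are illustrative stand-ins, not a real reducer or solver.

```python
import time

def still_times_out(solve, candidate, timeout):
    """Re-run the solver on a candidate; a timeout confirms the bug persists.
    This is the expensive part: each check can cost the full `timeout`."""
    start = time.monotonic()
    solve(candidate, timeout)  # assumed to block for at most `timeout` seconds
    return time.monotonic() - start >= timeout

def shrink(trigger, candidates, solve, timeout):
    """Greedy reduction: accept any smaller candidate that still times out."""
    current, progress = trigger, True
    while progress:
        progress = False
        for cand in candidates(current):
            if still_times_out(solve, cand, timeout):
                current, progress = cand, True
                break
    return current

# Fake solver: inputs containing "X" hit the (pretend) performance bug.
def fake_solve(inp, timeout):
    time.sleep(timeout if "X" in inp else 0)

def one_char_removals(s):
    return [s[:i] + s[i + 1:] for i in range(len(s)) if len(s) > 1]

print(shrink("aXbc", one_char_removals, fake_solve, timeout=0.01))  # X
```

With a realistic timeout of tens of seconds, every accepted or rejected candidate costs that long, which is why starting from an already-small enumerated trigger matters so much.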
Starting point is 00:18:15 Of the bugs that you found, Dominik, was there any one that was really surprising? Which one maybe has had the most impact, I guess? You uncovered something quite, yeah, not scary, but I'm not sure what the word I'm looking for is. Something that was really surprising and could have been a really critical issue. That's difficult to pin down. We found quite a few soundness issues, each of which could cause something like a falsely verified program if it hits that particular bug. So it's hard to say. Of the soundness bugs we found, I think it was about 13.
Starting point is 00:19:02 And there were quite a few stunning ones. I think out of these soundness bugs, what impressed me the most is that the triggers were super small. Essentially, you see a quite critical bug in SMT solvers with a super small trigger, which you would think would have been detected earlier, maybe even by the regression test suite, but it wasn't. And so that was a bit of a shock, to see these small bugs appear. Yeah, you often wonder if they've had any sort of real-world knock-on effects,
Starting point is 00:19:39 right? Like they've been lurking in there for a while and they could be the cause of something. But yeah, it's amazing that they've finally been caught in the act. And what's the proportion of ones that have been fixed? Did you say? Sorry, I missed that.
Starting point is 00:19:53 The proportion of fixed ones is larger now, but at the time it was 40%. It's probably more like 70 now. The average across all the SMT solver bugs that I have ever filed is something like 80% of them get fixed. That's pretty good. What's the process like for getting them fixed? Do you ever go in there and fix them yourself? You're like, I've got to go and fix this damn thing. I should actually do that. I've filed way too many bugs in SMT solvers.
Starting point is 00:20:28 That's why I haven't fixed bugs in SMT solvers myself. I should go ahead and do that. But in general, we try to shrink the bugs down as much as possible, then we file these bugs, and we also try not to file duplicate bugs, because that would just cause too many issues for the developers. Another point you mentioned a second ago: you can see over time this downward trend, and that the fuzzing campaigns have actually had some positive impact. Were there any cases of regressions between versions popping up, where
Starting point is 00:21:15 the overall trend was down, but between, I don't know, version 5 and 6, there was actually an increase? Yeah, we have seen this quite a bit. We tested two solver families, Z3 and CVC5, which are the two biggest SMT solvers. In Z3, we haven't seen that very much, except for one theory. But for CVC4 and 5, we saw it, because there was a big version shift from CVC4 to CVC5, where they essentially re-engineered the tool completely. And this caused bugs in two or three theories out of eight. So we could see these issues both in terms of correctness and in terms of performance bugs.
Starting point is 00:22:03 Yeah, that's really interesting to see. We don't write bug-free code. Yeah, we don't. Cool. So looking forward to the future now, how does ET evolve to find more bugs in these solvers? And I'm guessing there must be more bugs in there, right? So yeah, what does the evolution of this direction of research look like going forward? Yeah, I guess one crucial thing we are working on at the moment is to extend the evolution analysis to more solvers beside the
Starting point is 00:22:36 two big ones, because there are more than a dozen SMT solvers and we need to know the evolution of each of them. And on top of that, what would be super important is to go beyond academic bug-hunting campaigns. What do I mean by that? Essentially, we need something like continuous integration of fuzzing pipelines for SMT solvers. Because currently, it's essentially the case that once somebody wants to write a paper and has identified an approach that finds bugs effectively in SMT solvers or other tools, then spends time and compute, finds these bugs, files them, gets them fixed, writes the paper, and essentially
Starting point is 00:23:25 publishes the tool on GitHub, and that's it, right? But what we need instead is something like continuous integration, where we effectively integrate these fuzzers into a continuous integration pipeline and so find bugs continuously. So that's the second point. Sure, yeah. On that, the main SMT solvers like Z3, they are open source projects, right? And so how are they actually managed? So basically you need a way to get things
Starting point is 00:24:00 like ET into their pipeline essentially. And do they do that at the moment? Have you got any communication with the developers of those projects about how you can get your stuff in there, so it is running continuously and finding bugs as they're developing, rather than waiting for them to release a version before you can test it? Yeah, but I think the integration of this is also not necessarily straightforward, right? Because there are some questions there. For example, when should we stop? With ET here, as a research effort, I just let ET generate a lot of test cases, one million per theory. And so that makes eight million if you have eight theories. And that's not feasible for a CI at the moment. So we have to do something smarter.
Starting point is 00:24:50 We might need to work with coverage to reduce the number of test cases that we have to run. Another problem that ET currently has is that it sometimes creates alpha-equivalent inputs, essentially the same formulas with different variable names. And we also want to tackle this issue. Cool. Yeah. Have you got any plans for adapting ET to some other domains as well?
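One way to tackle the alpha-equivalence problem mentioned above, as a sketch and not necessarily what ET does, is to canonicalize variable names before deduplicating. For illustration this assumes variables look like x0, y7, and so on.

```python
import re

VAR = re.compile(r"\b[a-z]\d+\b")  # simplified variable syntax: x0, y7, ...

def canonical(formula):
    """Rename variables in order of first occurrence, so formulas that
    differ only in variable names map to the same key."""
    mapping = {}
    def rename(match):
        name = match.group(0)
        if name not in mapping:
            mapping[name] = f"v{len(mapping)}"
        return mapping[name]
    return VAR.sub(rename, formula)

# Alpha-equivalent inputs collapse to one canonical form:
print(canonical("(and x0 (not x1))"))  # (and v0 (not v1))
print(canonical("(and y7 (not y2))"))  # (and v0 (not v1))

suite = ["(and x0 (not x1))", "(and y7 (not y2))", "(or x0 x0)"]
print(len({canonical(f) for f in suite}))  # 2
```

Keeping only one representative per canonical form shrinks the enumerated suite without losing any distinct behavior, which is exactly what a CI-friendly version would need.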
Starting point is 00:25:24 Outside SMT solvers? Yeah. We already found some issues in JavaScript engines with it, but it's work in progress, and it's basically connected to the idea of reducing some of the alpha-equivalent programs. But ET can be adjusted or applied to anything that has a context-free grammar. There are these big ANTLR repositories where you have lots and lots of grammars for different tools. You can use ET out of the box with these grammars. I think one caveat there is you have to limit the number of variables.
Starting point is 00:26:20 Currently, for formulas, we use two, and we also have to fix the number of constants. Otherwise, if we don't, the enumeration would just go and enumerate the integers or something. Cool. Yeah. In the next section of the podcast, I want to focus on impact. We touched on it slightly a second ago, about wanting to see it integrated into CI pipelines and things like that, but with ET, and obviously your wider work in this area, what impact do you think it can have on
Starting point is 00:26:56 the community and on industry as well? Oh, that's a big question. I mean, the underlying vision of my research specifically is to do something like formal methods engineering. Formal methods engineering should be a discipline between formal methods and software engineering. And what we want to do there, essentially, is make sure formal methods tools are correct and reliable, because the promise of formal methods tools is to make software reliable.
Starting point is 00:27:35 And if these tools are not reliable themselves, that's not a trustworthy endeavor. So that's one prerequisite. And the second is to be able to combine formal methods, because currently they are often used in isolation. There are so many formal methods out there, with lots of different specification languages which have overlaps and all that. I hope to make a contribution to consolidating some of these as well. Of course, the third big challenge I see is the specification crisis. Essentially, we still write software largely without specifications.
Starting point is 00:28:29 And so I also want to make a contribution towards that, such that we have better tools that incentivize people to write specifications. And why do we need this? In order to apply these formal methods techniques, which need specifications. Otherwise it's not possible to verify or even test something. Yeah, I definitely agree with you. The guarantee with formal methods is that if it says something is correct, then it's actually correct, right?
Starting point is 00:29:02 So if you don't have that guarantee, then it undermines confidence in those tools, right? So it's super important. I agree with that principle and the other ones you mentioned there as well. And the next question I'm going to ask is a bit off topic, but it's only because I was on a panel yesterday about the role of databases in AI, and AI is everywhere at the minute, everyone's talking about it.
Starting point is 00:29:24 So unfortunately, I'm going to have to ask you the question: what are the implications of GenAI and AI, and are there any effects of that on the formal methods community? And are there any things that we can learn from that, or from formal methods engineering,
Starting point is 00:29:40 this idea of combining these two disciplines, that can be applied to the AI domain as well? Because obviously it's very hard to have confidence in the output of some of these LLMs and stuff like that. So I don't know if there are lessons that can be learned from one community to the other. Yeah, there are definitely lots of lessons that might be learned from that. Although I cannot specifically comment on the big picture, because if I could, then the whole problem would be resolved and we could just trust ChatGPT or whatever language model one uses. But I think one particularly interesting aspect related to
Starting point is 00:30:26 formal methods engineering would be to have something like software validation portfolios, which take a prompt, a natural language prompt, and then apply whatever software validation technique is possible: could be testing, could be verification, could be whatever it is. And then as an output, either provide bugs or, if there are no bugs, provide some certificate for the rigor of the methods that have been applied. So that would be the grand vision: to use this mostly language-related, LLM-related AI as a front end to a verification back end. Yeah, I can see that being really useful.
Starting point is 00:31:29 Yeah, that'd be a nice, interesting crossover between the two. I now want to change gears a little bit. We've spoken about surprises a little across the chat, but there are going to be two questions here, Dominik. The first one is localized to ET: whilst working on the ET project, what's been the biggest surprise? And then I want to take that question and expand it to the wider field of formal methods. So I'd like to get your take on both.
Starting point is 00:31:56 So what's been your biggest thing, like, oh my God, I did not expect this, it caught me totally off guard? Yeah, I guess one of the biggest surprises was indeed that I could find so many small bugs in SMT solvers, still, after all these years of testing. And the other big surprise was that it's indeed possible to see some evolution. I was shocked to see, and I mean, I shouldn't have been shocked, but I was shocked to see that there's a clear-cut correlation between where the fuzzing campaigns started and where the bugs went down.
Starting point is 00:32:33 I wasn't expecting to see that in the evolution plot. So those were the two biggest surprises, I should say. Yeah. And to just look at it visually and think, oh my God, this just maps perfectly. Yeah, that must have been a really nice experience. Yeah. And I guess part B of that question, for the wider field then, and your wider time in research? Yeah.
Starting point is 00:32:58 So for the wider time in research, of course, I'll probably refer to the time that I spent in my PhD. I was always fascinated by these proving tools, such as SAT solvers, SMT solvers, or optimization solvers, and was playing around with these three in particular, which I fell in love with. And one thing I did when I arrived in the new lab: I found that a colleague of mine was testing SAT solvers. And we were kind of surprised how reliable these tools are. There were very, very few bugs we could find, even though we would throw millions of test inputs at these tools. And then we came
Starting point is 00:33:59 up with the idea of testing SMT solvers, and we weren't taking it too seriously. Then we applied one of our methods to SMT solvers and we were surprised that we could actually find soundness issues in SMT solvers automatically. That was a really big eureka effect, I would say. It's probably the biggest one I had in my research career: there was essentially this tool spitting out a soundness issue in Z3, and we were validating it. We were basically super excited and shouting in the hall of ETH, which has this old former chemistry building. So that was really crazy, I should say, that indeed we could find bugs in SMT solvers, which before was not clear. There were just a few soundness bugs that had been found, by applications for the most part. But
Starting point is 00:35:03 the fact that we could find them automatically and then we could find so many more that that was really, really exciting. Yeah. I bet that must've been a great experience. That's what it's all about, right? We all want those sort of eureka moments. That's sort of the, that's what it's all about. That's what it's all about.
Starting point is 00:35:17 So yeah, that's a, that's really cool. So I guess now we've, we've, we've spoken with ET and what not. I, it'd be interesting to hear about some of the other research you've got going on as well and sort of to dig deeper into the vision you have for your new lab at the University of Manchester as well. So yeah, tell us about the other cool stuff I know you're working on Dominik. Yeah, so I think it's a bit unusual, but most of my research is centered about testing these SMT solvers.
Starting point is 00:35:51 For the most part, I've devoted my entire PhD to attacking this one problem. And now, instead of just finding bugs in SMT solvers, I'm also trying to make SMT solvers better, in the sense that I want to design approaches for bypassing bugs. Because there are always lots of bugs, and at least at the moment we cannot produce large-scale verified systems. One thought that really caught my attention is that we can use instability in big software to our advantage. So essentially, with something like an SMT solver or a compiler, we can massage the input in equivalent ways such that we have lots and lots of different equivalent
Starting point is 00:36:57 inputs and then let these inputs process in the tools in parallel. So the idea is a bit similar to N version programming, but it's not like that we have N versions of the software, but we have N versions of the input to the software. And essentially then use an oracle at the output to decide which output we take. And I found that this ends instability, because instability is currently a big issue in the former methods community. Imagine if there's a change to an SMT solver. It could prove some fancy proof yesterday, but now there was a commit that changed a little bit of logic in SMT solver. Now it no longer can
Starting point is 00:37:54 prove the same theorem. So that issue can be, I would claim, partially tackled by parallelism, by basically turning the testing idea around. Instead of producing these equivalent or different inputs to trigger bugs, we essentially correct the software with them. So that's the current thought I'm a bit addicted to at the moment. That's really cool. Yeah, that's a very nice segue into my penultimate question for you, Dominik,
Starting point is 00:38:34 and that is how you actually go about arriving at that point, right? Having this sort of creative inspiration, the creative process. Do you have a systematic way of attacking creativity and generating ideas? And then, once you've generated those ideas, deciding this is the one that's going to be worth pursuing for a significant amount of time?
Starting point is 00:38:57 Oh, that's a great question. I love to talk about it with people all the time because I have thought about this a lot, how to develop, but I don't actually have such a systematic way. I would love to, but I don't have that. Unfortunately, these eureka moments or these insight moments are quite rare. moments are quite rare. And this one on instability, I had one when I was basically under extreme pressure in front of my PhD defense. So the night before, or the day before, essentially my advisor told me that my presentation is a bit off. And then I had to redo a bunch of things. And so when I was done, essentially, once I was done, I thought, okay, what else should I do? Because the script changed a little bit.
Starting point is 00:39:55 So I thought, what is the meanest questions people could ask about this research? people could ask about this research. And so I came up with the question, why essentially is this all useful? Why would you even spend so much time to find all these bugs when indeed what you could do, you just reformulate the end versions of the input formula, use a bit of more compute or substantially more because nowadays at least workstations have hundreds of cores. Use those cores and then who cares about the bugs, right? If you find one particular input where the solver produces correct result, then just take that. And so essentially that was an answer to this very mean question.
Starting point is 00:40:55 Yeah, that's a really interesting way to approach it, right? So flip it on its head and sort of look at it from, I guess it's almost like putting yourself in somebody else's shoes in that sense and then thinking of sort of the possible ways they can ask you that. And yeah, that's a really interesting way to sort of tackle it and a nice heuristic or a nice tool in the toolkit for thinking about a problem. But it was not deliberate at all, right? It was totally accidental, right? It was kind of as a rehearsal for the actual defense the next day. But it's cool that you kind of learned that and you kind of had the observation. I wonder if it has the same sort of efficacy if you are doing it not under pressure. Your brain probably isn't
Starting point is 00:41:42 working quite a different way. Maybe you need the pressure element as well for it to work. I don't know. I think at least I do. Yeah. Cool. Well, we're arrived at the time for the last word now. And so yeah, what's the one thing you want the listener to leave this podcast episode with today, the one takeaway?
Starting point is 00:42:05 Yeah. leave this podcast episode with today, the one takeaway. Yeah, I think the one takeaway is essentially I want to have people embrace the idea of combining or at least bridging the gap between software engineering and form and efforts. And in that sense, we need something like formal methods. My claim is we need something like formal methods engineering, which takes the nice aspects of the software engineering community and applies it to formal methods. It systematizes formal methods, makes the tools better, pragmatic approaches, how to combine the tools, consolidate them and so on. So essentially it's a call for formal methods researchers to publish more of their work
Starting point is 00:42:55 in software engineering venues where people like me certainly would appreciate it. It's a good call to action there. And at the seminar, we're also a big fan of bridging the gap. So nice use of that phrase as well. So yeah, that's fantastic, Dominic. Thank you very much. It's been an absolute pleasure to talk with you today. Any of the relevant materials you've mentioned, we'll drop links in the
Starting point is 00:43:16 show notes as well, and so people can, we'll put you, are you on social media Dominic anywhere? Are you on the? Yeah, all over the place. Okay, cool. And we'll link... Yeah, all over the place. Okay, cool. And we'll, we'll, we'll link those in the show notes as well. And yeah, thank you very much. It's been a lovely chat.
Starting point is 00:43:31 Yeah, thank you, Jack. I'm very appreciated. Cool. And we'll see you all next time for some more awesome computer science research.
