Disseminate: The Computer Science Research Podcast - Dominik Winterer | Validating SMT Solvers for Correctness and Performance via Grammar-based Enumeration | #63
Episode Date: July 25, 2025
In this episode of the Disseminate podcast, Dominik Winterer discusses his research on SMT (Satisfiability Modulo Theories) solvers and his recent OOPSLA paper titled "Validating SMT Solvers for Correctness and Performance via Grammar-Based Enumeration". Dominik shares his academic journey from the University of Freiburg to ETH Zurich, and now to a lectureship at the University of Manchester. He introduces ET, a tool he developed for exhaustive grammar-based testing of SMT solvers. Unlike traditional fuzzers that use random input generation, ET systematically enumerates small, syntactically valid inputs using context-free grammars to expose bugs more effectively. This approach simplifies bug triage and has revealed over 100 bugs, many of them soundness- and performance-related, with a striking number having already been fixed. Dominik emphasizes the tool's surprising ability to identify deep bugs using minimal inputs and to track solver evolution over time, highlighting ET's potential for integration into CI pipelines.
The conversation then expands into broader reflections on formal methods and the future of software reliability. Dominik advocates for a new discipline, Formal Methods Engineering, to bridge the gap between software engineering and formal verification tools. He stresses the importance of building trustworthy verification tools, since the reliability of software increasingly depends on them. Dominik also discusses adapting ET to other domains, such as JavaScript engines, and suggests that grammar-based enumeration can be applied widely to any system with a context-free grammar. Addressing the rise of AI, he envisions validation portfolios that integrate formal methods into LLM-based tooling, offering certified assessments of model outputs.
He closes with a call for the community to embrace pragmatic, systematic, and scalable approaches to formal methods to ensure these tools can live up to their promises in real-world development settings.
Links:
- Dominik's Homepage
- Validating SMT Solvers for Correctness and Performance via Grammar-Based Enumeration
Transcript
Disseminate the Computer Science Research Podcast.
Hey folks and welcome to another episode of Disseminate the Computer Science Research
Podcast.
Today, we're going to be talking about a paper from OOPSLA, which is my favourite acronym
of all the conferences, by the way.
So yeah, for those who don't know, that stands for Object-Oriented Programming, Systems, Languages, and Applications.
And the specific paper we're going to be talking about is validating
SMT solvers for correctness and performance via grammar-based enumeration.
Now this paper was written by Dominik Winterer and his colleague, Zhendong Su.
And I'm really happy to say I've got Dominik with me today on the podcast.
So welcome, Dominik.
Oh, thank you, Jack, for the nice introduction.
So yeah, I guess we'll go into your journey in a second, Dominik, but you've finished your PhD and your postdoc, and you're going to be an incoming lecturer at the University of Manchester in September.
And you're hiring, right?
So we can get this plugging straight away as well.
So to any listeners who are interested in formal methods and whatnot, do reach out
to Dominik. But yeah, cool. So welcome to the show. And it's always customary on the podcast for the
guests to start off telling us about their story. So yeah, tell us about your background and how you
became interested in formal methods and your research area.
Yeah, basically, I grew up around computers in the early 2000s.
So that sparked a great interest when my father was building a computer inside our living
room.
And then I started doing my studies in the University of Freiburg in Germany, which is
a beautiful and enjoyable place.
And there, logic and math and these things were really the focus; in order to pass a computer science degree, you had to master them. During this time, I also learned a lot from several amazing mentors who taught me research and got me interested in their research on symbolic AI, where Freiburg at the time was still a big player and a great place to be.
And so the pivotal moment that tied me to a research career was when I saw a research lab come together in front of a deadline. The kind of energy that was there was really amazing, and that really started everything. I did a research internship at IBM in the US.
And then as you said, I did my PhD at ETH Zurich and postdoc.
And now I'm happy to join the University of Manchester in September.
That's fantastic.
Yeah.
So you said that Freiburg is a beautiful place.
And that's the one right on the edge of the Black Forest, right?
All around that part of the world, right?
That's right.
Yeah.
I remember going there as a kid, and yeah, I really loved it. It's a beautiful part of the world.
Cool.
So yeah, and obviously Manchester next,
that's going to be fantastic as well.
And sort of building your own lab, right?
And sort of fleshing that out.
And that maybe, I guess some of the experience
you mentioned there that was inspiring,
I guess you're going to try and put that into practice again
for your, on your own path, right?
So that's really cool.
And I hope that goes well for you.
So next, I guess, let's ground ourselves
for this chat today and set some background
and some context and some of the key terms.
So yeah, I guess, I mean, for the uninitiated,
yeah, what are SMT solvers and what's a grammar?
And yeah, tell us about all these key words we're going to need to understand for this chat today.
Yeah, maybe I can add to the previous question that I got interested in this formal methods and software engineering world by combining what I had in Freiburg, which was formal methods (symbolic AI research was kind of similar), with the very practical software engineering I found when I got to ETH. I merged those two, and that's how I got interested in SMT solvers, essentially. You can think of SMT solvers as powerful theorem provers that take formulas and decide whether these formulas are satisfiable, meaning there's an assignment to the variables in the formulas, or whether they are unsatisfiable.
And so these SMT solvers are nowadays used as critical infrastructure in software research and industry, specifically in testing and verification, for example in symbolic execution, or for the verification of whatever needs to be verified. So that's SMT solvers. They are complex pieces of software with a lot of logic and intricate algorithms inside of them. They solve NP-hard problems, and they are very important for the software industry and research.
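To make "satisfiable" concrete, here is a toy brute-force check over Boolean variables only. This is our own illustrative sketch, not how any real SMT solver works; real solvers handle richer theories (integers, strings, arrays) with far cleverer algorithms than exhaustive enumeration.

```python
from itertools import product

def is_satisfiable(variables, formula):
    """Try every assignment; formula maps a dict of booleans to a bool."""
    for values in product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        if formula(assignment):
            return True, assignment  # found a satisfying assignment
    return False, None               # no assignment works: unsatisfiable

# (x or y) and (not x) is satisfiable, e.g. with x=False, y=True ...
sat, model = is_satisfiable(["x", "y"],
                            lambda a: (a["x"] or a["y"]) and not a["x"])
# ... while x and (not x) is unsatisfiable.
unsat, _ = is_satisfiable(["x"], lambda a: a["x"] and not a["x"])
```

An SMT solver answers exactly this kind of question, just over much richer formulas and without the exponential enumeration.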
And so, yeah, maybe a bit of background on grammars. All you need to know is pretty much computer science 101: a context-free grammar in a file looks very much like what you would write down on paper. I should also give some background on black-box testing, where the goal is to create as many test inputs as possible to fire at a particular piece of software, and using these inputs, we want to find bugs. In grammar-based testing, we use a grammar to describe the inputs that we want to generate and throw at the software under test, essentially.
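As a rough sketch of grammar-based black-box testing, here is a made-up miniature grammar for Boolean terms and a generator that expands it into syntactically valid inputs. The grammar and names are illustrative assumptions, not ET's actual grammars.

```python
import random

# A tiny context-free grammar: each nonterminal maps to alternative
# productions; each production is a sequence of terminals/nonterminals.
GRAMMAR = {
    "<term>": [
        ["true"], ["false"], ["x"],                  # leaf productions
        ["(not ", "<term>", ")"],
        ["(and ", "<term>", " ", "<term>", ")"],
    ],
}

def generate(symbol="<term>", rng=None, depth=0):
    """Expand a nonterminal recursively, capping nesting depth."""
    rng = rng or random.Random(0)
    rules = GRAMMAR[symbol]
    # Past depth 3, only pick leaf productions so expansion terminates.
    rule = rng.choice(rules[:3] if depth > 3 else rules)
    return "".join(generate(part, rng, depth + 1) if part in GRAMMAR else part
                   for part in rule)

# A batch of syntactically valid inputs to fire at the system under test.
inputs = [generate(rng=random.Random(seed)) for seed in range(5)]
```

Every generated string is valid by construction, which is what lets a grammar-based tester get past the parser and exercise the solver's actual logic.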
Okay, cool. Sorry, Dominik. Can you continue, please?
No, I just wanted to add that we aim to test SMT solvers in this case.
Cool. Yeah. So you mentioned that fuzz testing is the traditional way of testing these solvers. So what are the limitations of those approaches that make the grammar-based approach more advantageous?
So there are several ways of testing. You can do mutations on existing inputs, or you can describe the inputs from the ground up, which you can do with grammars. I wouldn't say that one is superior to the other; it's more that they are orthogonal. We use grammar-based testers in this case because we wanted to enumerate inputs from the ground up, from smallest to larger inputs.
Okay, cool. That's a nice segue then into the elevator pitch and the motivation of the paper. So yeah, what was the overarching goal of the paper? Maybe there are some sub-goals in there as well.
So, at the time, SMT solvers were already tested quite a lot; several fuzzing campaigns had been conducted, many bugs had been found, and many fixes had been made by the developers. But all these SMT solver testers were unsystematic, essentially random-based, and with random-based fuzzers it's unclear when to stop. Usually the bug triggers they produce are very sizable, so you need a lot of time to reduce them to a size where developers can even think about triaging the bugs. And with all these tests and all these bugs, it was also not quite clear whether SMT solvers were even getting better over time, although there was a big academic effort to find more and more bugs in them.
And so our motivation was twofold. First, make it very simple: try to find the simplest bugs, ones that don't cause too much trouble on the developers' side. So we want super small formulas. And second, with these super small formulas, we want to judge the evolution of SMT solvers. We can do that because, in an earlier evaluation, we found that 85% of all bug triggers in SMT solvers indeed reduce to small triggers. So we're quite confident that we can find a lot of important bugs with this method.
Yeah, that's cool. I really liked the goal of trying to evaluate correctness and performance over time. That was a really nice aspect, a really nice goal to have.
So let's get into how you went about achieving these goals then.
So walk us through the sort of the design of the key components.
And the thing you designed is called ET, right?
So we need to know why it's called ET as well at some point as well.
And yeah, tell us about how you went about designing ET and what the key components were.
Yeah.
So maybe first on the name: ET is short for enumerative tester, or exhaustive tester, or also extraterrestrial. Because when we were initially finding the first bugs with this tool, it was quite magical that they were super small and we did not have to do anything to reduce them further; we could just give them away to developers. How ET works is pretty simple. We formulate context-free grammars for the different SMT theories: we have grammars for the core theory, essentially Boolean logic, and then for arithmetic, strings, and so on. Then we compile these grammars into algebraic data types in Haskell, and use an off-the-shelf enumerator called FEAT that can then enumerate the inputs. So it's more about the insight of finding small bugs than about the technology or a key idea. In a sense, the real contribution here is keeping the bugs small and investigating evolution, rather than making a technical or conceptual contribution.
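The pipeline described above, grammars compiled to algebraic data types and then enumerated smallest-first, can be mimicked in a few lines. This is our own Python re-sketch of the idea, not ET's Haskell/FEAT code, and the grammar is a toy:

```python
# Enumerate every term of a toy Boolean grammar in order of AST size,
# so the smallest inputs (and hence the smallest bug triggers) come first.
LEAVES = ["true", "false", "x", "y"]
_CACHE = {}

def terms_of_size(n):
    """All terms whose syntax tree has exactly n nodes."""
    if n in _CACHE:
        return _CACHE[n]
    out = list(LEAVES) if n == 1 else []
    if n >= 2:  # (not t): one node on top of a subterm of size n-1
        out += [f"(not {t})" for t in terms_of_size(n - 1)]
    if n >= 3:  # (and a b): one node on top of subterms of sizes k, n-1-k
        for k in range(1, n - 1):
            out += [f"(and {a} {b})"
                    for a in terms_of_size(k)
                    for b in terms_of_size(n - 1 - k)]
    _CACHE[n] = out
    return out

def enumerate_terms(max_size):
    """Yield every term up to max_size, smallest first."""
    for n in range(1, max_size + 1):
        yield from terms_of_size(n)

small_inputs = list(enumerate_terms(3))  # exhaustive up to 3 AST nodes
```

Because the enumeration is exhaustive by size, any bug trigger it finds is already minimal with respect to the grammar, so no shrinking pass is needed.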
Yeah, just for the uninitiated here, Dominik, when you say a small bug versus a large bug, could you give us a concrete example of what you actually mean by the difference between small and large?
Yeah, so what I was referring to as a bug is not quite a bug; it's a bug trigger, an input that can manifest a bug in a piece of software. If that input is very large in size, which in SMT means something like a deeply nested formula with lots of variables and lots of nested components, it's difficult for the developers to find out why this formula triggered a bug and what the root cause is. By contrast, if you have a very small bug trigger, minimal even, then you can at least reason about which of the components you see in the input could be causing the bug. That makes the difference in whether developers even look at bugs or not; if triggers are too large, they might just reject them outright.
Okay, cool, I get it. So I guess when you were building the ET pipeline and these various different steps in there, you said that you grabbed an off-the-shelf tool for doing the enumeration. What made you choose that specific tool? I think it's called FEAT, right?
It is.
What were the reasons for choosing that versus something else? Or is that not really too important in the overall pipeline, and you just basically needed something to enumerate?
Yeah, I needed something to enumerate, for sure. Initially I was building this tool myself, essentially using a search procedure. But then I discovered FEAT, which is just faster, and I replaced the component I had previously designed myself. But overall, I think the key thing I wanted to discover is whether I can judge the evolution of SMT solvers with this approach.
Cool. So let's talk about the results
and what you found then. So first of all, walk us through how
you actually go about evaluating something like ET? Like, what's the setup? How are you actually going about proving your research questions?
Yeah. So there are essentially two aspects here: ET is a powerful testing tool, and it's a way to judge the evolution of SMT solvers. For the first goal, we essentially did it in the traditional way: a bug-hunting campaign to find bugs in the SMT solvers, essentially bugs that had not been found before. And we found over a hundred bugs, and most of them got fixed. For the second part, which I was, and still am, super excited about, the evolution, we looked at all SMT solver releases from the last six years and stacked them up. Then we ran these different grammars, each representing a different SMT theory, on each of them. Because we cover all eight SMT theories, at least the official ones, this gave us a way of approximating the evolution over time.
For example, we found that in old versions of SMT solvers, many bugs were triggered, while the number of unique bugs went down as the versions got more recent. It was quite stunning to see that the fuzzing campaigns conducted with other tools have caused at least part of this downward curve towards the newer releases. So indeed, we could at least in part validate that the fuzzing has benefited the SMT solvers.
Yeah, that's really cool. So let's focus on the results some more then. You kind of headlined it there that you found over 100 bugs. What was the breakdown between correctness and performance?
So we found quite a few correctness bugs, many of them soundness bugs essentially, and also performance bugs. It was pretty much half and half: half of the bugs we found are correctness bugs and the other half are performance bugs.
That said, this tool is the first one that has enabled us to even file these performance bugs. Because, as I mentioned earlier, it's very hard to triage bugs in such complicated systems when the triggers are large, but it's almost impossible for performance bugs, because then you don't have a way to actually differentiate. When you have a crash or something, you can easily reduce the trigger to something smaller, because there are tools for that. But if you have a timeout, each time you shrink down your initial bug trigger, you again have to wait to see whether the timeout still happens. So it's not easy to shrink down triggers, which makes it even more important that these performance bugs have small triggers to begin with. And ET is, in my opinion, the first tool that can even do this, and that's why we could use it to file performance bugs in SMT solvers.
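A sketch of the two oracles this makes possible. The classification logic below is our own illustration of differential testing across solvers; the verdicts are made up, and in practice each one would come from invoking a real solver under a time budget.

```python
TIMEOUT = object()  # sentinel: a solver exceeded its time budget

def classify(results):
    """Compare several solvers' verdicts on one small enumerated input."""
    verdicts = set(results)
    if TIMEOUT in verdicts and verdicts - {TIMEOUT}:
        # One solver times out where another answers quickly: a candidate
        # performance bug, and the trigger is already tiny by construction.
        return "performance-bug"
    if len(verdicts - {TIMEOUT}) > 1:
        # sat vs. unsat disagreement: a soundness bug in some solver.
        return "soundness-bug"
    return "agree"

assert classify(["unsat", "unsat"]) == "agree"
assert classify(["sat", "unsat"]) == "soundness-bug"
assert classify(["sat", TIMEOUT]) == "performance-bug"
```

The point made in the interview shows up here: a timeout on an already-minimal input needs no shrinking, so the performance oracle becomes practical.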
Yeah, it's really cool.
Of the bugs that you found, Dominik, was there any one that was really surprising? Which one maybe has had the most impact, I guess? Did you uncover something, yeah, not scary, but I'm not sure what the word I'm looking for is, that was really surprising and could have been a really critical issue for something?
That's difficult to pin down. We found quite a few soundness issues, each of which could cause something like a falsely verified program if it hits that particular bug. So it's hard to say. Of the soundness bugs we found, I think it was about 13, and there were quite a few stunning ones. What impressed me the most is that the triggers were super small. Essentially, you see a quite critical bug in an SMT solver with a super small trigger, which you would have thought would have been detected earlier, maybe even by the regression test suite, but it wasn't. And so it was a bit of a shock to see these small bugs appear.
Yeah, you often wonder if they've had any sort of real-world knock-on effects, right? They've been lurking in there for a while, and they could be the cause of something. But yeah, it's amazing that they've finally been caught in the act.
And what's the proportion of ones that have been fixed?
Did you say?
Sorry, I missed that.
And the proportion of fixed ones is probably larger now, but at the time it was 40%. It's probably more like 70 now. Across all the SMT solver bugs I have ever filed, something like 80% of them get fixed.
That's pretty good. What's the process like for getting them fixed? Do you ever go in there and fix them yourself? You're like, I've got to go and fix this damn thing.
I should actually do that. I've filed way too many bugs in SMT solvers; that's why I haven't fixed any bugs in them myself. I should go ahead and do that. But in general, we try to shrink the bugs down as much as possible before we file them, and we also try not to file duplicate bugs, because that would just cause too many issues for the developers.
Another point you mentioned a second ago is that you can see this downward trend over time, and that the fuzzing campaigns have actually had some positive impact. Were there any cases of regressions between versions popping up, where the overall trend was down, but between, I don't know, version 5 and 6 there was actually an increase?
Yeah, yeah, we have seen this quite a bit.
We tested two solvers, solver families, Z3 and CVC5, which are the two biggest SMT solvers.
In Z3, we haven't seen that very much, except for one theory.
But for CVC4 or 5, we saw that because there was a big version shift from CVC4 to CVC5,
where they essentially re-engineered the tool completely.
And so this caused bugs in two or three theories out of eight.
So we could see these issues both in terms of correctness and also in terms of performance bugs.
Yeah, that's really interesting to see that we don't write bug-free code.
Yeah, we don't.
Cool.
So looking forward to the future now, how does ET evolve to find more bugs? And I'm guessing there must be more bugs in there, right? So yeah, what does the evolution of this direction of research look like going forward?
Yeah, I guess one crucial thing we are working on at the moment is extending the evolution analysis to more solvers beside the two big ones, because there are more than a dozen SMT solvers and we need to know the evolution of each of them. And on top of that, what would be super important is to go beyond academic bug-hunting campaigns.
What do I mean by that? Essentially, we need something like continuous integration fuzzing pipelines for SMT solvers, because currently it's the case that once somebody wants to write a paper and has identified an approach that finds bugs effectively in SMT solvers or other tools, they spend time and compute, find these bugs, file them, get them fixed, write the paper, and essentially publish the tool on GitHub, and that's it.
Right.
But what we need instead is something like continuous integration, where we effectively integrate these fuzzers into a CI pipeline and so find bugs continuously.
So that's the second point.
Sure, yeah. On that, the main SMT solvers like Z3 are open source projects, right? So how are they actually managed? Basically, you need a way to get things like ET into their pipeline. Do they do that at the moment? Have you got any communication with the developers of those projects about how you can get your stuff in there, so it's running continuously and finding bugs as they develop, rather than waiting for them to release a version you can then test?
Yeah, but I think the integration of
this is also not necessarily as straightforward, right? Because there's some questions there. For example,
when should we stop? Like with ET here, as a research effort, I just let ET generate
a lot of test cases, one million per theory. And so that makes eight million if you have
eight theories. And that's not feasible for a CI at the moment.
So we have to do something smarter.
We might need to work with coverage to reduce the number of test cases that we have to run.
Another problem that ET currently has is that it sometimes creates alpha-equivalent inputs, essentially the same formulas with different variable names. And we also want to tackle this issue.
Cool.
Yeah.
Have you got any plans for sort of adapting ET to some other domains as well?
Outside SMT Solvers?
Yeah.
We already found some issues in JavaScript engines with it, but it's work in progress, and it's basically connected to the idea of reducing some of the alpha-equivalent programs. But ET can be adjusted or applied to anything that has a context-free grammar. There are these big ANTLR repositories where you have lots and lots of grammars for different tools; you can use ET out of the box with these grammars. I think one caveat is that you have to limit the number of variables. Currently, for formulas, we use two, and we also have to fix the number of constants. Otherwise, the enumeration would just go off enumerating the integers or something.
Cool.
Yeah.
In the next section of the podcast, I want to focus on impact. We touched on it slightly a second ago, about wanting to see ET integrated into CI pipelines and things like that, but with ET and obviously your wider work in this area, what impact do you think it can have on the community and on industry as well?
Oh, that's a big question.
I mean, the underlying vision of my research specifically is to do something like formal methods engineering. Formal methods engineering should be a discipline between formal methods and software engineering. What we want to do there, essentially, is make sure formal methods tools are correct and reliable, because the promise of formal methods tools is to make software reliable. And if these tools are not reliable themselves, that's not a trustworthy endeavor.
So that's one prerequisite.
And the second is to be able to combine formal methods, because currently they are often used in isolation. There are so many formal methods out there, with lots of different specification languages which have overlaps and all that.
I hope to make a contribution to consolidating some of these as well.
Of course, the third big challenge I see is the specification crisis.
Essentially we still write software largely without specifications.
And so I also want to make a contribution towards that, so that we have better tools that incentivize people to write specifications. And why do we need this? In order to apply these formal methods techniques, we need specifications; otherwise it's not possible to verify or even test something.
Yeah, I definitely agree with you.
The guarantee with formal methods is that if it says something is correct, then it's actually correct, right? So if you don't have that guarantee, it undermines confidence in those tools. So it's super important. I agree with that principle and the other ones you mentioned there as well.
The next question I'm going to ask is a bit off topic, but it's only because I was on a panel yesterday about the role of databases in AI, and AI is everywhere at the minute, everyone's talking about it. So, unfortunately, I'm going to have to ask you: what are the implications of GenAI and AI for the formal methods community? Are there things that can be learned from it, or can formal methods engineering, this idea of combining these two disciplines, be applied to the AI domain as well? Because obviously it's very hard to have confidence in the output of some of these LLMs. So I don't know if there are lessons that can be learned from one community to the other.
Yeah, there are definitely lots of lessons that might be learned from that, although I cannot specifically comment on the big picture, because if I could, then the whole problem would be solved and we could just trust ChatGPT or whatever language model one uses. But I think one particularly interesting aspect related to
formal methods engineering would be to have something like software validation portfolios
which take a prompt, a natural language prompt, and then apply whatever possible software validation technique could be testing, could
be verification, could be whatever it is.
And then as an output, either provide bugs or if there's no bugs, provide some certificate
for the rigor of the methods that have been applied. So that would be like
a grand vision to use part of this AI, mostly language-related, LLM-related AI as a front end
to a verification back end.
Yeah, I can see that being really useful.
Yeah, that'd be a nice, interesting crossover between the two.
I now want to change gears a little bit. We've spoken about surprises across the chat a little bit, but there are going to be two questions here, Dom. The first one is localized to ET: whilst working on the ET project, what's been the biggest surprise? And then I want to take that question and expand it to the wider field of formal methods, so I can get your take on both. What's been your biggest "oh my god, I did not expect this" moment that caught you totally off guard?
Yeah, I guess one of the biggest surprises was indeed that I could still find so many small bugs in SMT solvers after all these years of testing. And the other big surprise was that it is indeed possible to see some evolution. I was shocked, I mean, I shouldn't have been, but I was shocked to see that there's a clear-cut correlation between where the fuzzing campaigns started and where the bug counts went down. I wasn't expecting to see that in the evolution plot. So those were the two biggest surprises, I should say.
Yeah, and just visually looking at it, you're thinking, oh my God, this just maps perfectly. That must have been a really nice experience.
Yeah.
And I guess part B of this question is for the wider field then, your wider time in research.
Yeah, for my wider time in research, I'll probably refer to my PhD. I was always fascinated by these proving tools, such as SAT solvers, SMT solvers, and optimization solvers, and was playing around with these three in particular, which I fell in love with.
And so one thing I did when I arrived in the new lab: I found that a colleague of mine was testing SAT solvers, and we were kind of surprised how reliable these tools are. There were very few bugs we could find, even though we would throw millions of test inputs at them. Then we came up with the idea of testing SMT solvers, without taking it too seriously. We applied one of our methods to SMT solvers and were surprised that we could actually find soundness issues in them automatically. That was a really big eureka effect, I would say, probably the biggest one of my research career: there was essentially this tool spitting out a soundness issue in Z3, and we were validating it. We were super excited and shouting in the hall at ETH, in this old former chemistry building. So that was really crazy, I should say, that we could indeed find bugs in SMT solvers, which before was not clear. Until then, only a few soundness bugs had been found, mostly by applications. The fact that we could find them automatically, and then find so many more, was really, really exciting.
Yeah.
I bet that must have been a great experience. That's what it's all about, right? We all want those eureka moments.
That's what it's all about.
So yeah, that's really cool.
So I guess now we've spoken about ET and whatnot, it'd be interesting to hear about some of the other research you've got going on, and to dig deeper into the vision you have for your new lab at the University of Manchester. So yeah, tell us about the other cool stuff I know you're working on, Dominik.
Yeah, so I think it's a bit unusual, but most of my research is centered around testing these SMT solvers; for the most part, I devoted my entire PhD to attacking this one problem. And now, instead of just finding bugs in SMT solvers, I'm also trying to make SMT solvers better, in the sense that I want to design approaches for bypassing bugs. Because there are always a lot of bugs, and, at least at the moment, we cannot produce large-scale verified systems.
One thought that really caught my attention is that we can use instability in big software to our advantage. Essentially, with something like an SMT solver or a compiler, we can massage the input in equivalent ways so that we have lots and lots of different equivalent inputs, and then let the tool process these inputs in parallel. The idea is a bit similar to N-version programming, but instead of N versions of the software, we have N versions of the input to the software. Then we essentially use an oracle at the output to decide which output we take.
And I found that this ends instability, because instability is currently a big issue in the
former methods community.
Imagine if there's a change to an SMT solver. It could prove some fancy proof yesterday,
but now there was a commit that changed a little bit of logic in the SMT solver, and now it can no
longer prove the same theorem. So that issue can, I would claim, be partially tackled by parallelism
and by basically turning the testing idea around: instead of producing these equivalent or different inputs to trigger bugs, we essentially
correct the software by that. So that's the current thought I'm a bit addicted to
at the moment.
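To make the "N versions of the input" idea concrete, here is a minimal, hypothetical Python sketch (not Dominik's actual implementation): it generates semantically equivalent variants of a small CNF formula by shuffling clause and literal order, runs a toy solver on each, and lets a majority-vote oracle at the output decide which answer to take. The function names and the brute-force stand-in solver are illustrative assumptions.

```python
import itertools
import random

def equivalent_variants(clauses, n, seed=0):
    """Produce n semantically equivalent variants of a CNF formula
    (a list of clauses, each a list of signed integer literals) by
    shuffling clause and literal order -- satisfiability-preserving."""
    rng = random.Random(seed)
    variants = []
    for _ in range(n):
        variant = [list(clause) for clause in clauses]
        rng.shuffle(variant)
        for clause in variant:
            rng.shuffle(clause)
        variants.append(variant)
    return variants

def brute_force_sat(clauses):
    """Toy stand-in for an SMT/SAT solver: brute-force satisfiability
    check over all assignments to the formula's variables."""
    variables = sorted({abs(lit) for clause in clauses for lit in clause})
    for bits in itertools.product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, bits))
        if all(any(assignment[abs(lit)] == (lit > 0) for lit in clause)
               for clause in clauses):
            return "sat"
    return "unsat"

def portfolio_solve(clauses, solver=brute_force_sat, n=5):
    """Run the solver on n equivalent inputs (conceptually in parallel)
    and let a majority-vote oracle decide which output we take."""
    answers = [solver(variant) for variant in equivalent_variants(clauses, n)]
    return max(set(answers), key=answers.count)
```

In a real portfolio, each variant would be handed to an actual solver such as Z3 in a separate process, and the oracle could do better than majority voting, since a single sat witness or unsat proof is independently checkable.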
That's really cool.
Yeah.
That's a very nice segue into it.
So I guess my penultimate question for you, Dominik,
is how you actually go about arriving
at that point, right?
Having this sort of creative inspiration
and the creative process.
Do you have a systematic way of approaching
creativity and generating ideas?
And then, once you've generated those ideas, deciding this is the one that's going
to be worth pursuing for a significant amount of time?
Oh, that's a great question.
I love to talk about this with people all the time because I have thought a lot about how to develop ideas,
but I don't actually have such a systematic way. I would love to, but I don't have that.
Unfortunately, these eureka moments or these insight moments are quite rare.
And this one on instability, I had when I was basically
under extreme pressure before my PhD defense. So the night before, or the day before, essentially my advisor told me that my presentation was a bit off. And then I had to redo a bunch of things.
And so when I was done, essentially, once I was done, I thought, okay, what else should I do?
Because the script changed a little bit.
So I thought, what are the meanest questions people could ask about this research?
And so I came up with the question: why essentially is this all useful?
Why would you even spend so much time to find all these bugs when instead you could just reformulate N versions of the input formula, use a bit more compute
or substantially more because nowadays at least workstations have hundreds of cores.
Use those cores and then who cares about the bugs, right?
If you find one particular input where the solver produces a correct result, then just
take that.
And so essentially that was an answer to this very mean question.
Yeah, that's a really interesting way to approach it, right?
So flip it on its head; it's almost like putting
yourself in somebody else's shoes and thinking of the possible ways they could question you. And yeah, that's a really
interesting way to tackle it, and a nice heuristic or a nice tool in the toolkit for
thinking about a problem. But it was not deliberate at all, right? It was totally accidental;
it kind of came out of a rehearsal for the actual defense the next day.
But it's cool that you kind of learned that and you kind of had the observation. I wonder if it
has the same sort of efficacy if you're doing it not under pressure. Your brain is probably
working in quite a different way. Maybe you need the pressure element as well for it to work.
I don't know.
I think at least I do.
Yeah.
Cool.
Well, we've arrived at the time for the last word now.
And so yeah, what's the one thing you want the listener to leave this podcast
episode with today, the one takeaway?
Yeah, I think the one takeaway is essentially I want people to embrace the idea of
combining, or at least bridging the gap between, software engineering and formal methods.
My claim is we need something like formal methods engineering, which takes the nice
aspects of the software engineering community and applies them to formal methods.
It systematizes formal methods, makes the tools better, and develops pragmatic approaches for how to combine the tools,
consolidate them, and so on.
So essentially it's a call for formal methods researchers to publish more of their work
in software engineering venues where people like me certainly would appreciate it.
That's a good call to action there.
And here at Disseminate, we're also big fans of bridging the gap,
so nice use of that phrase as well.
So yeah, that's fantastic, Dominik.
Thank you very much.
It's been an absolute pleasure to talk with you today.
Any of the relevant materials you've mentioned, we'll drop links to in the
show notes as well so people can find them.
Are you on social media anywhere, Dominik?
Yeah, all over the place.
Okay, cool.
And we'll link those in the show notes as well.
And yeah, thank you very much.
It's been a lovely chat.
Yeah, thank you, Jack.
Much appreciated.
Cool.
And we'll see you all next time for some more awesome computer science research.