Disseminate: The Computer Science Research Podcast - Andrew Quinn | Debugging the OmniTable Way | #16
Episode Date: January 2, 2023
Summary: Debugging is time-consuming, accounting for roughly 50% of a developer's time. In this episode Andrew Quinn tells us about the OmniTable, an abstraction that captures all execution state as a... large queryable data table. In his research Andrew has built a query model around an OmniTable that supports SQL to simplify debugging. An OmniTable decouples debugging logic from the original execution, which SteamDrill, Andrew's prototype, uses to reduce the performance overhead of debugging (SteamDrill queries are an order of magnitude faster than existing debugging tools).
Links: Andrew's Homepage | Debugging the OmniTable Way OSDI '22 Paper | SteamDrill GitHub Repo
Hosted on Acast. See acast.com/privacy for more information.
Transcript
Hello and welcome to Disseminate, the computer science research podcast. I'm your host, Jack Wardby.
I'm delighted to say I'm joined today by Andrew Quinn, who will be talking about his OSDI paper, Debugging the Omnitable Way.
Andrew is an assistant professor at the University of California, Santa Cruz, and his research focuses on the intersection between operating systems, architecture, programming languages, and software engineering. Andrew, thanks for joining us on the show.
Cool. Yeah, happy to be here.
So let's dive in. So to quote your homepage, you say that most of your work is about finding bugs,
or is about bugs, finding them, understanding them, and fixing them. So how did you end up
researching in this area? Yeah, that's an interesting question. You know, I think there's obviously sort of multiple causes
for why somebody is interested in a specific area. And it's really a confluence of a bunch of stuff.
I think if I had to boil it down and say, okay, what one main thing really spurred me looking in this direction?
Right after undergrad, I worked in industry for a little over a year. And I was a,
I guess we called it L3 support. And so I spent a lot of time interacting with customers who had
a problem with this legacy application I worked on. And every once in a while, I'd do occasional
enhancements to that application, so maybe I'd add some new feature that some customer wanted. And so I was kind of underemployed in this role. I had a lot of time to think about how I could do this task better, not just from a "how would I be a better engineer" angle, but systematically: how could we design better tools for this job? And that really got me started thinking a lot, reading papers, and trying to understand what the research community said was the best way to do these tasks. And so then when I started
grad school, I mean, I didn't go in saying, Oh, I'm going to work on debugging, but I definitely
went in. And I was very probably predisposed to think of these problems as the interesting
problems. And so, you know, the first papers I read that I really liked were about debugging or testing and just sort of
steamrolled from there. You know, a couple other, I guess, small factors or, you know,
just kind of like lucky scenarios, like the first project I worked on in grad school was something
my advisor suggested to me and we worked on together.
And it wasn't strictly about debugging, but it was debugging adjacent.
And so it was easy for me to take that project and turn towards more thinking about debugging as a topic.
Yeah, and I just had a lot of freedom.
I mean, I don't know, I worked with the right people who gave me a lot of freedom to kind of define that type of research agenda, especially as I got more senior.
So, yeah, I think these types of questions can be really hard to answer. It's definitely not one cause.
It's definitely a bunch of things leading into what your interests are and why you work on them.
But there's, I guess, a little bit of my story describing how I got here.
Fantastic. So let's get stuck into the paper then. So can you describe to the listeners
what an Omnitable is? Yeah. So an Omnitable is an abstraction for an execution of your software. So
in particular, it models a particular execution of your program, so a particular instance of the process, as this gigantic database-style table.
And the aim is to include everything you could possibly want to know about that program. That's
our goal, is to include in this one giant table. So what that looks like is we have a column in
this database-style table for every single byte of memory, and a column for, let's say, every single
register, and some additional metadata, like a time column that tells you when did some event happen.
And then there's a row in this table for every single instruction that's executed by your
program. And so you can imagine it's this sort of gigantic entity that represents everything, all of the state that your program ever reached and ever had.
And the idea then is when you want to debug that application, when you want to say
something about what happened, you write a SQL query on top of that table to pull out the
particular state you're interested in observing, whatever that state might be.
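For illustration only, a query over such a table might look roughly like the sketch below; the table and column names are hypothetical, invented for this example rather than taken from the paper.

    -- Hypothetical sketch of an OmniTable query (table and column names invented).
    -- Conceptually: one row per executed instruction, one column per register and
    -- per byte of memory, plus a time column.
    SELECT time,
           instruction_pointer,
           reg_rax,
           mem_140737338085904              -- one byte of memory, as a column
    FROM   omnitable
    WHERE  instruction_pointer = 4198964    -- 0x401234, acts like a breakpoint filter
    ORDER  BY time;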
Kind of what's the motivation behind this approach?
I know you touched on it a little bit there, but why did you kind of end up on this, taking
this approach to debugging?
And how does it compare to other debugging tools, maybe?
That's interesting.
So, you know, before I worked on the OmniTable, I had this paper. It's called Cluster-Fueled Debugging. And Cluster-Fueled
Debugging looked at how you could use really large scale compute clusters. So imagine sort of
thousands of machines, and tap into all of that power, in order to understand what went wrong
when you had a particularly hairy or complex bug. And I won't bore you with the details of exactly
how that worked. It turns out we're
using some of those ideas in this Omnitable paper as well. But essentially, what wound up happening
is you had to rewrite the way you would specify debugging code. And you had to think really hard
as a developer about the fact that your code was going to be parallelized. So in essence, we were able to make
debugging much faster. But the trade-off, the cost was that you as the developer had to think a lot
about complexity. You had to deal with the complexity of thinking about parallel programming.
And I was always very dissatisfied with that as the trade-off. We were aiming to improve the
latency, so I was fine with
it. But I was always dissatisfied in the back of my mind thinking we have to be able to do better,
right? Like we can do better than this. And the Omni table was really kind of my attempt to do
better, right? So how can we get the improvements in terms of performance, maybe even do better
than what I had done before, but also retain sort of the simplicity of debugging we might have today, or maybe even do
better in terms of simplifying the complexity that it takes to debug programs. As I got more and more
involved in it, I realized that this trade-off actually is fairly fundamental. Lots and lots of
tools either choose to have high overhead or high complexity. And sort of you can taxonomize, and I taxonomize in the paper briefly, prior tools that fall into this space.
So there's things like, you know,
any of the tools that I've worked on
parallelizing debugging tasks.
So I brought up Cluster-Fueled Debugging.
There's other examples.
There's also other examples of tools
that use really low level program analyses.
And I would classify those as
low overhead, but very high complexity, because as a developer, you have to contend with thinking
really hard, how do I make this analysis work? And then on the other hand, there's a lot of tools
that kind of like the OmniTable have proposed higher level languages for debugging. So these
are things like Fay or EndoScope or Pivot Tracing. And a lot of those tools are giving you this higher-level way of thinking about debugging, but
they often don't do anything about the performance overhead.
And so, to contend with performance overhead, they either just expose that you're sort of stuck with paying really high costs for your debugging, or they just don't let you ask expensive questions.
Right.
And so that's sort of where we wound up eventually leading to the Omnitable was how do we balance
these two things?
How do we balance complexity and performance overhead and get the best of both worlds?
Can you kind of walk us through maybe then the kind of the query model for Omnitable?
And so you mentioned that you have like a SQL-style language.
What was the extension you made to kind of SQL to better support the query?
Yeah.
So one of the things was that we were a bit dogmatic, almost, about this idea that everything should boil down to being a query over this OmniTable. And in particular, we didn't want to add other sort of,
I won't call them hacks, but for lack of a better word,
hacks into the query model.
So we didn't want to have this collection of tables
that were sort of magically deduced.
Like I didn't want to have a functions table
that included all the functions executed in your program
where that was derived externally from
the OmniTable. I wanted that to be a view on top of the OmniTable. The rationale being that
from a systems perspective, if everything boils down to a query on top of an OmniTable,
then it's much easier for us to build. The optimization story is straightforward. Everything
gets optimized by the same optimizer. So there are these sort of nice systems things that happen if you stick to this idea that everything is going to
be built on top of this low level table. Okay. So that's where we started from.
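As a rough illustration of that design principle, a higher-level table such as the set of function entry events could, in principle, be defined as a view over the raw OmniTable rather than maintained separately. Everything below is a hypothetical sketch with invented names, not SteamDrill's actual schema.

    -- Hypothetical sketch: deriving a higher-level view from the raw OmniTable
    -- (names and addresses invented for illustration).
    CREATE VIEW function_entries AS
    SELECT time,
           instruction_pointer AS entry_pc
    FROM   omnitable
    WHERE  instruction_pointer IN (4198400, 4203648);  -- entry addresses from debug symbols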
And that's a little bit contrary. Like a lot of people instead defined views they thought were
interesting. And that's kind of the way they would try to design their system. So this is a little
bit of a radical approach in terms of systems design. But then you ask, well, okay, so the problem then is like, well, there's certain
things that I do, like, I often find myself wanting to ask a question about, you know,
if I have the start of a function defined by some particular instruction in my Omni table,
and I have the return or the end of that function defined by another instruction, how do I match those together when I'm dealing with things like
recursion, right? I need to somehow model a function stack, but especially your listeners
who maybe come from more of a database background, they probably understand, well, it's very hard to
model recursion with standard SQL. We don't have the ability to do that. So that was one of the things
we did. So I built this particular type of join. I called it a stack join that allows you to express
the idea of sort of rows being pushed onto a stack and rows being popped off of a stack as they match.
And that allows you to model the function stack, which is really critical for debugging.
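For illustration, matching function entries to their returns with such a join might look roughly like the sketch below. The STACK JOIN keyword and the rest of the syntax are invented for this example; the paper defines SteamDrill's actual operator.

    -- Hypothetical sketch of a stack-style join (syntax invented for illustration):
    -- entry rows are pushed onto a stack, and each return row pops the most recent
    -- unmatched entry, which is what makes recursion match up correctly.
    SELECT entry.time            AS entry_time,
           ret.time              AS return_time,
           ret.time - entry.time AS duration
    FROM   omnitable AS entry
    STACK JOIN omnitable AS ret
      ON   entry.instruction_pointer = 4198400   -- first instruction of foo()
      AND  ret.instruction_pointer   = 4198652;  -- foo()'s return instruction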
That was maybe one of the few extensions to the language that is required from a what-you-can-compute perspective. There's a lot of other additions we made that are just convenient. So we made a way to say, okay, if I had some event, I want to know the next event that happened that matches in some
way. Well, with SQL and sub queries, you can make that happen. It's just really awkward. And so we
said, hey, we're doing this all the time. Let's give you some syntactic sugar so that you can
model this notion of the next event or the previous event. And then the final thing, we added some
nice syntactic sugar around some user-defined
functions where we allowed you to do traversals of data structures and things. And there we're just
using kind of pre-existing user-defined function interfaces, but we're giving you nice ways of
expressing that around like, hey, here's a pointer-based data structure. I want you to
calculate the transitive closure of that
pointer-based data structure so I can traverse my entire B-tree when I'm doing this query,
whatever that might be. In your paper, you give a really nice case study of a developer
that's trying to diagnose some performance problem in Redis. Could you maybe walk us
through how the developer would approach solving this issue or debugging this issue, should I say, with Omnitable and what the sort of debugging queries would look
like to solve this issue? Yeah, so I think this is a good opportunity to describe the usage model
and to talk about one thing I haven't talked about at all, which is
Omnitable queries are designed to be offline as opposed to online.
And so what the Redis developer in this use case, you know, starts with, or what you,
the person, the listener who wants to debug their code, you start by capturing an instance
of your program that has the problem that you want.
You can think about that as like, you're basically getting
the Omni table for some problem that you had initially, whatever it might've been.
And we use a technique called deterministic record and replay in order to do this.
But that's sort of step one as a developer. And then once you do that, you have this Omni table
that's exposed to you. And you can sort of think of it as, you know,
just querying that existing Omni table at all times. So that's actually really convenient,
because it means when you have multiple queries, and there'll be multiple queries in this use case,
each of those queries occurs over the same existing Omni table. You're not sort of calculating a new execution
and performing sort of this extraction,
analyzing a new Omni table.
You're always analyzing the initial Omni table
that showed whatever that bug was.
So that's kind of the, I guess, step one.
But let me talk about the use case a little bit,
this Redis bug.
So in particular, we're imagining a scenario,
and it's based on a bug that was reported in Redis, where a developer is deploying Redis as
an LRU cache in front of a backend service. So just imagine in front of some sort of database,
whatever it might be. And the bug is that over time, they're seeing the end-to-end latency of
that deployment, not necessarily just of Redis, but of the whole deployment, that end-to-end latency
is slowly increasing.
And the developer using existing telemetry, so just existing logs or things you might
have, notices that the increase in latency or an average end-to-end latency is correlating
with sort of a higher and higher percentage of requests being processed by that backend service. So in other words, the cache is sort
of getting worse at caching things over time. And okay, if you think about these types of
problems, right, it's such a high-level symptom to start with. And if you were trying to debug this,
it actually might not even necessarily be a bug for starters, right?
Like in principle, the workload could be changing in certain ways such that you would expect this to happen.
So it's a really hard problem to start staring at when you have a symptom in mind and you're not even totally certain that that symptom is indicative of this particular bug. So anyhow, so the developer starts
and their first query is just trying to approximate
the working set size of their cache.
They're trying to get a sense of how many unique items
are being accessed by the cache over time.
And they write this query that uses a windowed aggregation
on top of all of the queries against their Redis cache.
You can see that paper for more details.
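A rough sketch of what that first query could look like is below. The view and column names are invented for illustration (the paper writes it against SteamDrill's own views), but the shape is the same: a windowed aggregation that counts distinct keys per time bucket.

    -- Hypothetical sketch of the working-set query (names invented): count the
    -- distinct keys requested from the cache in each one-minute window.
    SELECT FLOOR(time / 60000000000) AS minute,        -- nanosecond timestamps, bucketed
           COUNT(DISTINCT key)       AS working_set_size
    FROM   redis_requests                              -- assumed view over the OmniTable
    GROUP  BY FLOOR(time / 60000000000)
    ORDER  BY minute;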
What's interesting about this query is we show in the paper a comparison with how you would maybe write this query if you were to use a tool like GDB or any sort of logging-based tool that you might use today. And you find that this one query, seemingly innocuous,
has a lot of kind of interesting trade-offs in it,
dealing with the performance complexity trade-off.
So it's easy to make this query in GDB run really slowly,
but it's very hard to make this query run quickly.
And so there's a lot of complexity in making the query
run reasonably fast. So it's kind of a poster child for the trade-off we were talking about.
So that's the first query. You know, what the developer learns at this time is, oh, actually,
the working set size isn't really changing. And so this is starting to point, oh, there is a bug
here, right? So the first query is just kind of saying like, well, is there actually a bug in Redis or not?
OK, so then once they figure out that there's a sort of fairly constant working set size, the second thing the developer wants to figure out is, OK, well, how many items are actually in my cache over time?
Right. So they're just trying to track like maybe there's some internal problem that's going on.
Who knows exactly why this is happening?
But maybe I'm sort of losing items.
Maybe the size of each item is increasing.
And so I'm able to store fewer.
You know, who knows?
And there, what the developer learns is that the number of items in the cache is actually decreasing over time, which is kind of surprising.
Because the same query that they wrote also shows the number of bytes in each item is not decreasing.
So you shouldn't see this decrease in size if the number of bytes in each item is not decreasing.
That means that basically the size of your cache is decreasing over time.
And that doesn't really make any sense.
So, you know, the next thing the developer does is they write a query that essentially looks at, it's kind of like a use
after free query, but it's customized specifically for this Redis scenario. And so because Redis and
lots of servers today don't clean up memory on termination, you can't just use an off-the-shelf
use after free detection tool because everything is a leak.
And so if there were a memory leak that was causing this problem in the cache, you sort of would say, okay, well, all of my memory is leaking.
A good portion of my memory is leaking in Redis.
So they write this customized query that uses this new view.
It's sort of complex.
It actually ends up being like roughly 13 lines
of a query, but essentially they're tracking for each allocation site that eventually leaks memory
at each point in time, how much memory has it leaked by that point in time. And they're using
that as a way of basically saying, okay, which allocation sites are leading to a growth in the
amount of memory that I leak over time, as opposed to staying constant in the amount of memory that they leak over time. The idea being, okay, if it's growing, then that's the leak, right? And if it's constant, then okay, you're technically leaking, but it's not getting worse.
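As a rough illustration, a query in that spirit might look like the sketch below; the allocations and frees views and all names are invented for the example, and the actual query in the paper (the roughly 13-line one Andrew mentions) is written against SteamDrill's own views.

    -- Hypothetical sketch (names invented, and it ignores pointer reuse): for each
    -- allocation site, track the cumulative bytes it has leaked by each point in
    -- time. An allocation with no matching free during the recording counts as leaked.
    SELECT a.alloc_site,
           a.time,
           SUM(a.size) OVER (PARTITION BY a.alloc_site ORDER BY a.time) AS leaked_bytes_so_far
    FROM   allocations AS a               -- assumed view over the OmniTable
    LEFT JOIN frees AS f
           ON f.ptr = a.ptr AND f.time > a.time
    WHERE  f.ptr IS NULL                  -- never freed during the recording
    ORDER  BY a.alloc_site, a.time;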
So finally, the fourth query, and then eventually the fifth query, are really just about looking at the output of the third query.
It shows you that the problem is these objects that should be reference counted are what's being leaked.
And so the fourth and fifth query, and I won't get into the details there, are really just then tracking basic invariants about those reference-counted objects.
So, you know, when I see, do my reference counts ever get larger than
I expect them to get? Do they ever get small? You know, do they ever get negative? Those types of
invariants and they're just tracking those types of things. And they eventually discover that
there's this bug in their reference count when certain objects are shared across two threads.
So it kind of continues to be a hairy bug all the way down.
But, you know, hopefully that gives you somewhat of a sense of the types of queries you run
where you're thinking about how do I think about my problem in terms of these types of
SQL aggregations?
We found that those are super, super powerful tools.
And so almost all of the queries I described are using some form of SQL aggregation to reason about what's happening inside of my program: what's happening in the general case, what's happening in this exceptional case,
you know, tracking how things progress over time, that sort of thing.
Awesome. Yeah, I really like the idea of bringing SQL into this sort of domain, and leveraging a language that people are probably already familiar with. So the next question is, we've touched on it so far, this trade-off in complexity versus performance. Obviously, naively materializing an OmniTable seems like it would come with a significant performance overhead. So how do you go about mitigating that potential performance overhead?
Yeah, when I describe this paper to people, I describe this problem as like the
elephant in the room. You know, usually we get about to this point in the conversation,
and they sort of sit back and they're like, well, it's great what you're telling me.
But you told me that it was too expensive to use my debugging tools today. And then you told me
you're going to record everything that ever happened in the program. That surely is more expensive than what I started with. And so,
yeah, absolutely. Right. So the Omni table, you can just sort of back of the envelope,
calculate this. You're talking about probably exabytes of data that you're recording for most
reasonable applications, because it's the whole memory space in terms of the number of columns, and it's the number of instructions executed per row. So run a couple of seconds of computation, go write it down: it's petabytes, and it approaches exabytes within a minute of computation or something like that. So, yeah, we can't
store it, right? So everything I've been telling you is impossible. So what we do instead is we have this technique that we call lazy materialization.
And the idea behind lazy materialization is that we're going to wait until a query actually needs a byte of OmniTable state before we
realize and actually calculate that byte. How do we make that possible? Well, the idea is to turn
to this technique I alluded to called deterministic record and replay. And so what we do is during
this initial execution, when you first discover or first capture the bug, whatever that
bug might be, this technique called deterministic record and replay just records a log of all the
non-deterministic events that are passed to your program. And that log of non-deterministic events
is much smaller than all of the state of your program. In fact, a prior work out of
my lab group at Michigan, when I was at Michigan, had shown that you could store, depending on the exact usage model of the user and the computers, but at least for desktop applications, an average year's worth of these logs for the
average user on like a single terabyte drive. So we're just building
on top of that existing work. That system was called Arnold. And then what happens is when you
actually query an Omnitable, we then turn your query into dynamic instrumentation that we execute
on top of that recording. And so that dynamic instrumentation, essentially, if you asked for
a particular state at a particular point in time, you can kind of imagine that as like saying,
we create a printf statement that extracts that particular byte of information you wanted
at the particular points in time that you specified. And then we rerun that. And that's
how we then actually calculate the OmniTable query after the fact, sort of lazily, as we say.
Oh, fantastic. So are there any other additional techniques you use?
I mean, the prototype is called Steam Drill, right? That's how this is implemented.
Are there any other techniques that you kind of pair up with the deterministic record and replay?
Yeah. And that's really where one of the, I think,
really interesting consequences of doing this lazy materialization is that the process of
deciding sort of where to introduce this instrumentation, it's very open, right? So
if you think from a database perspective, you can think of the
re-execution and all of this as being basically a query planning problem where you're deciding
Steam Drill, which is our prototype, is deciding when do I want to run this dynamic instrumentation?
How do I actually want to instrument the program to do this dynamic instrumentation? Do I want to
leverage many machines? How exactly do I want to do this? Right. And so in particular, we basically used two techniques. The first one is we reused my work on cluster-fueled debugging. So we used thousands of cores to split this instrumentation task across many, many machines and try to figure out what this state was that you're interested in.
That was more or less just using the state of the art that's out there.
And the novelty here, the probably novel approach was we took this idea that we called
sort of multi-replay query resolution, which is yet another mouthful. But the idea was rather than, you know, take your query and calculate it all
over a single execution of the replay, right? So you could imagine converting your query,
running it all on top of one replay, all in one go, and then being done. Okay. What we found,
though, is that for these queries, especially some of the more interesting ones that are
joining
tables that are maybe saying, I want to know all the instructions that were executed only in slow
invocations of your functions or something, right? In those types of queries, what we find is there's
these different parts of the query logic that are way more expensive than other parts of the query
logic. And if we can calculate the less expensive parts of your query
sort of early on, then we can often filter the amount of the more expensive thing that we have
to go calculate. And so the idea is, okay, we'll split your query into these different
chunks basically, and we'll execute the less expensive things first and use those to try to
filter how much of the more expensive stuff we actually have to calculate and extract later on. And we do that in part that
only works because we have this replay. And every time you execute the replay, the idea of
determinism is you get exactly the same OmniTable out. And so as a result, we can always sort of delay and recalculate exactly the state that we would have gotten earlier, because we're building on top of that deterministic record and replay as a primitive. So that's the novel technique that we used to accelerate these tasks.
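A rough way to picture that splitting is below, with invented names and the two passes written out by hand; in SteamDrill the query planner does this division automatically rather than the developer writing separate passes.

    -- Hypothetical sketch of multi-replay query resolution (names invented).
    -- Pass 1, run over a cheap replay: find only the slow invocations.
    CREATE TABLE slow_calls AS
    SELECT call_id, entry_time, return_time
    FROM   function_calls                       -- assumed view over the OmniTable
    WHERE  name = 'handle_request'
      AND  return_time - entry_time > 1000000;  -- e.g. slower than 1 ms, in nanoseconds

    -- Pass 2, run over a second replay with heavier instrumentation: extract
    -- instruction-level rows, but only inside the time ranges pass 1 identified.
    SELECT o.time, o.instruction_pointer
    FROM   omnitable AS o
    JOIN   slow_calls AS s
      ON   o.time BETWEEN s.entry_time AND s.return_time;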
Yeah. Awesome. So how did you actually implement SteamDrill? What's it built on top of? Is it totally handmade from nothing, or did you build on top of existing infrastructure?
Right, right. So, you know,
there's a lot of components, of course. The record and replay is built on top of Arnold, which was this record and replay system that I mentioned earlier. And then there's an instrumentation-specific component. That instrumenter is actually
custom for SteamDrill, basically, so that was sort of our own build. And then the query
language, the optimizer, most of this stuff that looks like a database is all built on top of Spark.
And so we had some extensions we had to make to Spark to support our new query operators I talked
about earlier. We had to hook into how the Catalyst optimizer works, et cetera.
But we were able to re-leverage a lot of what they had in Spark to support a good number
of operators and things like that.
How long did it take you to develop it?
Oh, man, that's a loaded question.
Years. So I had started this project, and I probably wasn't developing in earnest in the very beginning, but there was a HotOS version of this paper
that came out in 2019. That was in May of 2019, and just before that,
I had started enough sort of hacking and building
versions of the prototype that eventually became Steam Drill. I had started working on that at
around that time. As it turned out, there were a lot more changes that I had to make that I
initially sort of didn't expect. So I mentioned, for example, we have our own instrumenter for
Steam Drill. I didn't think I was going to have to do that. And I thought I was going to be able to use something off the shelf.
It turned out that the existing tools either came with really high overhead or they couldn't
support arbitrary queries like what we wanted to support.
And so, yeah, I mean, it wound up being a really large implementation effort, in part
just because of things I didn't expect or, you know, problems I didn't
expect to have. And there were a lot of sort of features I started to add and I realized after a
couple of months, so maybe I didn't really need that or, you know, that doesn't really help the
story. So it was a long process, but in the end, you know, I think I'm pretty happy with the
work, and the implementation is open source. You can go find it on GitHub. The Eidetic system, that's the Arnold work, is out there. There's a standard repo for that that I sort of support. And then the SteamDrill repo is also out there. You can find that through my GitHub page as well.
Yeah, we'll link it all in the show notes so the listeners can go and find those.
I guess my next question is,
how did you go about evaluating Steam Drill
and what were the questions you were trying to answer
in your evaluation specifically?
Yeah, that's a good question, right?
So we had these two claims.
We were going to try to improve complexity
and we were going to try to improve latency.
And so that naturally kind of leads to the two types of questions that you asked. Right.
How did we improve complexity? How did we improve latency to evaluate the complexity question?
We went through a bunch of case studies, not unlike the Redis one I described earlier. So we actually looked at five different case studies,
starting from a bug where we found the bug in like a bugzilla type bug report. And then we
walked through what the Omnitable queries would be. And so from those five case studies, we wound
up with 14 queries in total, because each bug has more than one query, right? Like the Redis example I described earlier had five; many of them had two or three, something like that. And we then looked
at those queries and we said, okay, so what do we compare against? And we thought, okay, the de facto
debugging tool is GDB. And so we said, okay, well, if we can easily say this is the OmniTable query, what's the equivalent in
GDB?
And we looked at GDB has these Python bindings that you can think of as being scriptable
versions of GDB statements.
So you can say set a breakpoint on certain points in the program.
You can perform some computation, like use a Python dictionary to count the number of times
you got to certain spots, whatever it might be. And so we're like, okay, that kind of models this idea
of what if I use GDB to solve these bugs instead? So we took the same queries we implemented for
the OmniTable and we implemented them in GDB. And then we compared, using metrics from the software engineering community, what the complexity gain is from using the OmniTable instead of GDB. And we used a few different
metrics. So the first thing we did is just count the number of lines of code. And we found, okay,
OmniTable queries on average require like 3.7 times fewer lines than their GDB script
counterparts. And in certain cases we see, in fact, you know, way higher than that. So there's
at least one query out there where it's more than an order of magnitude fewer lines. We also counted
the number of terms in an abstract syntax tree. That's kind of like saying how many sort of units
of thing did I have to
think about when I built my system or built my query. And there we saw again, improvements this
time on average, about five times fewer terms. And then we also use this aggregate metric to
measure complexity that's called the Halstead complexity, which looks at properties of your
abstract syntax tree to estimate how long it would take a developer to write that correctly.
And the metrics that it uses are things like how often is a term reused versus how many unique terms do I have?
So things like that. And there we found that on average, the OmniTable scripts would require 2.7 times less development time,
according to this Halstead metric. So together, any one of those points in the space is maybe not
the best argument, but the totality of information tells us that these are in fact less complex than
their GDB counterparts. And we also show in the paper, I mentioned earlier this
example, like the first Redis query, the side-by-side comparison. So we show a couple of
those comparisons and we kind of let the reader be the judge. Do you think these seem more or less
complex? And certainly the PC and most of the people I've talked to seem to think, okay, this
does look less complex. So that's the complexity side. For the latency side, you just calculate these things,
you measure them. And again, we compared against GDB and we just showed, okay, we didn't actually
calculate them over all 14 of those queries. We chose a subset of representative ones.
But for that subset of interesting queries, how much faster is the OmniTable than GDB? And we found that on average,
it's almost two orders of magnitude faster
to calculate your queries using the Omni table
instead of using these GDB scripts.
So yeah, we saw massive gains in efficiency
and still, you know, I think reasonable gains
in complexity against what's the de facto tool
that most of us are using,
as far as I can tell, when we develop.
Fantastic. There's some good sound bites in there, some good quotes I can take out. So yeah, brilliant. I guess you've covered
kind of what the key results are there, but is there anything else, like specific things, you wanted to emphasize?
I don't know. I mean, I'm not sure if I have any other, like, key results.
I mean, I think there's a broader one, I guess. You know, when I told people I wanted to build this tool, right, like in 2019 or whatever, there were a lot of people that were like, oh... I mean, even when we started this conversation and you described this model, people were like, oh, that seems impossible. So maybe one of the key results of the paper that I sort of think back on is, you know, it's actually possible, and then sometimes it's even more efficient, to support this massive model of what your computation is. In some ways the one key result might be that, hey, this query model is actually realizable. That there exists a system that does this might itself be a key result of the work.
Fantastic. Yeah, the next question I'm going to ask is, are there any
situations in which this approach is the wrong way to go, or is it always better than the existing approaches people use? And what are the general limitations of it, I guess?
Right. So, yeah, there's two things I would say. You know, if you're faced with a really simple bug,
like it's not that uncommon for me to have a bug in my software
where like adding a line or adding one or two printf statements
to my code kind of shows me what went wrong, you know.
And for those types of really simple problems,
I'm not sure if it's worth it to get out the whole shebang of how this works. Some of that could be improved by integrating better into IDEs. We made no effort to integrate into an IDE; this is a command-line tool. You're writing your queries on the command line, and you're sort of interacting with it that way. If we made a better effort to integrate this into Visual Studio or something, it's possible some of that burden would go away and this might be useful for simpler bugs. But at least right now, you know, don't reach for this when you've got your, like, easy-to-understand problem.
So that's, like, from a practicality perspective.
There's also a set of limitations, and again, I think your listeners, assuming they're coming from a database background or they have some database knowledge, will appreciate this. Even though we added this stack join, right, to handle functions, there are still things you can't express. Say you're in a program and you want to say, all right,
I'm at some particular point. I have some, you know, x equals two. Okay. And then I want to say, well, what are all of the future instructions that that assignment ever impacted?
This is like a transitive closure information flow question.
And SQL can't do that type of unbounded traversal
over a database table like that.
So we need some new language construct,
either by adding extensions to what we have
or by starting with something like Datalog, I don't know,
something totally different than SQL
that allows us to express those types of transitive closures
to support those types of questions.
So there's definitely things that you probably can't do.
Now, we haven't found that those are, like,
drastically standing in the way of our ability to debug,
but certainly they could.
Yeah, a point I just wanted to add is that I know there's an upcoming new,
well, there's also an extension to SQL that's going to make it better
for doing graph-style queries, SQL PGQ, which is due to come out, right?
And I think that there is also some effort to have, like,
a standardized graph query language called GQL, which is kind of in progress.
So maybe there might be something kind of there that might be more appropriate for this problem, right?
Yeah, that's really interesting. I mean, you could imagine modeling the like assignments
of variables throughout your program as a graph, and then using some sort of like graph query
language on top of that, right? And I think that's a great example of where, you know,
I can imagine continuing to work in this space
by trying to add back, you know, some of these features
as views or embeddings on top of the Omni table, for sure.
Are there any applications that you kind of can't use it for,
or is it pretty much you can use it for any application, right?
Yeah, so we developed this entirely for binaries, or for C and C++, so everything is designed around supporting C and C++ programs. There's this side of the OmniTable, and I
didn't talk a lot about this earlier but we have this notion of these views that you build on top of the Omni table.
And those views allow you to express your debugging in terms of like higher level language constructs instead of on top of like memory addresses.
And our tooling right now, like I mentioned, sort of only supports C++ for those things.
It's unclear to me at this time, and there's a little bit of a research question, whether or not you could build those types of views for dynamic languages, where the mapping from something like, what is the line of source code?
Like, where does that line of source code exist?
What instruction at the machine level
does that exist at? That connection can change over time if you're running in the JVM or something.
And it's not immediately obvious to me whether or not we could support that type of interface
and where the limitations would be. Mostly because I'm not an expert in how the JVM does that type of, what's the word, optimization and sort of re-optimization.
So I don't really know exactly how that works.
So I could imagine needing a set of extensions to SQL
to support those types of dynamic languages, essentially.
Cool.
As a software developer, we touched on usability a second ago. How easy would it be to integrate Steam Drill into my development workflow?
Right. So, you know, I've seen some work that's trying to use SQL-like tools for doing debugging, and that gives me hope that this style of debugging, at the level of the OmniTable contribution, would be something that developers would be fairly ready, hopefully, to add into their debugging today. In terms of the tech stack that builds SteamDrill, probably the hardest initial hurdle to get over is using deterministic record and replay.
And the system we're building on top of now is a custom operating system.
It's a custom kernel to support this feature.
And so you'd probably have to make pretty invasive changes to use that custom kernel to really, you know, make that work. So because of
that, the implementation we have today, the actual artifact, is great if you're a PhD student. It's maybe great if you're like an enthusiast who hacks your own kernel, you're happy to get in, and recompiling a kernel and running a custom kernel doesn't scare you.
But I think for a lot of people,
that's kind of outside the boundary
of what they would normally be using
for standard workflows.
And so I think, unfortunately,
where we are right now with the tech stack
is probably not, you know,
it's not quite there yet
for your average run-of-the-mill software developer
to just download it and roll with it.
Cool. So what was the most interesting thing, or maybe perhaps the most unexpected lesson, that you learned while working on this project?
Yeah. So I think, to me, the stuff I learned most actually was sort of wildly non-technical.
And so, you know, I think retrospectively, you know, I mentioned I worked on this project, obviously, for years.
And I mean, at least two years and probably closer to three.
When I started this project right in the beginning, my advisor had left for sabbatical, and then he sort of never came back. I mean,
the paper has a set of, you know, there's four authors on the paper. And at one point in time,
you could, with a straight face, say all of them were my advisor at different points of this project. And there actually was a fourth person who you could have said was my advisor at a certain point in this project. So I learned a lot about leading a project in a way that I had never really fully
had to lead before. And I learned a lot about, you know, managing to stay on top of a schedule,
like a lot of the things that maybe I think we don't pay enough attention to in the research community as really important parts of like doing great research.
And a lot of my own sort of soul searching and working through, like, bumpy transitions between different people, you know, finding a way to get feedback from a lot of different people over time.
Finding a way to make the 30 minute meeting that I had with my advisor who left for industry.
I only got 30 minutes of his time every week.
How do I make the most of that 30 minutes?
And those types of things, I think, were, you know, really made me a better researcher.
And that's part of why I think I'm able today to do what I can do.
But it's very bumpy during the project.
I think I learned more about that than
I think I learned technically, probably in this project, which isn't to say I didn't learn
technical things, but just that's what I would focus on. Did that experience sort of shape your
decision to become an assistant professor? Or was that something you always wanted to do, to continue in academia? Did that experience, and sort of the resilience and the things you learned across it about how to manage a complex project over a long period of time, sort of decide, oh yeah, I want to stick at this and continue in this line?
I think it certainly was a trial. I mean, I came into grad school thinking I wanted to be an assistant professor, so it's not the reason I
decided to go be one.
It certainly would have been a really great time to learn that I didn't want to be an
assistant professor.
And, you know, instead, here I am.
So obviously, it told me something about, you know, sticking it out.
I think it's made me a much better assistant professor for no other reason than the fact
that I've worked with like four different styles of advising.
So I've seen a lot more things than I would have otherwise.
And I think I'm more prepared than I would have been otherwise to lead projects and help students in different ways.
So I view the whole process as a big part of of really shaping my style, really shaping where I am.
And so, yeah, I mean, I think it was really instrumental in certainly in what I do now,
if not instrumental in my career path, certainly instrumental in that side of how I run a lab and
how I interact with students, et cetera. Brilliant. So you mentioned that it's very
bumpy. What were the things that you tried that kind of failed across this journey?
Maybe you could touch on those a little bit.
Well, I mentioned this slightly earlier,
but I spent a lot of time sort of adding features that wound up being like totally useless.
So I spent a lot of time trying to add new features
that seemed like they'd be really convenient to have.
And in hindsight, just really didn't matter.
And in some cases, they didn't matter because I was kind of approaching the problem backwards
in the beginning. I was thinking from the perspective of build the model and then find
the use cases. And if you make the model good enough, then you'll have all the use cases, right? And especially for this work where it wound up being that like naive SQL wasn't
enough, I had to do other things. What I eventually learned after like a year of working on this
project was I needed to totally flip my way of thinking. I needed to start from saying,
here are the things I want to do. Here are the things I need this model
for. What are the implementation techniques and features that I need to make those things work
well? And so you had to start from the perspective of use cases rather than starting from the
perspective of the system. And I actually knew that ahead of time. My advisor always started
from the perspective of use cases. When I started my PhD,
I just hadn't internalized that until I was the one really leading the project, right? So that's a great example. I spent so much time working on features that you won't find in
the paper and you won't find in the artifact because they just didn't matter. And those,
I would say those are kind of like probably the biggest things. The other thing that I did is, and I mentioned this, that Steam Drill has its own instrumentation framework for how we instrument code. And I needed that because the existing frameworks were either really slow, or they couldn't support adding arbitrary instrumentation to an execution. I dragged my feet for a really long time on that
task and I've tried really hard not to do it. And I think if I would have just from the beginning
accepted the reality that I needed to build my own, it really wouldn't have been that painful
and I would have gotten through it much faster. So that's a little bit of like a, you know, I think
don't be afraid of hard work. And I was really afraid of spending the week
really banging my head against the wall because that code is really hard. You're dealing with
low-level assembly code, you're adding functions to it. It looks like very challenging work. But sometimes you
just have to kind of sit down and be willing to put your nose to the grindstone and do the work.
Again, ironically, this is something my first advisor taught me. And I just did not know it until I really had to lead a project and do it. So
that's so funny. Anyhow, so I think those are like probably the two best examples of places where I
really was wrong, and then I eventually became right. And the nice thing is I kind of learned
both technically things that I needed to do and then also kind of
like life lessons afterwards. So those were sort of really great spots in the project, looking back.
Awesome. So where do you go from here? What's next for the OmniTable, for SteamDrill? And I know you're working on many other projects as well, so what's next for your research agenda?
Yeah, so we'll start with what's next for the OmniTable and SteamDrill. So there's a few
things that I want to do. One of the things about the paper as it stands today, we're
looking specifically at debugging a single execution of your program and all the programs
are also, they're multi-threaded, but they're all single-machine programs. And so what I'd like to do is take this model further. There are sort of two straightforward directions for extending the OmniTable, I think. One is, what if I extend this to a distributed system?
What if I have a collection of Omnitables for each of the nodes? And how do I debug that? How
do I think about what a query over a distributed OmniTable looks like? What am I
looking for? Those types of things. So that's one thing we've been thinking about as a group.
Another thing is, you know, maybe I'm not looking for distributed systems, but instead I have lots
of different Omni tables. If each one's an execution of a program, I now have, let's say,
thousands of times I've run my program. I hold on to all
those Omni tables and now I'm writing queries over all of the past executions of my program.
So what do those queries look like, right? Like what can I learn and how do I make that efficient
to query the universe of times when I've executed a program? So those I think are like what's next
for the Omni table. One extra thing I'll mention,
we're also thinking about, I mentioned that it's very hard today, you have this custom kernel,
it's very hard for you to start using Steam Drill the tool. We've been spending a fair bit of time
thinking about how we can make that easier, that burden easier to get into using record and replay
in the particular way that we want to be able to use record and replay.
And we're not the only ones thinking about that.
There's like dozens and dozens of papers, and I'm skipping citing all of them and not doing my academic due diligence.
But the point is, I think that there's more work to be done there to ease the burden of
adopting some of these tools, and we're looking into doing that as well.
So that's what's next for the Omni table.
More broadly, like what am I working on today?
In some ways, I've sort of pulled back a little bit
from just working on software reliability.
I'm working a lot on,
I'm doing some work on disaggregated computing
where people are thinking about what happens
when some of my memory for my system might be far away,
you know, like literally far away from the rest of my compute and maybe some other memory. How do
I take advantage of that to make systems still perform well? And one of the things I've enjoyed
about that work is being able to use some of the stuff I learned about inspecting programs,
maybe not debugging them, but certainly inspecting them. A lot of the tooling that the Omni table winds up being about
is about profiling, understanding what happens inside applications. And a lot of what I'm doing
in this disaggregated space is about profiling and understanding what's going on inside an execution,
inside a program to understand how to take better advantage of them. So I'm actually surprised at
this point, how much carryover there is
across the techniques and the ideas from the two spaces.
So that's like kind of the other thing that I've been spending
a fair bit of time on recently.
Awesome.
Just rolling it back to the distributed system sort of limitation.
I would find that tool super useful in my day-to-day life
because, trying to debug, we have like the Raft consensus protocol, which
is kind of a core component of the stack here.
And trying to debug that is an absolute nightmare.
So any tool that could make that process less soul-destroying
would be gladly appreciated from me.
Absolutely.
Yeah, I mean, I think to sort of build on that,
I think we're moving towards a world in which, you know, first of all, most applications already are distributed. Not most, I shouldn't say most. And we're also moving towards disaggregation of hardware, and those kind of look like distributed systems as well. And so over time, I think adding that component
of being able to use this system across multiple memory spaces and understand how those things
interact is a really useful and important contribution. So definitely I see that as a,
yeah, definitely something we're looking at.
So cool.
Kind of penultimate question.
Now, what do you think is the biggest challenge in your research area now?
Yeah, this is a really interesting question.
I don't know about the biggest challenge, right? That's such a tough question, but I'll describe a challenge that I think is important.
And that has to do with understanding what's called like a software oracle. So a lot of work
in testing, and I would argue the OmniTable would benefit from this as well, relies on some notion of correctness in order to say whether a program
execution was, you know, correct. Did I find some bug with this test? Or did that test
fail? And that process of defining oracles today is typically pretty complex, and it's often a bit
ad hoc. And so the way that works today, a lot of times is we go out and we survey a bunch of bugs,
we find common patterns of what those
bugs look like, and then we use that to derive our oracles. And I think we can do a better job.
I think we can look at potentially languages or structures to do a better job of letting developers
specify correctness of their programs a little bit easier and maybe a little bit more of a
straightforward way so that we can then,
as testing tools or debugging tools, find those problems for them and help give them more clues
in terms of what went wrong. So I think that's maybe the problem. I don't want to say the biggest
challenge, but a big challenge and a big problem in the reliability space today. That's what I'd say.
Brilliant. Right. So it's time for the last word now. What's what I'd say. Brilliant, right.
So it's time for the last word now.
What's the one key thing you want the listeners today
to take away from Omnitable
and from your research in general?
Oh, man.
You know, it's really hard.
I think what I would say,
and this gets back to when you asked me, like, what's the key result of the OmniTable? And I sort of was like, oh, you know, I had already said the numerical results, right? Like, all the results of, oh, you know, we're this much better. The key result in some ways, like when I tell people the story, is that, okay,
when I started telling you this, you didn't think it was possible.
And by the end, you kind of believed it was possible.
And that, that tells you that there's this like big vision that seems not quite realizable. And then
you realize it at the end. And in an abstract sense, I would like to read a lot more research
papers. Oh man, now it sounds like I'm so conceited, but I would like to read a lot
more research papers where you start the paper thinking like this doesn't seem possible.
And you finish the paper being like, wow, that was possible. And so the key takeaway, I guess, if I was to encourage, like,
grad students or people who are interested in computer systems research: go do stuff that doesn't seem possible, you know. And I think that's kind of where... I don't know, I do distinctly remember, at this conference I presented an early version of this paper at, a colleague, and I won't name who it was, but someone I very much deeply admire, and they're a really awesome person, they said, wow, this seems really, really ambitious. And at the time I said to them, like, I don't know, it seems pretty straightforward. It's just building on my work, you know, it'll be fine, right? And I was totally wrong, of course. But yeah, looking back, I'm really glad, I'm really proud, that I went through that and still sort of stuck with the problem that seemed kind of really ambitious to start as the last part of your thesis, when presumably you want to graduate soon. So yeah, I think that's maybe the key takeaway from the research, which I know isn't a technical idea at all. It's a bigger idea than that. But I think that's the impact research can have, sort of non-technically, right? Go do stuff that seems not possible.
That's amazing. Well, I think we will end it there.
Thanks so much, Andrew.
That was brilliant.
And if you're interested in knowing more about Andrew's work,
all the links to all the relevant materials
we've put in the show notes,
and we will see you next time
for more awesome computer science research.
Yeah, thanks, Jack.