Disseminate: The Computer Science Research Podcast - Lessons Learned from Five Years of Artifact Evaluations at EuroSys | #64
Episode Date: July 30, 2025

In this episode we are joined by Thaleia Doudali, Miguel Matos, and Anjo Vahldiek-Oberwagner to delve into five years of experience managing artifact evaluation at the EuroSys conference. They explain the goals and mechanics of artifact evaluation, a voluntary process that encourages reproducibility and reusability in computer systems research by assessing the supporting code, data, and documentation of accepted papers. The conversation outlines the three-tiered badge system, the multi-phase review process, and the importance of open-source practices. The guests present data showing increasing participation, sustained artifact availability, and varying levels of community engagement, underscoring the growing relevance of artifacts in validating and extending research.

The discussion also highlights recurring challenges such as tight timelines between paper acceptance and camera-ready deadlines, disparities in expectations between main program and artifact committees, difficulties with specialized hardware requirements, and lack of institutional continuity among evaluators. To address these, the guests propose early artifact preparation, stronger integration across committees, formalization of evaluation guidelines, and possibly making artifact submission mandatory. They advocate for broader standardization across CS subfields and suggest introducing a "Test of Time" award for artifacts. Looking to the future, they envision a more scalable, consistent, and impactful artifact evaluation process, but caution that continued growth in paper volume will demand innovation to maintain quality and reviewer sustainability.

Links:
Lessons Learned from Five Years of Artifact Evaluations at EuroSys [DOI]
Thaleia's Homepage
Anjo's Homepage
Miguel's Homepage

Hosted on Acast. See acast.com/privacy for more information.
Transcript
Disseminate the Computer Science Research Podcast
Hello and welcome to Disseminate, the Computer Science Research Podcast.
Today's episode is going to be slightly different to the usual episode,
and we're going to be exploring a topic that's really important
to how we build trust in scientific results and the scientific process,
and that is artifact evaluation.
Specifically, we'll be talking about a recent paper that is titled Lessons Learned
from Five Years of Artifact Evaluations at EuroSys.
And I'm sure a lot of our listeners know what EuroSys is as a conference, but for those who don't,
EuroSys is a leading European conference that covers a whole range of aspects of computer systems,
from operating systems all the way up to embedded systems,
databases, networks, and storage, a whole range of topics.
And yeah, so the paper was co-authored by several of the guests I've got on the show today,
actually, who were the artifact evaluation co-chairs from 2021 to 2025.
And they're going to be talking today about their collective experience, reflecting on what
worked for them and what hasn't over that time period, and, looking forward, how the process
can be improved and how we can evolve things to better support artifact evaluation in systems
research. So welcome, guys. I'll let you introduce yourselves, going around one by one.
Do you want to kick things off, Thaleia?
Yeah, absolutely. Hi, my name is Thaleia Doudali. I am an assistant professor at the IMDEA Software Institute in Spain, and I was one of the artifact evaluation co-chairs in 2025.
Miguel, do you want to jump in now?
Yes, sure. Hi everyone, thanks for having us, Jack. I am Miguel Matos. I'm an associate professor at IST Lisbon and a researcher at INESC-ID,
and I was the artifact evaluation co-chair in 2024.
And Anjo, you're up.
Hi, I'm Anjo, a researcher at Intel Labs,
and I was the co-chair in 2022.
Fantastic.
Well, I just want to say again, thank you all,
three of you taking time out of your busy days
to come and talk about artifact evaluation.
So I'm going to kick things off with a nice softball
and ask you to explain what artifact evaluation is
and why it's so important.
And then I'm going to get out of the way
and let you three guys discuss the topic.
So, yeah, what is artifact evaluation?
Why is it important?
Awesome.
I will take that one.
So artifact evaluation is a voluntary process
that promotes the reproducibility
and reusability of scientific work.
So the process is that we submit papers at conferences like EuroSys.
And alongside the paper, we also submit the artifact,
the software that was built to create the scientific
results produced in the paper. So the software, the data, the documentation, how to run it, how to
execute the code, how to replicate some of the results in the paper. And this artifact evaluation
process benefits a lot the scientific community because it really encourages the reproducibility of
the results, open sourcing the software, and enables researchers to build upon, to compare
against, and to extend the prior work. And this is something extremely useful.
in systems research.
And for authors in particular, it increases the visibility
and the impact of their contribution
by making it easier to reuse and validate the scientific results
in their paper.
Yes, exactly.
So we have been working on kind of refining this process
over the different editions.
And currently, so this is a voluntary process.
Now, authors are highly encouraged to participate,
but it's completely voluntary.
And the process essentially goes in three phases.
So after the authors know that the paper is accepted at the conference,
they can apply to the artifact evaluation.
So they submit their artifact,
which is typically the source code plus maybe benchmarks and data set if relevant.
They submit also the main paper that has been already approved
by the program committee of the main conference.
And then they also add an appendix that explains how to reproduce the main claims of the paper.
So artifact evaluation does not entail necessarily having to reproduce every single result
because sometimes for many different reasons that we can go on and discuss a little bit,
it's not possible.
But essentially the authors identify the main claims, the
key results that should be reproduced, and they submit this to the artifact evaluation process.
And then there is an intermediate phase which we call the kick the tires phase.
And this is essentially a warm-up phase where the reviewers can check the artifact for basic
functionality. So is this properly documented? Is there a README? Does it compile? Can I run a
very simple demonstration case? And this allows us to flag early issues
and interact with the authors, if necessary, to ask them: this is not running, please fix it before we try again.
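To make the kick-the-tires idea concrete for listeners, the warm-up checks described here can be sketched as a small script. This is purely an illustrative sketch, not anything EuroSys prescribes; the file names checked are common repository conventions, not requirements of the process.

```python
# Hypothetical "kick the tires" pre-check: does the artifact have the basic
# pieces an evaluator looks for first? (README, some build entry point, a license)
from pathlib import Path


def kick_the_tires(artifact_dir: str) -> dict:
    """Return a dict of basic sanity checks on an artifact directory."""
    root = Path(artifact_dir)
    return {
        # Is there any documentation at the top level?
        "has_readme": any((root / n).exists() for n in ("README.md", "README")),
        # Is there an obvious way to build or run the artifact?
        "has_build_file": any(
            (root / n).exists()
            for n in ("Makefile", "Dockerfile", "setup.py",
                      "pyproject.toml", "CMakeLists.txt")
        ),
        # Is the artifact openly licensed?
        "has_license": (root / "LICENSE").exists(),
    }
```

An evaluator (or author, before submission) could run this against the artifact checkout and fix anything that comes back `False` before the real review starts.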
And then the final phase is the evaluation itself, where the reviewers are essentially going to attempt to verify, to assess the artifacts for completeness, for documentation, the building process, and the ability to reproduce the main claims of the paper.
Usually, artifacts are reviewed by three or four evaluators, and then this goes as in a usual conference,
right? There is a discussion phase, and based on the discussion and the reviewers'
feedback and comments, we award the relevant badges. And I will leave the
floor to Anjo to explain those badges.
Yes, so the entire goal of this process is to eventually award badges.
There are various standards for badges;
because EuroSys is an ACM conference,
we follow their policies and guidelines.
And early on, the systems community
has sort of agreed to three badges for now.
That's the artifacts available badge,
the artifacts functional badge,
and then there is the results reproduced badge.
And they sort of stack in complexity of evaluation,
badge after badge.
The available badge basically means that there is a URL,
a pointer to the artifact,
and that it's actually publicly released.
It just helps make it easy to find, I guess.
But it does not say much about the functionality.
That's what the functional badge is about.
So there we're checking,
or the evaluators are checking, for completeness,
whether it's well documented,
and whether it's actually working, right?
So it's the first badge for which the artifact is actually executed.
And then the most complex badge to receive is the results reproduced badge,
where basically the authors have to describe
what the main claims of the papers are,
and the evaluators then have to perform the same or similar experiments
to identify whether the results in the paper
are actually able to be reproduced.
It might be on a different system.
It might be with support from the authors and the same system.
It depends on how the evaluation goes.
Yeah.
And based on the evaluation, the respective badges are awarded.
And then they're typically displayed on the conference site, in the ACM Digital Library,
or on various websites, author websites, and in the paper,
to demonstrate that this paper has been awarded those badges, right?
So it's a really nice byproduct of the paper as well.
Cool.
So that really sets the background well for what artifact evaluation is;
we've learned about all the badges, the various phases,
and how we go about earning those badges.
So as I said at the very top of the show,
you guys have been working on this for like around five years
between 2021 and 2025.
So tell us about your insights then.
So I'll come to you first on this, Miguel.
Give us an overview of the trends you've seen over the last five years.
Sure.
So this essentially involves all of the co-chairs of these past five editions.
So it was a lot of work.
And the way we approached this was to be as systematic as possible,
given the data we have available,
and to support our analysis and the main conclusions that we have on the paper and that all of us will be happy to discuss here.
So we collect the data from multiple sources over these past five editions.
So this includes the official conference proceedings, to see what badges were awarded,
and the sysartifacts.github.io website, which is the official site for systems artifact evaluation,
not only for EuroSys but also for other systems conferences.
So this is probably interesting to link to this site in the show notes as well for the interested listeners.
And then we also developed some internal tooling to crunch and collect and scrape all of these data.
And we did things like check on GitHub, what are the usage metrics, the stars,
how many forks and pull requests and those kinds of things from GitHub.
GitHub is typically used for the source code,
but we also strive to have a permanent place,
a digital repository, for this.
Zenodo is a good example of such a repository.
And we also use these data sources to understand
how these artifacts are being used and shared with the community,
how many downloads we have over the years and so on.
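For listeners curious what that kind of scraping looks like in practice, here is a minimal sketch against the public GitHub REST API. The chairs' internal tooling is not described in detail, so this is an assumption-laden stand-in, not their actual code.

```python
# Sketch: given an artifact's GitHub URL, pull public usage metrics
# (stars, forks) from the GitHub REST API.
import json
import re
import urllib.request


def parse_github_url(url: str):
    """Extract (owner, repo) from a GitHub repository URL, or None if it isn't one."""
    m = re.match(r"https?://github\.com/([^/]+)/([^/]+?)(?:\.git|/.*)?$", url)
    return (m.group(1), m.group(2)) if m else None


def fetch_repo_metrics(owner: str, repo: str) -> dict:
    """Query the GitHub REST API (unauthenticated, rate-limited) for usage metrics."""
    with urllib.request.urlopen(f"https://api.github.com/repos/{owner}/{repo}") as resp:
        data = json.load(resp)
    return {"stars": data["stargazers_count"], "forks": data["forks_count"]}
```

Usage would be something like `fetch_repo_metrics(*parse_github_url(artifact_url))` over the list of submitted artifact URLs; DOI-backed entries on Zenodo expose similar download statistics through their own API.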
So the data set is not tons and tons of data,
but we already have some reasonable numbers
that allow us to support our conclusions
and to identify, and this is one of the things
that I find interesting about this work we did together,
the trends and the recurring patterns.
So to give you just a very brief overview of the numbers we are talking about: roughly 60% of the papers that have been accepted at the conference participated in the artifact evaluation process. So this is a number we want to increase, but let's discuss this later. And of these papers that applied, again roughly 60% of all the papers that have been accepted, we have awarded 161 the artifacts available badge. So this means that the code, data sets, and so on are available for the community to try. Out of these 161, 136 were deemed functional, so they got the functional badge, as Anjo explained before. And 75 artifacts were awarded results reproduced, which means that we, or the reviewers rather, were able to reproduce the claims made in the paper.
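Putting those figures side by side, the badge funnel can be checked with a quick back-of-the-envelope calculation (using only the numbers quoted above):

```python
# Badge funnel from the numbers quoted in the episode:
# 161 artifacts earned "available", 136 of those were deemed "functional",
# and 75 were awarded "results reproduced".
available, functional, reproduced = 161, 136, 75

functional_rate = functional / available  # share of available artifacts deemed functional
reproduced_rate = reproduced / available  # share with results reproduced

print(f"functional: {functional_rate:.0%}")  # roughly 84%
print(f"reproduced: {reproduced_rate:.0%}")  # roughly 47%
```

So under half of the artifacts that are made available end up with the top badge, which matches the "only about half" framing used later in the conversation.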
So this is lots of work.
It's very challenging because we have to deal with many different things.
And maybe Thaleia, you want to talk a little bit about that?
Yeah, so it's interesting to see that we have had an increasing interest in people participating
in the Artifact Evaluation Committee.
So every year there is a call for participation, where we encourage self-nominations, primarily
from PhD students, both junior and senior ones, to participate in the committee of evaluators.
And over the years, we've seen an increasing interest, and especially this year, 2025,
for which the preparation started last year,
we had double the size of the committee.
So the committee consisted of 98 members, which is great to see the increasing interest,
but it's also challenging to handle such a large committee.
But regardless, it is a testament to the increased interest among researchers wanting to participate in the evaluation.
We had reviewers from institutions across the world, Europe, USA, Asia,
even though the conference is, you know, the European conference on computer systems,
but it's a testament to how much of a leading conference it is in systems research.
So this shows that the number of evaluators is large,
and the interest in participating as evaluators has increased.
Yeah, and then we had some interesting facts.
Anjo, do you want to go ahead and talk about those?
Yeah, so I was mainly interested in finding out more about the artifacts themselves, and how and if they're being used.
So for that, we looked at all the submitted artifact URLs, which initially were GitHub-only repositories,
and later on DOIs, typically from Zenodo or Figshare.
Just to find out, are those artifacts, for example, still available?
And we basically found out that nearly all of them, in one form or another, are still available.
So even five years down the road, people are not deleting their repositories or removing their entries.
We found one removed entry, but it's a single instance out of 161, so it's relatively minor, I would say. And then in terms of usage, we looked at GitHub stars and forks; Zenodo has downloads and, I think, views as the other category. And it really varies, right? You have artifacts that are barely accessed, but then some of them are downloaded hundreds of times, have hundreds of forks, get quite a bit of traffic, and are still maintained.
So they receive still updates, also something that we have been looking at.
So it's quite interesting to see sort of that kind of data and how research artifacts
continue to be used. Of course, the older the artifacts, the more views, forks, and stars they tend to accumulate on GitHub.
So generally, I think
sort of a good study of
impact of these artifacts.
Yeah, definitely.
I'm going to say then that when it comes around to determining the most impactful paper, a test-of-time sort of award for a given paper, this would be a useful input into that, right? If it's still being used 10 years later, and it still works, that's quite a good indicator that it's had some impact, right?
So I don't think that we will have an impact on the test-of-time paper award, because that's typically based on research value, not necessarily the artifact. But something that at least among us we have been discussing is whether we need a test-of-time award for an artifact in five years' time, right? It might even incentivize people to keep maintaining it as well, right?
But yeah, cool. I guess given that, let's talk about challenges and some like proposals maybe
then for future. That's a nice segue into the next section of the conversation.
So, Thaleia, I'll let you take the lead on this section.
Yeah, it was interesting that all of, or most of, the chairs
had this discussion about what was the most challenging part of our service.
And there were things that were coming up every year, the same challenges again and again.
And the primary one is that we have a very, very tight timeline to work with,
and this is for many reasons.
So essentially the whole process of submitting and evaluating those artifacts happens between the paper acceptance, when the decisions are out, and the camera-ready deadline, because the final version of the artifact and the artifact appendix need to be part of the final version of the paper. So everything needs to be done between those two dates. And this is essentially like two, three weeks of time. So it's very, very tight, and it leaves very little room for things to be done properly. And this is particularly the case because EuroSys as a conference recently switched to a dual submission deadline. So essentially during one year there are two submission deadlines, one in the fall and one in the spring. So for EuroSys 2025, we had a spring deadline in 2024 and a fall deadline again in 2024 for the papers to appear in 2025.
This is the timeline.
It's hard to extend it because of all the other deadlines, you know, submission deadline for the program committee to review the papers.
So we have to work with that tight timeline.
So essentially what we propose, because this is a consistent challenge across the years,
we propose that the authors essentially start preparing their artifact early on, even upon submission time.
Essentially, we would like to have authors have their artifact ready when the decisions are out,
whether these are negative or positive.
Even in the case of a rejection, this process of preparing the artifact
is very useful because the paper will be submitted in another venue and the artifact will be
ready regardless.
So short term, we really want to motivate the authors to have their artifacts ready because
there is no time to properly prepare an artifact during this very tight timeline.
There is only time to communicate with the reviewers if things are not working properly
and there are like small fixes that need to be done, but in one week you cannot properly
prepare an artifact.
And then long term, there is discussion among the chairs and this is something that we want
to discuss further with the steering committee and the community about making the
process mandatory so that every author knows that I submit a paper, I should also submit an artifact
to reproduce the claims. And this is something that would be extremely beneficial. And of
course, it's a very complicated decision. So this is something that we propose to the community to
discuss. Yes. And another challenge that we identified, picking up this issue of
communication that Thaleia raised, relates also to the communication between the committee of the
main conference and the artifact evaluation committee. So as we said before,
these are separated processes and the committees are built independently. So on the main
conference, usually there are very senior researchers. On the Artifact evaluation, there are junior
researchers. Sometimes, I would say, a big chunk of those reviewers are PhD students, and
therefore there are all sorts of differences in experience and expectations about what the
artifact evaluation should be about. And this is, in fact, a challenge that we identified
over the past few editions in a quite consistent manner, which is that there are several
mismatched expectations about what results should be reproduced. For instance, the authors might
think that these are the key results, the PC members of the main conference might think
there are some differences there, and then the reviewers of the artifact evaluation can also
have a different understanding of this. And this is something we found out. And then this creates all sorts
of friction that we also want to address over the coming years.
So in the short term and what we want to propose is introduce an informal communication channel
between those two committees that again have different experience, different seniorities and
different timelines and periods where they work to have the reviewers, for instance, flag
very clearly what claims should be reproduced and at which level of detail, and then this
information could be passed down to the reviewers of the artifact evaluation. So this is something that
we believe should be achievable in the short term and does not create too much overhead on either
side of the process. Over the long term, our goal is to kind of formalize this
connection and require authors early on, at submission time say, to declare which
claims are the main claims and which claims they plan to support experimentally with
the artifact. And this would allow the reviewers of the main conference to validate
that list. And then, of course, this is always an iterative process and maybe propose adjustments
to the authors, that the authors can react accordingly when preparing their artifact.
And the goal here is essentially for us to have a good match between the expectations of both
committees, because we believe this is good for the authors that they know already beforehand
what they should target in terms of reproducibility of the results, and also for the reviewers
of the artifact evaluation, because they have a clear list of things to be
checked against.
Let me take the next point. So based on the numbers
that Miguel mentioned earlier,
only about half of the submissions
actually have their results reproduced.
One of the major drivers
of that is actually the use
of specialized hardware in our community.
So you may need a special server
with certain capabilities,
but it can be much, much worse, right?
You may need an Android phone.
It needs to be physically present.
And you can imagine sort of all sorts of craziness going on in the systems,
just because we are building new things.
And I think it gets even worse if hardware is involved,
because then sometimes export control issues also arise,
depending on which country and students are coming from.
So all of this makes it very hard to sort of scale reproducibility.
And we've sort of seen this over and over again.
I think the current solution is sort of a short-term fix,
where we try to ask authors what the requirements are,
and whether they can share the hardware through SSH.
But it's very limiting to then assign only the subset of reviewers
that have the same hardware available.
So it's a bit problematic.
In the long term,
we need to find, I think,
better systems to share
and be able to reproduce
those results,
which is certainly not easy.
Absolutely.
And moving on to the next challenge
that we identified
is the fact that the
artifact evaluation chairs and the reviewers change every year, and this is not the case, for
example, for the technical program committee. You know, in the program committee the chairs change every
year, but the reviewers repeat; especially senior reviewers typically repeat in the committee for many
years. But this is not the case for artifact evaluation: the chairs change every year, and also the
reviewers, usually PhD students, may participate in those committees once or twice, but then
they don't continue. So we have to, you know, redo a lot of things and relearn a lot of knowledge. So what we
propose is to kind of mimic the program committee and establish some sort of steering committee.
That could be consisting of people that have served as artifact evaluation chairs in the past.
This could be also done potentially across systems conference to help transfer knowledge
because essentially it's the same process across the different systems conferences.
Another proposal would be to prolong the service duration, for example, having chairs to
serve for more than one year, but of course that is complicated and puts more load on them.
But essentially, we need some sort of solution to be able to transfer knowledge, to maintain
best practices, to also work on the challenges that we mentioned so far so that essentially
we get solutions. So we need some sort of steering committee or some sort of having a group
of people to transfer the knowledge across years.
And again, with the committee, it is important that the chairs create evaluation committees
with both junior and more senior members, with people that have expertise in reviewing
artifacts and also allowing, of course, younger researchers to enter the service.
Exactly.
So another thing that we also identified, and it was, honestly, quite a surprise to all of us, I guess, is that even if we have the artifacts on
DOI-backed platforms, which are designed for the long term, ensuring that the artifacts remain
there over the long term is not trivial. And in fact, we found a few cases where some
artifacts were removed after they got the badges, which is, of course, something we don't
want to happen, right, because the badges should mean something very, very precise.
So this is more a technical aspect, but over the short term, what we plan to
do and to propose is to require that all artifacts are indeed stored in these DOI-backed
platforms, such as Zenodo or Figshare, but also to restrict deletions without previous approval
from either the steering committee that Thaleia mentioned, or the committee that is active during
that year, or the original committee from when the award was given.
The details are not yet very clear to us, but this is definitely something we want to do, to make sure that artifacts remain available.
And this gives credence and credibility to the badges, right?
And of course, over the long term, we also want to refine this:
when authors submit the paper, they would also declare explicitly,
and this is related to a point we already discussed earlier,
which results are reproducible when the paper is submitted,
and then this would go into the camera-ready as well.
So this would allow us better quality control,
making sure that if there is a badge,
it was meaningful when the badge was awarded,
which is the case nowadays,
but should also be meaningful over the long term.
The last two points that Thaleia and Miguel raised,
I think, make the point that I'm going to raise even worse, right?
So we have the short-term stewardship of AE chairs and evaluators,
and then some of the long-term guarantees around the artifacts are not quite ideal yet.
But on top of that, the badge definitions themselves, the language, is a bit
imprecise. So it needs quite a bit
of guidance, especially for
new evaluators, to understand,
especially for the reproducibility
badge or also for the functional badge,
what they are supposed to check.
And ideally, you want evaluations of two artifacts to be equally harsh, right?
Because otherwise one paper may receive a badge
and the other doesn't, even though it was not properly vetted.
So neither the authors of the artifact nor the evaluators
have a positive experience in that case.
Yes, exactly.
And one thing that we can also discuss is that all of these imprecise definitions hurt. But on the bright side,
I think we can say that artifact evaluation has become kind of standard practice in systems research.
So on the sysartifacts site that we mentioned before, besides EuroSys, we also have, for instance, SOSP and OSDI that also participate in this process, with the differences that are relevant for each community, of course.
But what we also identified and discussed in our internal meetings, when thinking about this and building these results,
is that this practice still varies very widely across different computer science subfields. Some domains prioritize availability over reproducibility, and there are certainly legitimate merits to that. Others,
like machine learning and HPC,
have separate reproducibility challenges,
so it's kind of a different way to do this.
What we have tried to do over the past few years
is to develop at EuroSys kind of a blueprint
to allow the systems-oriented community
to improve in this artifact evaluation process,
because all of us believe that it is very relevant and critical for science in general
and also for the industry as well.
And so my feeling, and I think my colleagues share this feeling as well, is that
this is a step on the way; there is still lots of work to do, as we discussed earlier.
And our end goal, I would say, is to make this more
inclusive, more consistent, and more aligned with the realities of modern research requirements,
which bring different challenges. When we go deeper and
think about this, there are many challenges that are still present nowadays, that we want
to do our small contribution to improve upon. Yeah, definitely. From what you guys are saying
there, I mean, there are a lot of challenges outstanding still, but the direction of travel is going
in the right place. Things are going in the right direction. Even from afar, you can sort of
see that the things are moving towards a better future. And a lot of the proposals you mentioned
there are really sort of appealing as well. And you can see how they would help. And things like
standardisation across different subfields and CS definitely would help as well. So yeah, I think
the next section of the podcast, let's do some broader reflection on your
time working with artifact evaluation, and focus on
what your advice would be for future authors, reviewers, chairs, or people involved with
artifact evaluation in the community. So Thaleia, yeah, you kick us off. For authors, maybe,
what advice have you got for them? Absolutely. I think for authors, my advice
is to start preparing the artifact early, ideally before you even know if the paper is accepted or not,
because this is a process that will be useful regardless of the outcome.
And think of your artifact as part of the contribution, not just an afterthought.
So write code and ideally open source it, after acceptance, of course, if needed.
Try to write clear documentation and automation scripts that will help the evaluators run the code seamlessly.
A well-structured appendix really, really helps and make sure that your work is reusable and reproducible.
And really this effort put into preparing the artifact will pay off tremendously after that.
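That advice about automation can be made concrete with a single driver at the top of the artifact repository, so evaluators run one command end to end. Everything below is hypothetical: the experiment names, script paths, and output files are invented for illustration, not taken from any actual EuroSys artifact.

```python
# Hypothetical top-level "reproduce everything" driver for an artifact repo.
import subprocess
import sys

EXPERIMENTS = {
    # claim in the paper -> command that regenerates the supporting result
    "fig3_throughput": ["python", "experiments/throughput.py", "--out", "results/fig3.csv"],
    "tab2_latency":    ["python", "experiments/latency.py", "--out", "results/tab2.csv"],
}


def run_all() -> int:
    """Run every experiment in order; return the number of failures."""
    failures = 0
    for name, cmd in EXPERIMENTS.items():
        print(f"[reproduce] {name}: {' '.join(cmd)}")
        if subprocess.run(cmd).returncode != 0:
            print(f"[reproduce] {name} FAILED", file=sys.stderr)
            failures += 1
    return failures
```

An evaluator would then just invoke `run_all()` (for example from a one-line `reproduce.py` entry point) instead of piecing together commands from the appendix, and a nonzero return value flags exactly which claims could not be regenerated.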
Yeah, definitely. I echo that sentiment.
Miguel, how about reviewers? What advice have you got for them?
Yeah, so even if the artifact is very well-prepared, reviewing is a lot of work.
But I think it's very valuable work.
So my advice would be for reviewers willing to participate in this process in the future editions
to approach these with a collaborative mindset.
So the goal is not just to put a check on the CV that I did this,
or to tick boxes that this plot is exactly the same as in the paper.
That is not what we aim to do, right? The goal is also to be helpful to authors and help them improve the quality
and the impact of the research, because today you are a reviewer, you will learn in the process,
and next year, hopefully you will be an author going under this process.
So this is something we also try to encourage reviewers to think about, be constructive,
to communicate clearly, and sometimes remembering that small fixes, small suggestions in the artifact
can make a big difference towards our goal of having better reproducibility.
Yeah, definitely, mindset is very important.
And Anjo, yeah, for future chairs, what would you say to them?
I mean, get in touch with us, that's one thing.
I think having continuity is sort of key, right?
The knowledge that is instilled in this process is important,
similar to how people have learned to run conferences over the years, right?
I think generally, over the past few years,
we have invested quite a bit of time in templates
for the artifact appendix,
in guides for authors and reviewers, checklists,
just to sort of help make this process as easy
and as similar as possible.
But of course, those documents aren't perfect, right?
We are trying to improve them over time
and I hope future chairs will sort of pick up on that work and continue to evolve the process,
but also the documentation, to, yeah, make it even easier for the evaluating community in the end.
Yeah, definitely keep iterating right towards a better future.
Cool. So yeah, speaking of the future then, that's a nice segue into the next question I've got.
And that's right, we get out our crystal ball and look five years into the future.
What does artifact evaluation look like then, Thaleia?
Yeah, it's interesting because already five years have passed.
So I'm very curious to see in the next five years what will happen.
But I really believe that the artifact evaluation process will be more standardized
and more of the authors will be incentivized to participate.
Because right now, to be honest, the percentage of participation is quite low.
Only around 60% of the accepted papers
participate in the process of artifact evaluation, and even fewer, only about half of those, actually get the reproducible badge.
So I would really, really like this number to increase, and I really hope it will do.
And it's more of a mentality change,
a mentality where we write code, we document it well, we make it runnable, we make it reproducible, we ideally open source the code, and the community can build upon and extend that work.
And this is very, very important in systems research because we, as program committee members,
always ask the authors to compare their system with all the other prior systems that existed
and show that it's the best.
So we need code to do that.
We need open source code and runable code to properly evaluate the new systems that we propose.
So I hope that we will get to that point in the next five years.
Yes, so this is my wish as well and something I would really like to see happening, to go from this roughly 60% to a much bigger number.
So of course, there are always very valid and legitimate reasons why some papers or some systems cannot undergo this process.
So there are industry-related restrictions, or there is this amazing paper
and the authors want to do a startup with it.
So this is a perfectly valid reason as well.
But I believe that increasing substantially the number of papers that go through this process
would benefit not only the authors themselves today,
because they will have better code, better documented and so on,
that they can themselves build upon a better foundation,
but also, picking up the point that Thaleia was mentioning.
So this makes it very, or not very easy,
but easier for other people to reuse the work and build upon it.
And one of the pains that, well, I love working in systems, right?
That's why I'm here in part.
But one of the pains, and I think this is shared with everyone,
is how hard it sometimes is to make someone else's system run in a consistent and fair way.
And I think this is a big challenge in our community.
And having most of the artifacts available, I think, would be very, very valuable.
To be clear, having them go through this process and get the badges that they merit
would very much help this effort and would benefit everyone, in my opinion.
So from my side, I think I'm a bit more pessimistic.
So I think both of your points are very valid, and I agree with them.
I hope that it happens in this way.
But I think especially in the last two years,
EuroSys has seen an explosion of 50, 60% more submissions,
and acceptance has gone up in a similar range for papers.
So this leads to me worrying about how we scale.
Thaleia was already mentioning
that we have a committee of a hundred evaluators,
a huge size,
and every one of them
is probably spending 20, 30 hours
of their time
to reproduce and evaluate
an artifact,
and sometimes maybe even more.
So I think
what I would like to see
in five years is
a way of
sort of automating and
scaling the artifact evaluation process, to be able to cope with EuroSys accepting 200 papers
and us evaluating 200 papers, because I think the current system is not completely set up for that,
and it leads to quite a bit of work, both on the chair and the evaluator side, to sort of steer that process.
So ideally we find better ways to do it. Yeah, fantastic. Well, I think that's what
brings our podcast to an end.
We should probably reconvene in five years
and do another state of play,
see where things are at,
and see if any of the predictions
have come true.
But yeah, thank you very much for joining me today,
folks. It's been a really insightful chat,
and I'm sure the listeners will have absolutely
loved the conversation as well.
I'll drop links to everything in the show notes
so you can go and check everything out.
And yeah, thanks again, guys.
I hope you enjoyed it.
Thank you.
Thanks. It was very fun.
Bye-bye.