The Changelog: Software Development, Open Source - Reproducible builds and secure software (Interview)
Episode Date: February 3, 2017

Chris Lamb joined the show to talk about his project Reproducible Builds — which is funded by The Linux Foundation's Core Infrastructure Initiative. We talked about the importance of having a verifiable path from source code to compiled binary, what this set of software development practices is all about, what it means to have Reproducible Builds, the challenges faced when implementing these development practices, and the inherent security you gain from them.
Transcript
Bandwidth for Changelog is provided by Fastly. Learn more at Fastly.com.
I'm Chris Lamb, and you're listening to The Changelog.
Welcome back, everyone. This is The Changelog and I'm your host, Adam Stacoviak. This is episode 237,
and today we're talking to Chris Lamb about reproducible builds and the importance of
having a verifiable path from source code to compiled binary.
We talked about all the details of the project, what it means to have reproducible builds,
the challenges faced when implementing these best practices, and the inherent security
you gain from using them.
We've got three sponsors today, GoCD, Linode, and our friends at Flatiron School.
Our first sponsor of the show today is our friends at GoCD.
Head to gocd.io slash changelog to learn more about this awesome open source continuous delivery server.
GoCD lets you model complex workflows, promote trusted artifacts, see how your workflow really works,
deploy any version, any time, run and grok your test, compare builds, take advantage of plugins
and more.
Once again, head to gocd.io slash changelog to learn more.
And now onto the show.
And we're back.
We got Chris Lamb joining us today.
Jared, this show is one of those shows you have to listen to
if you care about software security,
making sure what's in your source code matches the thing you actually embedded
into your device, or whatever binary you ship and put out there.
This came from a ping repo issue, though, actually presented by Chris.
What do you think?
Yeah, it's reproducible builds is the topic of the day.
And this show was very much Chris's idea.
So you and I can't take any credit
or any blame if there is any to assign.
We'll see how it goes.
But Chris pitched this to us
and interesting topic.
And one, Chris, that you think
more people should know about.
So first of all,
thanks so much for joining us on The Changelog.
No problem at all. Very nice of you to have me on here.
So the spirit of getting to know our guests a little bit before we hop into reproducible builds and why you believe they're so important.
We'd like to get people's origin stories and kind of find out where they're coming from.
So can you tell us how you got into software and how you got to where you are today?
Well, my software journey starts pretty early, I guess.
I was brought up in the UK and in primary school,
so I may have been around seven or eight.
I started experimenting with programming on the school computers,
but it wasn't until a friend of my mum's, a technical guy, he was
clearing out his sort of hacker shed of old equipment, and he was going to take it down to
the local, you know, where you could get rid of old computers and stuff like that. But on the way
he unfortunately had a small car accident, and the computer that was on the passenger seat
went right the way through the windscreen into the field.
Police came, et cetera, et cetera.
And they brought all of the equipment back to his house.
You know, they just put it all back in his car, you know, whatever.
You know, he wasn't going to continue on to the tip, as we call it, for some reason.
Anyway, he went back.
And this computer was meant to be broken and old, but he was the kind of person to think, you know what, it would be absolutely magical if this computer now works.
So he plugged it in and flipped the on switch.
And lo and behold, for some reason, this car crash had actually resurrected this computer from the dead.
And he took this as a sign, so he couldn't throw it out then.
And eventually it was basically on its way to being a doorstop.
So he managed to offload it via my mum onto me.
And it was this old, extremely old 8088 IBM computer.
It was dreadfully old, even for the time I got it.
But it had no games or anything on it.
It just had a copy of Turbo Pascal,
and every 10th reboot, for some reason,
it would revert to BASIC, the BASIC programming language.
There was a ROM built into the motherboard
that, for some reason, if the main operating system didn't boot,
it would revert to a BASIC environment.
So I got some books out of the library
and started programming my own BASIC, things like that.
And then eventually
on from there, really,
just sort of stepped it up.
Moved into some
parallel programming, I guess.
And by university,
I was programming Python a lot,
C, C++,
and doing, you know,
the usual Java, blah,
as university courses go
and things like that.
After university, I joined a startup in London and did that for two years.
We were acquired.
And then because we seemed to work together quite well as a team,
we decided to stick together and we did Y Combinator.
And I was with that company for four years.
What was that?
This is a company called Thread.com.
It's still a going concern.
Really great guys.
I just thought I sort of had enough of London by then and wanted a new
challenge.
And this sort of freelance digital nomad lifestyle was sort of calling out
to me.
So I sort of jumped two feet into that.
And that's what I've been doing for the last couple of years,
doing freelance projects, doing a lot of Debian work, as in the operating system Debian, and all sorts of really interesting, varied projects, sort of all around the world, really. It's been really fun.
Digital nomad, that's a lot of fun. So you're...
Yeah, it's a pretty pretentious title, but yeah, of course.
I mean, it's the dream, right?
To travel the world and write code and or seek out your personal hobbies and fun stuff like that in all places.
That's a lot of fun.
Yeah, it is really rewarding.
Yeah, I can recommend trying it at least one time in your life.
In fact, you're calling in today from New Zealand, so quite a ways.
That's right, calling from Auckland, New Zealand. I'm looking over a beautiful bay right now, and yeah, it's a little chilly here,
but it'll warm up, it'll warm up.
And going back to that origin story of yours, I can't help but notice that you mentioned that every 10th boot went back to BASIC. I was just thinking, Jared, how much fun that might be, like having a
computer roulette, so to speak. Like, what will I program today, because of what the computer will force me to do?
Yeah, that's interesting. I mean, to some degree I wonder if all computers did that. Like, you boot your Mac today and it's not a Mac, it's a Windows or something, I don't know.
That'd be great. Or it goes the other way, where instead of reverting to BASIC, it says, no, sorry, today, chaps,
you can only program in Haskell.
No. So, origin story, that's a fun piece there.
What got you into open source? Where did that happen for you?
Somewhere along the line I came across a book about Slackware Linux, and it came
with a CD and things like that. And this is before the internet was, you know, of any
reasonable speed, and so you pretty much had to send off for the Linux distributions. And all my
computers were always very old, so I was never really playing, well, playing and getting distracted
by gaming and things like that. So I played around
with the Slackware thing, but even that was very old. So I saw there was a company in the UK called
The Linux Emporium, and if you sent them, you know, sort of five dollars' worth, they'd send you
the latest Red Hat CDs on seven discs or something ridiculous. And I'd heard of Red Hat, oh,
you know, reputable, blah, blah, blah, I'll get that. So I sent off for that. They also said, oh, we could include some free extra CDs
if you want. Yeah, sure, whatever. I'm 13, I have no money,
so whatever. Send me as much free stuff as you like.
Anyway, I went to install this Red Hat, and it said, oh, I'm sorry, sir,
you need a very powerful computer, you need at least
12 megabytes of RAM to install Red Hat.
And I think I only had eight on this rather lackluster machine, so I got a bit annoyed,
and so I reached for one of these free CDs, which again were old for the time; they were free because
they were the previous releases. And one of them was a very old release of Debian.
And the whole operating system there just completely clicked with me.
Installing stuff was pretty simple.
Installing the operating system itself.
And I ended up using that for many years just as a user,
running my own little web server between me and my cupboard.
Just like, oh, this is amazing.
But I didn't have the internet, so, you know, I can, wow,
I can, you know, type in http://192.168.0.2.
But what would be on your web server that you could possibly want to,
like, would you write up there and then read it later?
Or, like, what kind of stuff would you even access in your own house?
I don't really know what I wrote then, because that was in my own house, you're quite right. But I think it was copies of software I'd
seen on the internet at school. Like, Perl-based guest books, they were all the rage
at the time; that might be a way of aging myself. It was also the beginning of those short URL redirectors. So this is when you
had domain names like i.am, so you would basically have a free redirection service: i.am, you know, your
name, and it would redirect you. So I was writing sort of Perl versions of those as CGI scripts.
The good old days.
Yeah, good old days. I know you're quite a prolific open sourcer, in terms of, well, in terms of what prolific means.
You have lots of open source code.
You've been working on Django quite a bit.
You've been a Debian package maintainer, I believe,
or at least involved in the Debian project since 2008.
On your GitHub, you have 216 repositories,
and 129 of those are source repositories,
so you actually began all of these.
Yes.
What's the deal?
Do you just code all day and all night, or how do you get so many things going?
Well, a lot of these things are sort of spinoffs
from other projects or perhaps from freelance work as well.
So a lot of the Django tools I've done
have been like, well, I think this will be,
you know, in the code base, this should be modular anyway.
And as it's a completely reusable component,
let's just remove it out there.
And then it can become more generic, abstract,
other people can contribute to it, et cetera.
And it's sort of good to share back
because et cetera, et cetera.
So that speaks to most of the Django ones.
The other projects, a lot of them are just scratching my own itch.
Like I wanted something to, I think I was looking for a new bike.
And there's a sort of Craigslist in London called Gumtree.
And so I decided to, I knew exactly what size and what sort of make I wanted, so I made a script
to poll it every five minutes and to send me an email when something that
matched my specifications arrived. And so I was ringing up these people within five minutes of
their advert going live: oh yeah, is the bike still available? I've just posted it, mate, I don't know
how did you get it so quickly? So a lot of these are scratching my own itch.
Some of them people use, some of them people don't use.
But I find putting the code out there keeps myself honest.
It also makes me follow through on projects a little bit better
because there's some sort of vague accountability
if you're putting it on GitHub.
Not much because no one's looking over your shoulder.
That's the basic idea.
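As a rough illustration, here is a minimal sketch of the kind of poll-and-notify script Chris describes; the URL, search term, and mail setup are all placeholders, not his actual code.

# Sketch of a "scratch your own itch" poller: check a listings page every
# five minutes and send an email when a match appears. Everything named
# here (URL, search term, addresses) is hypothetical.
import time
import smtplib
import urllib.request
from email.message import EmailMessage

URL = "https://example.com/bikes?size=56cm"   # hypothetical search URL
SEARCH_TERM = "Condor road bike"              # hypothetical make/model
already_notified = False

def notify(snippet):
    msg = EmailMessage()
    msg["Subject"] = "New listing matched"
    msg["From"] = "me@example.com"
    msg["To"] = "me@example.com"
    msg.set_content(snippet)
    with smtplib.SMTP("localhost") as smtp:   # assumes a local mail relay
        smtp.send_message(msg)

while True:
    page = urllib.request.urlopen(URL).read().decode("utf-8", "replace")
    if SEARCH_TERM in page and not already_notified:
        already_notified = True
        notify("Matched %r at %s" % (SEARCH_TERM, URL))
    time.sleep(300)  # poll every five minutes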
I wonder if everybody has the same,
I don't know how to describe it, but like the
fact that you do some freelance work, or you've done freelance work over your career, and
instead of simply writing it into that code base that you're writing into, you
think in a modular way and you think about the community. I wonder if that's just like
a common thought amongst developers, or if that's something where they need to hear a story like yours
to think, I should do that too.
If I'm writing software for somebody, you know, if I could bake that into my contract with them,
like, hey, if there's an opportunity for me to open source a module, whatever, you know,
obviously I'll disclose that to you or whatever, but baking that into the ability to be a freelancer and actually give back to
open source.
I wonder if that's just common knowledge to do that, or a common thing to do.
I think some of it depends on maybe your attitude and your outlook.
So from my own personal perspective,
I've always felt like I should only like open source the things that I think
are great or useful or polished.
And that always leads me to not open source anything.
The imposter syndrome, basically.
Yeah.
It's not really imposter syndrome.
It's more like values, non-valuable.
It's not like I don't belong here.
It's just like, who would ever want to use this?
That's imposter syndrome.
I don't know if it is.
It doesn't really feel like it. It's an edge case of it, the "I don't belong here." It's just like, you know, maybe I just code for
myself. So, just to compare with you, Chris, just the other day I was writing a little script,
you had your bike script that would, like, check for you every five minutes,
that was very similar, only it's like a cigar bidding website. Anyways, I like cigars. And so I'm just writing this thing, you know, that's just helping me get cigars at good prices.
And like, I never even thought once to open source that.
Like, it'll probably never leave my hard drive.
But you, on the other hand, you're like, I'm going to put this up on GitHub.
Yeah, I think it also immediately solves the where do I put this file as well.
Good point.
Do I lose it in my, like, random directory structure?
But if it's on GitHub, then it's kind of a backup, right?
If you squint, it's a backup.
Right.
I hear you say, Jared, it's on my hard drive, whereas
if your hard drive dies today,
Chris's GitHub hard drive does not.
And even if maybe somebody doesn't find it useful, or doesn't desire to watch it or fork it or whatever or contribute, you know, it's still there.
It's like Chris said, there's a backup there.
And worst case scenario, somebody else is like, hey, that's a really awesome idea.
I love cigars too.
And now you've got a new buddy.
Yeah, sure.
And also, like, generally, you know, it uses a Mechanize library and it
logs in, it does a few things, where if you would like to automate some things on the web,
you could look at that little script, that to me doesn't seem of much value, and you could say,
here's how you might do that, and you could tweak it to your own uses. Similar to maybe I could take
your bike script and apply it to tricycles or something. I don't know why I came up with that example.
So now I'm talking myself into, I should open source some more stuff, basically.
We should all just be, but aren't we, like,
somehow we're just maybe like heaving crap out there
for other people to sift through,
you know, like adding more noise to the ecosystem?
I think there's levels to open source, right?
There's like infrastructure open source, which is, like, in quotes, important, you know, and useful. And then there's other things that are sort of, like,
tinker tools, that sort of just embrace the inner kid in us, the playful manner. And there's a side
of that playful manner that helps you get into the state of flow and helps you go beyond just,
like, simply learning, and it's like, right now, you actually, you know, absorb what you're doing.
And so it kind of brings out these different attitudes in the developer behind the code and
those who interact with it. So I think there's room for that. I don't think it should all just be so serious.
Sure. I think, in fact, shout out to Cody Peterson, who was
our designer on changelog.com, a front-ender.
He has this idea, which I'm sure everybody's had this idea,
but he brought it to my attention of GitHub should have tags,
arbitrary tags that you can assign to your own repos
in order to provide context.
You could tag something satire if it's a joke,
or you could say this is a one-off,
or you could have all these different tags that would basically say,
look, this was me messing around.
It's not a serious project, you know.
Or you could tag it, like,
you know, the problem with tags is they're so arbitrary.
Point being is, like,
if we could classify our repos a little bit better in public,
it might help.
What do you guys think about that?
I think that'd be really good
because then I think a lot of people
wouldn't be making these decisions
about that gray area of, well, shall I put it up there?
It's probably not going to be useful. They just put it up there by default, not having to think about it, but just shove one of your tags on it saying, yeah, this is a bit of a toy.
You know, it doesn't even work.
It's broken now.
But it certainly has, like, more value out there than being on your hard drive,
where it'll eventually die and get lost.
Yeah, absolutely.
Well, let's get into reproducible builds.
So give us the, I don't want to call it an elevator pitch
because it's not a business,
but it's a concept, it's a best practice.
It's something that, Chris,
you think people should know about and do.
It's also something you've been giving sessions on.
You spoke at linux.conf.au in Australia recently, which is kind of why you're in the New Zealand area.
So give us real quickly an understanding of what reproducible builds is, and then we'll come back from the break and we'll dive into it.
Sure.
So reproducible builds are a set of practices and a philosophy, and it's all designed to ensure that there's a verifiable path from the source code to the binaries that are being run on your machine.
The basic problem is that whilst you can inspect the source code of free software, most Linux distributions, Android, et cetera, provide pre-compiled binary packages.
And so you needed a way of being able to correlate
the binary that's being run on a machine
with the original source code.
with the original source code.
And this is particularly important in the modern era
because there's incentives to crack build infrastructure.
If you want to, you can go after a lot of users
by attacking the developers.
And if you can get some malware into a developer's machine,
you can infect all of their users in one go.
I never really considered that part of it, Jared,
when we were doing the pre-call.
It was like the attack on the actual developers.
Yeah.
I was thinking just simply source code in the binary
that gets put on whatever and runs
and how that gets circumvented,
not the developer's machine or themselves. Indeed. And there's a psychological angle to
that as well. I mean, you can, I could hack someone's developer's laptop, for example,
without their knowledge, but also I could come around their house with a baseball bat. I mean,
it's pretty crude, but, you know, please include this backdoor in your software or blackmail and things like that.
So all of these things protect developers from that happening.
So it'll be of no value to threaten a developer with such things because anything they would do would be caught by the rest of the community.
Well, let's push the pause button real quick. And on the other side of the break, we'll talk more about reproducible builds, why they're important, who's working on them,
and what Chris thinks everybody should know and take away. So we'll be right back.
We're working closely with our friends at Flatiron School to promote their free
online courses. They've got Bootcamp Prep, Intro to Ruby, Intro to JavaScript, and also Intro to Swift and iOS.
In this segment, I'm talking with Kaylee Gray, an alumni of Flatiron School,
who started with their free Intro to Ruby course.
Then she enrolled in their online web developer program.
And now she's working full time at FBS Data Systems as a developer in Fargo, North Dakota.
Take a listen to Kaylee's story.
I studied math primarily in undergrad, but I was also a computer science minor. So I've had
exposure to programming, but before Flatiron, I was pretty timid as far as programming goes.
I definitely didn't have much confidence in that arena. After Flatiron's Intro to Ruby course, I felt more confident in my ability
to pursue programming as a full-time career.
One of the things that I liked
about Flatiron's Intro to Ruby course
was that I was forced to use the terminal,
which up to this point had been daunting to me.
So it was really empowering to feel like I could go in
and make these changes and program these things
that I didn't really know I could do.
If you're like me and you're curious about programming,
but you're feeling a little unsure
that it's something that you can do,
you can try Flatiron for free and see if it's right for you.
And you'll probably like it.
This is great.
All right.
There's nothing I love more than a success story.
And Kaylee is an awesome example.
You can follow in her footsteps.
Head to Flatiron500.com to learn more and enroll.
These courses are totally free to enroll.
The bootcamp prep course is only available to 500
students. So if you're considering this, do it today. Once again, head to flatiron500.com to
learn more and enroll, and tell them the Changelog sent you. All right, we are back with Chris
Lamb talking about reproducible builds. And Chris, we gave it a definition before the break. Like we
said in the intro, you opened up this idea of saying more people need to understand this as something that's important for various reasons.
Can you reiterate a little bit exactly what reproducible builds is and then again, why they're so important?
And we'll kind of dive in from there.
Sure. No problem.
So this isn't about reliable builds or repeatable builds or anything along those kind of lines.
It's really about ensuring that there is this connection, that a user or developer can
confirm that the binaries that they're running on their system correspond to the source code
they're expecting to be run on their computer.
So if you kind of wind history back to Richard Stallman's early
ideas about being able to run software on your own computer: whilst you can get the source code for,
you know, a free software operating system, et cetera, most of these distributions are providing
binary packages to you that are being compiled by someone else on different build farms.
And it's really important that no inadvertent,
malevolent or accidental changes have been introduced during that code path.
There was an example given a few years ago of an OpenSSH binary that differed by just one bit of one byte,
which changed a greater than or less than comparison
to just a greater than.
And just that one bit meant that you could have a root exploit.
So the difference is, I mean, if you ran them through a diff tool,
you'd only see that one byte change, that one bit change.
Yet one would be hopelessly insecure, with that root backdoor,
and one would be hopefully a little bit more secure.
So reproducible builds prevent these changes being added behind your back as a user.
At what level does the reproducible build take place? Is it like,
you know, you've got your list of
who's involved, and it involves various levels of Linux, Bitcoin, things like that. Is it us trusting
them to say they adhere to reproducible builds, and that's what gives us faith and trust, or is it
a different level?
I think it's on a different level. It's sort of a kind of community
set of tools, practices, and things like that.
If you jump into the details,
what perhaps reproducible builds
can be quite a misleading term.
I mean, code provenance might be a better way
of phrasing it and things like that.
The way we use the reproducibility
is that we ensure that compilation
of any piece of software always has identical results.
So that means if you run GCC
on a C file, you get an ELF binary at the end of it. And if you reran that compilation process,
you'd get the exact same ELF binary. The MD5, the SHA1 checksum would be just identical.
Then what happens is that you ask multiple other parties
to do their own builds of this same source code, and then you get together, hopefully
electronically, and compare your results. So if I got result, you know, one-two-three-four,
assuming that's the checksum, and you got one-two-three-four, and everyone else got one-two-three-four,
we can pretty much agree
that if you compile this source code, you should expect this binary. And if someone came along
saying, oh, I get one-two-three-five, you would have an inkling that something was different about his
build environment. He could have been hacked, he could have something breaking his compiler,
and things like that. But basically there's just something fishy going on
So that's where the reproducibility comes from.
So ensuring that everyone gets the same result
is where the word reproduce comes in.
So if someone can reproduce your build,
that's where that verb gets added there.
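To make the consensus idea concrete, here is a minimal sketch (not a real tool from the project) of comparing the checksums that several independent rebuilders report for the same source release; the builder names and digests are made up for illustration.

# Sketch: decide whether independently reported build checksums agree.
from collections import Counter

reports = {
    "builder-a": "1a2b3c4d",   # hypothetical SHA-256 of the built binary
    "builder-b": "1a2b3c4d",
    "builder-c": "1a2b3c4d",
    "builder-d": "ffee0099",   # the odd one out
}

counts = Counter(reports.values())
consensus, votes = counts.most_common(1)[0]

for builder, digest in reports.items():
    if digest != consensus:
        print(f"{builder} disagrees with the consensus ({votes}/{len(reports)} builders): "
              "something about its build environment warrants investigation")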
Hasn't it been for a very long time
that when package managers
or anybody who's pre-compiling binaries
and releasing them
publishes their checksums
alongside the downloads
so that you can download the file
and then run your checksum
and make sure it matches theirs.
Isn't that,
is that basically what you're talking about?
Or this is just another level of saying,
okay, well, that was two computers.
We're going to do it on thousands
and make sure that it's always the same?
Yes, pretty much.
I mean, I think when a developer
has a checksum extra file,
what they're trying to do,
if it's just a SHA-1 checksum, for example,
that's typically only to ensure
that an end user can validate whether
the download completed successfully. So for a very large ISO image, it's very useful to say,
oh yes, it did download correctly. So I think that's a different intention there. But you're
right, I mean, if you had a hundred different checksums that people have provided, it is pretty
much like that: I built this piece of software,
and I got this checksum, and then multiple people did the same thing. It doesn't provide
any authenticity, so you would need to pair that checksum with, say, for example, a GPG or PGP
signature, you know, to sign that binary, just to say that I, Chris Lamb, generated this binary.
You see what I mean?
So you need to be very wary about what these checksums
are actually claiming about the source code.
Yeah, and just to explain it,
and you can help me if I don't have it correct,
but I think I'll lay out in terms of the checksumming.
A checksum is a one-way hash that's run on the binary.
That's right, yeah.
It'll always produce the exact same fingerprint on the other side.
The problem with that, especially as cryptographic algorithms get torn down over time, is that
while that exact same binary will always reproduce the exact same checksum, depending on your
algorithm, there are other binaries that can also produce that exact same checksum.
And so we call them hash collisions. And so that's why it's not giving you the level of confidence that
it's secure. It's simply a tool that you can use, like you said, to say, okay, I did get the file
all the way downloaded. It's not corrupted or there's no issues. So while people think that
those checksums are, like, giving us some sort of security confidence, they actually
aren't. Is that fair?
That would be fair, yes. You can immediately make them a little more secure
by providing multiple checksums, particularly from different families of cryptographic algorithms.
So I mean, the advice for years has been to stop using MD5, right, and things like that. And if you provide multiple SHA-1, SHA-256 checksums,
you can start to be pretty confident
that your download completed successfully,
and things like that.
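For reference, here is a short sketch of verifying a published checksum, with a hypothetical file name and digest; as discussed above, this only proves integrity, not who produced the file.

# Sketch: recompute a SHA-256 checksum and compare it to a published value.
# This proves the download is intact, not that the binary is trustworthy:
# whoever can replace the file on the server can usually replace the
# published checksum next to it too.
import hashlib

published_sha256 = "d2c8"                      # hypothetical value copied from the site
path = "debian-8.7.1-amd64-DVD-1.iso"          # hypothetical download

h = hashlib.sha256()
with open(path, "rb") as f:
    for chunk in iter(lambda: f.read(1 << 20), b""):
        h.update(chunk)

print("OK" if h.hexdigest() == published_sha256 else "MISMATCH: re-download or investigate")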
So give us the doomsday scenario
where we all go away from this conversation thinking,
well, Chris has an interesting point
about this reproducible builds thing, but I'm not convinced.
I don't care.
And so us as a community, we don't care.
We know that's not the case, because we have the Core Infrastructure Initiative supporting this, and lots of distributions.
And a lot of people do care.
But let's say that we just don't get it done and we don't have reproducible builds.
What's the worst thing that could possibly happen in terms of hacks or security
problems? Use your imagination if we can't have this guarantee. Well, one advantage is I don't
have to use my imagination. Some of them have already happened, although in small isolation.
So for example, someone used social engineering to offer a backdoored iOS software developer kit for download.
Maybe they bought a Google ad that looked like the download link,
you know, that kind of thing.
So a whole bunch of developers downloaded this,
and it worked completely like the normal iOS SDK,
except it would replace their adverts,
as in any adverts that the developer added,
with the attacker's adverts.
The idea was just to make money, really.
And so these developers
in their making their iOS apps
were happily making them.
Brilliant, you know, fine.
And then they went to upload them
to the Apple store
and they signed them.
And so the signing process was completely
accepted. You know, Apple said, yes, this is you. The signing stuff checks out absolutely fine.
But because their software development kit was backdoored, at the very last moment,
it would just simply replace the adverts with the ones from the attacker. And they only really
noticed when they weren't really getting any ad revenue back.
But you can quickly imagine what would happen
if the code wasn't necessarily to just replace adverts,
which sort of sounds a little bit harmful.
They're harmless, should I say.
Ignoring the economic blah, right?
But what if it was sending your address book
or things like that?
The original developer would swear blind
they're doing nothing wrong.
And from their point of view, they are innocent, apart from perhaps some rather lackadaisical security
on their part. But you wouldn't really know who to point the finger at.
And so that's pretty much the sort of worst case where you have no idea who these attackers are.
You have no idea where the software is going. It sort of seems to bypass a lot of
security features that were put in place, like the signing, which is entirely designed
to prevent arbitrary code being uploaded to these repositories and things like that.
So yeah, that's pretty much the doomsday scenario. Another thing that makes reproducible
builds quite salient in modern times is that some of the Snowden revelations refer to using backdoored compilers in a similar way in order to infect machines and things like that.
This is something that the NSA has been, well, we know for certain they've been looking at, because of some documents released via Edward Snowden.
So yes, I mean, the
doomsday is sort of
here already in a sense, but we just
don't really know how pervasive it is.
Yeah.
It is particularly
insidious that you're not coercing anyone. You don't have a bad actor.
The developer doesn't have to be the bad actor
if you can infect the developer tools or the developer pipeline in any way. And
then, at the point that the attack is successful, like you said, it's very difficult to trace back to
the original threat vector when the developer is ignorant of anything going wrong. Yeah, they're
usually the one to blame.
That's an interesting one, though, that you can do that with the iOS SDK
and do something like you had said,
so harmless,
but it could have been something so harmful,
like an address book
or bigger exploits or whatever.
But that actually takes place.
But we all day-to-day utilize
some sort of software,
whether it's open source or not,
in a way where we just sort of like inherently trust it.
I don't know how often either of you do MD5 hashes
or any of this thing that you could do
to sort of like determine if it's truly
what you should be using.
How often do you do this, Jared?
Is this something you do day to day
or how often do you check the software
you're actually running?
So I used to use the checksums when I would, as Chris said, when I download a large file. And I used
to do it thinking it was a security thing. So a lot of people believe that, and this was a little
bit of a misunderstanding, to think, oh, I'm more secure because I do this step, right? And by the
way, the other problem with that step of, I'm going to download a file from this web page and then I'm going to run the checksum to make sure it's the same, is that if a hacker actually hacked the web server insofar as they could replace the binary that you're downloading.
Right.
They also could have very easily changed the checksum to match that binary.
Right.
So it's completely not a security thing, even though I used to think it was for a long time.
And so that's more when I used to do it.
But also back in the days where, you know, a 600 meg Debian ISO was like an all-day download.
You wanted to make sure that it worked right.
And so I would do it back then, but I don't do anything.
I'm very security lax, sadly, in my current.
Well, how much binary code are you running these days that where you would check it?
Like, how often are you either using anything that's binary where this plays into where you take a source code down to one file or something like that?
Well, I mean, anything that you nowadays Homebrew has a lot of precompiled binaries.
Right.
And I assume Homebrew has some built in, you know, I know there's some certificate checks
and stuff going on there.
Chris, you can probably talk to Debian's process since you're involved in the Debian package maintenance, but what kind of security is in place around package managers where people are
pre-compiling binaries and then, you know, we're downloading them and using them?
Debian, and by extension Ubuntu and Mint, et cetera, internally they use GPG and signing.
So there's a known web of trust. So whilst it does validate the checksums of the files when
it downloads them, that's simply for integrity. In other words, it has the download completed
successfully. But there's an additional step, which is documented. If you search for apt secure,
there's quite an interesting wiki page on the Debian wiki about it. In a nutshell,
it basically uses GPG signatures and a key ring of trusted keys to say, okay,
the checksum of this file is X, and we have a valid signature that's in the key ring.
So therefore, we can trust this file to that degree. You see what I mean? Yeah. Whilst you're completely right.
That prevents your example of the attacker being able to get into one of the many Debian
mirrors and replacing both the checksum and the binary.
That would fail because they would not be able to forge the signature, the extra step.
Right.
So there's that.
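Roughly what that apt-secure chain boils down to is authenticity first, then integrity. Here is a sketch using a detached GPG signature over a checksum manifest; the file names are placeholders and the real apt implementation differs in detail.

# Sketch: verify authenticity, then integrity.
# 1. Verify the signature on a checksum manifest against a trusted keyring.
# 2. Only then compare each file's hash against the (now trusted) manifest.
import hashlib
import subprocess

# Step 1: gpgv exits non-zero if the signature does not verify against the keyring.
subprocess.run(
    ["gpgv", "--keyring", "trusted.gpg", "Release.gpg", "Release"],
    check=True,
)

# Step 2: "Release" is assumed here to contain lines like "<sha256>  <filename>";
# the real Debian Release format is richer than this.
with open("Release") as manifest:
    for line in manifest:
        digest, name = line.split()
        actual = hashlib.sha256(open(name, "rb").read()).hexdigest()
        assert actual == digest, f"{name}: checksum mismatch"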
And then if we go into, like, the Apple side of
things, with the iOS, you know, App Store and whatnot, those are all developer signed. You have to
have your own certificate and sign your binaries before Apple will accept those in, and there's a
web of trust that they create in there as well. So there are things that are in place, but what
you're advocating for with reproducible builds is even more guarantees, not
just that these binaries are trusted, but that we can verify
their origin, the source code that originated them in a
reproducible way. That's right. That's a good summary because
whilst, using the
checksum and a signature, I know that I've got this binary from, say, you.
And it's like, brilliant, I know exactly who I've got it from.
There's no guarantee that that corresponds to any particular source code that you claim it belongs to.
I have to take your word for it.
You have to actually trust me.
Yes, for that sense.
Yeah, yeah.
With the reproducible build setup, you could provide the source
code in that binary, and I could
not only compile it myself to say,
yeah, okay, it does check out.
I could also ask multiple third parties
to perform that same step.
And then
I can start to trust you and be saying,
yes, this
checksum with this signature
does correspond to this particular source code.
So it's sort of extending that one level back.
Right, so you gave it to us before,
but now that we've,
I feel like we've kind of wrapped our heads around it
a little bit more,
explain it to us again in terms of the process.
So reproducible builds is not like a feature
that you check box.
It's a set of practices that you can operate under
as a development community, right,
that gives you this verifiable path
back to the original source code.
Describe it to us again, the steps that get put in place
before we can say, you know what,
this is verifiably reproducible.
Sure, so the steps are,
the first step is you ensure
that your source code
always produces the same result
in a bit-for-bit identical way.
So this is removing any timestamps,
any variations based on
your time zone that you're in,
any non-deterministic behavior,
any randomness and things like that.
Basically, so if anyone
took the same source code and recompiled it themselves, they would get the exact same binary out that was completely
bit-for-bit identical. Then you ask multiple parties, or multiple build servers distributed
around the web, in different isolated environments perhaps, to compile that same source code.
And if they get the same result,
if everyone gets the same result
in that same binary that you got,
then you can start to say that,
oh yes, this binary here
corresponds with the original source code.
And therefore you can make this claim that
as it's very unlikely an attacker
has infected everyone simultaneously, that this really is the binary you get when you compile the source code.
There isn't any nefarious goings on and nothing has been introduced along the development tool chain.
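In practice, that first step usually means pinning down every source of variation before invoking the build. Here is a sketch of that normalization; the build command, output path, and timestamp are placeholders, and SOURCE_DATE_EPOCH is the convention the reproducible-builds project promotes for clamping embedded dates.

# Sketch: run a build under a normalized environment so its output has a
# chance of being bit-for-bit identical wherever it is rebuilt.
import hashlib
import os
import subprocess

env = dict(os.environ)
env.update({
    "TZ": "UTC",                        # no timezone-dependent output
    "LC_ALL": "C",                      # no locale-dependent output
    "SOURCE_DATE_EPOCH": "1485907200",  # clamp embedded timestamps (placeholder epoch)
})

subprocess.run(["make", "clean", "all"], env=env, check=True)  # placeholder build command

digest = hashlib.sha256(open("build/output.bin", "rb").read()).hexdigest()
print("binary sha256:", digest)  # publish this so others can compare their rebuilds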
So who all is involved in this process?
We mentioned before, but you've been awarded a grant from the Core Infrastructure Initiative
to work on reproducible builds. Is this something that the Free Software Foundation is working on?
Is it the Linux Foundation? Is it Debian? Give us a lay of the land on what actors think this
is important or actually putting efforts towards putting these systems in place
for a lot of our underlying operating systems and other things.
Well, it's quite a diverse group of projects, really.
I mean, you can find some old mailing list posts about people sort of attempting this
in the mid-90s, but it wasn't really on anyone's radar as a security vector for a while. And after the Snowden revelations and the iOS SDK incident, et cetera,
a lot of people started getting interested in it.
Debian was perhaps the forefront of the distributions,
certainly putting a lot of the initial activity
into reproducible builds.
But now we're a completely distribution
and project agnostic initiative and endeavor.
It's the Linux Foundation that are very generously funding my time and others' time to work on this.
But there's all sorts of distributions involved now.
We have SUSE, Fedora, Guix, Arch Linux, a bunch of BSDs as well,
so it's not even a Linux-only effort. But we also
have projects such as Bitcoin and Tor. I mean, you can imagine the incentive to crack the binary of a
Bitcoin wallet. If you could upload a backdoored Bitcoin wallet and replace the developer's
version, then you would become rich fairly quickly. You see what I mean?
That's my plan.
That's your plan.
That's your retirement fund.
I don't even know why you're saying this.
I mean, you're telling everybody my plan.
I'm just going to skim a little bit
off of everybody's Bitcoin wallets
and become rich.
Yeah.
Just a fraction of a cent off every transaction.
Yes, sir.
They'll never notice.
But it's pretty scary.
I mean, to go back to the psychology earlier,
I mean, imagine being
that developer.
You're sitting at home,
you know,
how much money
could you make
by adding a backdoor
to that Bitcoin wallet?
How much would it cost
to hire a bunch of heavies
to go around to this house?
I think the economics
would work out.
The moral economics
perhaps wouldn't.
But just in terms of money, just, yeah, very scary.
So if I was that particular person, I wouldn't want to put myself in that situation.
I wouldn't want to put myself at risk for being targeted in that way.
And before we go into the break, I want to ask maybe a hypothetical.
Maybe I'm just being naive when asking this because I don't do this too often, but it sounds like reproducible builds
is a philosophy and a set of best practices that enables you to verify this binary from a source.
And often we have the option to pull down a compiled version or a pre-compiled version
of whatever we're using. Why not just opt to compile yourself? And that essentially,
if you're compiling from
source, you're essentially doing the same thing
as reproducible builds.
That's right.
It's still a convenience factor.
Right.
People don't do that too often.
No.
Gentoo users would disagree, right?
They would disagree, but they would disagree on many things.
But I mean, for example, do you really expect your phone to compile the software before installation?
I mean, I wouldn't want my phone to have to sit there and compile Chrome before it gets installed.
For example, that would be a little bit inconvenient.
I'm thinking more at the developer level.
Drain your battery, Adam.
I'm not thinking at the end user level
they're going to do that
because that's just too much to ask any user to do.
I'm thinking at the developer level.
Maybe I'm closed-minded
and only thinking of this in one lens,
but so far the concern here
had been installing a Linux version or iOS SDK.
I was just trying to play the devil's advocate
of why would you just not compile yourself?
I guess if it's a developer tool.
Yeah, I think convenience is a huge aspect of that.
I don't know when it was,
but coming from the Mac side,
Adam, like I said,
Homebrew now has pre-compiled binaries
for lots of packages that you install often. And so if you have to compile postgres from source every single time
you have a point update you know that could take depending on your machine it could take 10 minutes
it could take 40 minutes who knows right and so if you have every single piece of software that
you run you're going to compile from source does that how far does that go do i also have to
configure it myself and make sure I've actually configured it correctly?
I think there's a huge inconvenience there.
Well, we may be missing a bigger picture here
in that the security affordances
that reproducible builds provide
should apply to everyone, really,
to all users on any technical spectrum.
So they don't want their to-do list app
and things like that that they're just using as a thingy
to have any backdoors in.
Well, I think the bigger picture there
is just trying to figure out,
I'm just thinking if I'm a developer
and I'm going to use something
that has a binary,
why don't I just compile it myself?
And none of the arguments you've made there
of the conveniences and affordances
and if I'm going to every point release a Postgres,
I'm going to recompile a new version of it.
That's probably a big pain in the butt.
I'm just trying to figure out, I guess,
is reproducible builds, this philosophy, this set of best practices,
enabling me as a developer to have the ability to reproduce it
if I wanted to, and that's the security, or is it?
Okay.
So if I wanted to take the convenience hit to actually compile it myself, I could to
prove that what I've gotten is coming from the source.
Indeed.
And what our goal is in the reproducible builds project is that there are enough people out
there already building the software that you can simply rely on those people to
provide you with a checksum that has consensus across say 20 or 30 different people.
And so you would never really have to rebuild anything yourself because all of these 30
other people agree that the binary should be X. You also have binary X. I'm happy with
that. That's fine for me and things like that. And that also
speaks to the end users as well. So they don't have to compile software themselves necessarily,
and if they want to, but if they just want to install a random app on their phone,
some sort of to-do list, for example, they can trust that the 30 or 40 rebuilders,
as we might call them, agree on a particular checksum.
And as they've got that same checksum,
they're happy installing that and saying,
okay, cool, the binary I've got corresponds with this source code.
There isn't any nefarious, nasty stuff being added
somewhere in the mix along the line.
You've certainly given me a speckle of fear
when it comes to installing potentially nefarious apps from the App Store, because there are times I want to use
an application in a genre, like, for example, recently with music or something
like that, and I'm just like, I don't know if I should trust any of these people, because I don't know
any of the brands, the design isn't that great. So there's some known trust factors you sort of apply to potentially trustworthy software, and that doesn't exactly define security or trust, like, by its look,
but it certainly helps if you care enough about its design that it's trustworthy. But
you know, just in general, you've given me this slight fear that somebody out there is using a
hacked version of iOS that's replacing ads
or stealing my data, and now I have complete fear. But let's not open
that can of worms. My fears are out there, the world knows about it. When we come back, we'll
talk about other advantages beyond security, things like that. So we'll break here; when we come back,
we'll go into that with Chris. Right back.
Linode is our cloud server of choice.
Everything we do runs on Linode servers,
the most efficient SSD cloud servers on the market.
And you can get your own Linode cloud server up and running in seconds
with your choice of Linux distro, resources, and node location.
They've got eight data centers all across the world,
North America, Europe, Asia Pacific, and plans start at just $10 a month.
You get full root access for more control,
run VMs, run containers,
a private Git server,
enjoy native SSD cloud storage,
a 40 gigabit network,
Intel E5 processors, super fast.
Use the code CHANGELOG2017 for a $20 credit.
Unlimited uses.
Tell your friends.
Once again, CHANGELOG2017.
Head to linode.com slash changelog.
And now back to the show.
All right, we're back with Chris Lamb talking about this awesome thing called reproducible builds.
You need it to have secure software.
And maybe it's just a pain in the butt to compile from source.
As I learned today, you can't do that every day.
But Chris, take us through some other advantages.
I mean, obviously, you got some security advantages here.
Where else should we go for this to help explain to the community why this is so important?
I think the biggest non-security advantage is given that every time you rebuild the software, you should get the same result. It means that if you make a tiny change to the source code,
you should expect there to be a change in the resulting binary.
And only those changes should be apparent in there.
You want to do a new release of software,
and you want to make sure that this new release only contains the changes that you want.
Reproducible builds make it very easy to analyze that, to compare your new version
with the previous version. And if you compare them, you should only see the changes that you expect.
We've even written a very good tool for this called diffoscope, which can recursively unpack
binaries and things like that, and look inside them and provide a human-readable view on a particular binary. For
example, it will decompile Java files and pretty-print Python source code and
JavaScript source code and things like that, which makes it very easy to say, okay, I've released a bug
fix release for this particular thing, and only the changes I expect are in this new release, so that's fine,
I'm happy to push it out. Now, this is a massive boon for anyone doing security releases, for example,
but it's also just, if you want to have really good quality assurance, you want to ship
something to your users, you don't have any inadvertent changes, like, oh yeah, it pulled in
this extra dependency, whoops, and now it's broken everything. Oh, sorry.
If you just change one line of code,
you should kind of just see that reflected in the resulting binary.
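For the curious, diffoscope is driven from the command line; here is a sketch of invoking it on two builds from a small script (the file names are placeholders, and the flag and exit-code behavior shown are from memory, so treat them as approximate).

# Sketch: ask diffoscope to explain how two builds of the same package differ.
# diffoscope recursively unpacks archives, looks inside binaries, pretty-prints
# source, and reports only the parts that changed.
import subprocess

old = "app_1.0-1_amd64.deb"   # hypothetical previous release
new = "app_1.0-2_amd64.deb"   # hypothetical bug-fix release

# Exit status 0 is taken to mean "no differences found"; non-zero means
# differences were reported (see diffoscope's own docs for the exact contract).
result = subprocess.run(["diffoscope", "--html", "diff.html", old, new])
print("identical" if result.returncode == 0 else "see diff.html for the differences")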
The other advantage
when things always build to the same result
is that, just by design,
you get better cache hit ratios.
When you speak to the guys at Google about this,
they're saying this is saving them thousands and thousands and thousands of dollars simply because when they compile a large piece of software, many parts of it haven't changed.
And as they're reproducible, they will always produce the same result.
And so therefore, it just says, well, there's nothing new to compile.
So therefore, I don't need to do anything.
So you can reuse the previous result.
So this not only saves developer time, it's saving the company money,
but also it's saving the environment in a sense,
because you're not wasting CPU cycles and generating CO2 and things like that.
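The cache win follows directly from determinism: if the output is a pure function of the inputs, you can key a cache on a hash of the inputs. A toy sketch of that idea (not Google's actual system; file names and the compiler invocation are placeholders):

# Sketch: a content-addressed build cache. Because a reproducible build step
# is a pure function of its inputs, a hash of the inputs is enough to decide
# whether the work has already been done.
import hashlib
import os
import subprocess

CACHE = "build-cache"
os.makedirs(CACHE, exist_ok=True)

def build(sources, command, output):
    key = hashlib.sha256()
    for src in sorted(sources):
        key.update(open(src, "rb").read())       # hash the inputs...
    key.update(" ".join(command).encode())        # ...and the command itself
    cached = os.path.join(CACHE, key.hexdigest())

    if os.path.exists(cached):                    # cache hit: skip the compile entirely
        data = open(cached, "rb").read()
    else:                                         # cache miss: build once, remember the result
        subprocess.run(command, check=True)
        data = open(output, "rb").read()
        open(cached, "wb").write(data)
    open(output, "wb").write(data)

# Example (placeholder names): build(["main.c"], ["gcc", "-o", "app", "main.c"], "app")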
Further technical advantages are, by design,
it removes any unreliable or non-deterministic
behavior in the development process. So if you really want to get the same result, your build
can't rely on anything that's based on timing. So any quote unquote unit tests that do, for example,
using time to check whether something should run in a particular speed or in a particular ratio of inputs to output time,
algorithmic complexities, as it were,
that becomes unreliable and therefore non-deterministic
and therefore can't be part of a reproducible build
and things like that.
It's a good way of finding bugs in uncommon time zones and locales.
So the two or three times I've come across it,
it's been Ruby libraries, so I'm not sure.
But a few Ruby libraries that have been designed
to manipulate dates and times,
their test suite fails when you run it in,
for example, UTC plus 14,
which should be a little worrying
because this is a library that the developer might be using
to say, okay, I know time zones are difficult
and date processing is difficult.
So I'm going to leave it to this library
and the library doesn't actually work
in these strange locales and things like that.
How would the reproducible builds help you track down
that specific time zone based bugs then?
So within the Debian project, we have a reproducible builds torture chamber test.
So what we do is we build every piece of software in Debian and there's 23,000 different source packages there.
Give you a scale of what we're talking about.
We build it twice, one after the other,
the A build and the B build.
And we try and vary as many things
between those two builds as possible.
So for example, the second build
will be on a different CPU type.
It will be done a few years in the future.
We just set a fake time and things like that.
We change the shell, we change the path environment, we change all the environment variables we
can possibly change, your username, anything we can think of, we change the file system.
Basically anything you think of, we try and change.
And this hopefully surfaces as many differences that would affect reproducibility.
So we want to make sure that any end user
can compile a software on their own machine
regardless of their own local environment
and get the same result.
And so this is a way of reducing the set of variations
that would actually result in a variation in the end binary.
And this also shows up some of these QA advantages as well.
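A stripped-down version of that A/B idea, assuming a placeholder build command and output path; the real Debian setup varies far more (CPU type, clock, filesystem, and so on).

# Sketch: build the same source twice under deliberately different environments
# and flag the package as unreproducible if the outputs differ.
import hashlib
import os
import subprocess

def build(env_overrides):
    env = dict(os.environ)
    env.update(env_overrides)
    subprocess.run(["make", "clean", "all"], env=env, check=True)  # placeholder build
    return hashlib.sha256(open("build/output.bin", "rb").read()).hexdigest()

a = build({"TZ": "UTC", "LC_ALL": "C", "USER": "builder-a"})
b = build({"TZ": "Pacific/Kiritimati",          # UTC+14, the locale mentioned above
           "LC_ALL": "fr_FR.UTF-8",
           "USER": "builder-b"})

print("reproducible" if a == b else "unreproducible: the environment leaked into the binary")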
I mean, that would help us out with compile-time differences, but what about runtime? I'm thinking
about things that are JIT compiled, like Ruby, for instance. How does it suss out those
problems if you're packaging Ruby things?
You're quite right. As they're JIT compiled, what gets distributed then is the Ruby source code. So although you have to squint, the binary for a Ruby package is actually still text-based Ruby code and things like that.
Saying that, it can still surface interesting things.
And this one just happens to be a security one, but there was a repository browsing tool
that had an OpenID-based login system, and during its build process it was generating
an OpenID secret.
You know how it's based on a secret that the server knows about, and it uses a public key, Diffie-Hellman, et cetera, style cryptographic
algorithm to validate logins are secure.
So during the build, it would generate a random number, and that would get put into the binary
package.
Unfortunately, this meant that every installation of this browsing tool would share the same
secret.
Yikes.
Because, yes,
this was surfaced in our QA torture environment
because in the A build,
it would generate secret 1334
and in the B build,
it would generate the secret,
you know,
1943 or whatever.
And we would flag that up as,
oh, it's different
between the A and B build.
What is different?
It's some sort of secret key.
Oh dear, this should not be the same
for all the packages that get built.
You guys have a reproducible build torture chamber.
That sounds terrible.
I like the name.
It definitely conjures thoughts of visualizations.
Probably, it sounds like it's well-named
if you're definitely changing so many things,
you're putting it through, you know, torturous things.
Thinking about how do you, like,
so this is the Debian project.
You guys have a great setup.
You put the time and money into this.
How do other people do it?
Like, there's a set of best practices.
You've described the process,
which seems relatively straightforward.
There's a few steps.
But tactically, how do you go ahead and say, you know what?
Our group is interested in reproducible builds.
How do we get from where we are to where we want to be?
A lot of the work is being done liaising with compilers and other toolchain-based utilities that are introducing non-reproducibility.
So we speak a lot with the GCC developers.
For example, C developers will know the __DATE__ and __TIME__ macros,
the C preprocessor macros.
And previously they embedded the date and time directly into the source code as macros.
This affects reproducibility because obviously every time you rebuilt it, it would put the current time in.
And so therefore, every single time you would build, you get a different binary.
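A quick way to see the problem those macros cause is to compile the same file twice and compare hashes. A sketch, assuming gcc is on the path; the file names are placeholders.

# Sketch: __DATE__ and __TIME__ bake the build time into the binary, so two
# otherwise identical builds hash differently.
import hashlib
import subprocess
import time

C_SOURCE = r'''
#include <stdio.h>
int main(void) { printf("built on %s %s\n", __DATE__, __TIME__); return 0; }
'''
open("demo.c", "w").write(C_SOURCE)

def build(out):
    subprocess.run(["gcc", "-o", out, "demo.c"], check=True)
    return hashlib.sha256(open(out, "rb").read()).hexdigest()

first = build("demo1")
time.sleep(2)          # make sure __TIME__ differs between the two builds
second = build("demo2")
print("identical" if first == second else "different: the timestamp leaked into the binary")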
So a lot of the time we are speaking to developers in those kind of areas rather than developers of, shall I call them, leaf packages,
you know, sort of ones that depend on other packages rather than where packages are depending
on them. Documentation generators are another example of upstreams that we're speaking to
quite a bit. In terms of just getting the word out about the potential problems of a world where we
can't trust the binaries that are running on our own computers. That's a lot of what we do and talking about the problems and talking about the doomsday
scenarios as we outlined before.
So we've outlined a little bit who's involved and you mentioned all these different projects
doing it, working hard on this, Arch Linux, Bitcoin, Debian, FreeBSD, NetBSD, so on and
so forth, Tor.
Who's not involved that should be?
You know, if you could get their ear and say,
you know, you guys need to be doing reproducible builds
and here's why.
What are some groups or some people or some companies
that should be, you know, doing these things,
but as far as you know, aren't?
Well, one group we know isn't involved is basically everyone outside of the free software space. I mean, for example, what made the recent Volkswagen emissions scandal possible is software that had been designed to lie about the sensor readings in a lab environment.
If you had the source code under public scrutiny, adding that sort of feature would have been made sort of impossible.
I mean, without reproducible builds,
it's hard to confirm that the binary code
installed in the car
was actually made from the source code
that had been verified, if you know what I mean.
Well, nobody has access to the source code anyways, right?
Well, yes, that should be another...
So anybody who has proprietary source code,
they're not going to be doing reproducible builds, as far as the public community is concerned, because their source code is private anyway.
That's true.
But because of things like the emissions scandal and things like that, we may see more legal requirements for this kind of verification to be in place.
And then even if someone did provide the source code for a piece of software along with the binary, in other words
your car, you would still need to be able
to verify that one came from the other.
Well, you would think the EPA might step in there,
Jared, where the general public may not
have access to the source code, but maybe because of
the emissions scandal, maybe
a new law or something
is put into place where
the EPA has to have access to
the original source code
to produce a reproducible build to confirm that the software installed in the car
matches the result they got from source.
I mean, that's a possibility there rather than saying,
you know, open the code up to the world because it is proprietary.
Maybe certain security levels might have to be in place.
That just means bigger government.
So different podcast altogether.
Something I wanted to ask, maybe since we're talking about who should be involved.
I was thinking about it earlier and was almost going to interrupt you, Jared.
But maybe now that I have the chair here, I can ask it.
Going back to the example originally with the iOS developer who got compromised and pulled down the wrong version of the iOS SDK. What could that person have done differently to prevent them using an unscrutinized iOS SDK?
Well, one, they could have done the obvious things, like ensuring that they download it from a reputable source.
Assuming that Apple are not going to release the source code for the SDK,
which is probably a given,
there's very little they can do, and that's basically the
quote-unquote risk you take when you run proprietary software.
What hopefully would have happened is that if they were in a free software world
and they released the source code for their software,
it would have been very obvious and very quickly that the binaries
that they were producing did not match with their source code, and they would never have matched, because
of the way that their SDK was introducing changes, adverts and things like that.
So hopefully very, very quickly, when a third party recompiled their piece of software,
they would say, that's interesting.
You're distributing checksum A, B, C, D,
but when I compile it myself, I get D, E, F, G.
What's going on here?
And questions would be raised very early and things like that.
So in the case of this SDK, they pulled down the iOS SDK.
Downloading it from a reputable source seems to be the logical first step.
But let's say they didn't.
Since it's proprietary code, they essentially can't leverage
the best practices of reproducible builds,
because they don't have access to the source code,
and they can't confirm that.
Indeed, yeah.
So they're screwed, basically.
They're forced to use this nefarious version
because they downloaded
it from somebody's hijacked website, not apple.com slash developers or something like that.
Back to who should be involved, because I had a thought in my head, but I don't want to go too far afield.
I do want to get back to who's involved. So we've got a list here. I think the URL is
reproducible-builds.org.
And you can go there and see everybody involved.
And Jared's question was like, specifically, who's not involved in this that should be involved in this?
Well, I wouldn't want to embarrass anyone in particular.
Oh, come on.
Well, maybe it's not an embarrassment.
It's more like a call to arms.
Get in the pool.
Yeah, convince them.
Don't embarrass them. Convince them.
So someone who's not really
represented here is
Ubuntu, and they do have
a large
installation base that would obviously be great
leverage and would provide a lot of
reassurance to a lot of users
if Ubuntu got involved.
We have actually spoken to them,
and they are kind of waiting to see
whether the Debian tool chain, et cetera,
kind of settles down
because it's a little bit in flux at the moment.
So they have no philosophical objection to it.
It's just not on their radar right at this second.
But hopefully that will change
in the next three or four months.
And we'll start to see
some of the Debian reproducible builds work
trickle down into Ubuntu like a lot of the other work that's shared between those two projects.
So I think that would be the main one that's missing in terms of user leverage.
In terms of people who don't really care about it, I guess anyone in proprietary software can't care about it because, I mean, it doesn't really work if you don't have the source, at least.
Yeah, indeed.
So I can't really speak to them.
I mean, it'd be nice if more Windows developers had that kind of mindset and things like that.
But there's still a lot of free software that's being developed for Windows.
I mean, things like Putty, a whole bunch of browsers are free software and released on Windows operating systems.
So perhaps more in that space and things like that.
Microsoft themselves could definitely get involved
when it comes to open source developer tools
in their ecosystem.
They have many, .NET Core and Visual Studio Code
and many things that have been open source for a while
now, that they could at least start with,
and developers are actually relying on them as their toolchain.
So they could get involved.
That's true.
They could certainly help ensure,
or certainly make it easier for developers
to make their builds reproducible.
Yes, that'd be very nice to see.
And again, there will be another great source
of leverage there.
It'd be one company getting involved
and would help quite a lot of
developers and users. What about at the individual level, you know, Jane developer, you know,
Jane web developer and Joe game developer, you know, Linux users, people like Adam and I,
our listeners out there that maybe, I mean, we take security seriously, maybe we can do better, but what can we do to help this initiative?
So one thing is you can ensure that any source code you do release can be built reproducibly.
So this means removing any timestamps from the build, ensuring that it produces the same result in as many different environments as possible.
It doesn't have any varying behavior, things along those lines.
Most software would not require any of these things, but a lot of software
likes, for example, to include a version string which is based on the current
date, or they like to include the machine name that it was compiled upon and things
like that.
So removing those sorts of things is pretty much the first step,
and for most software, the only step required to make the build reproducible.
The other thing you can do is to occasionally check whether the
code that you're running does match the source code, and if it doesn't, you know, raise a red flag
to whoever is producing that binary and the source code, saying, that's interesting, you're
providing this binary and this source code.
They don't seem to match.
What gives?
What gives?
Would they compile their own version?
Would they use this torture chamber you're mentioning?
What's the step they take to ensure that?
I think right now they would compile it themselves.
That would probably be the easiest way of getting a single checksum rather than setting up a torture chamber because that requires isolated environments, changing clocks, etc.,
and things like that.
So the first step would just be to recompile it on their own machine
and see what they get and compare it with the result
that's being distributed by the original source code maintainer.
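As a rough sketch of what that one-off check could look like, assuming you have already rebuilt the artifact from source in an environment matching the upstream one, you would simply compare digests of the two files. The script below is illustrative, not an official tool from the project.

import hashlib
import sys

def sha256(path, chunk_size=1 << 20):
    """Compute the SHA-256 digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

if __name__ == "__main__":
    if len(sys.argv) != 3:
        sys.exit("usage: compare.py <locally-rebuilt artifact> <downloaded artifact>")
    local, upstream = sys.argv[1], sys.argv[2]
    a, b = sha256(local), sha256(upstream)
    print(f"local:    {a}")
    print(f"upstream: {b}")
    if a == b:
        print("Match: the distributed binary corresponds to the source you built.")
    else:
        print("Mismatch: raise a red flag with whoever distributes the binary.")
        sys.exit(1)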
I feel like that's still a hurdle that most people will just be like, whatever.
Maybe I'm just being antagonistic about it,
but I just feel like we're back to the same original problem, where, you know, you're
just verifying a checksum, essentially, which is what you want, because you want to
reproduce a build by compiling. Which is the question I asked kind of in the middle there:
why don't we just compile the software ourselves? And that's essentially what we're
doing to confirm we have the right thing.
So if they're not going to do it, I'm just wondering if there's an easier hurdle to put in front of people to get over versus that one.
I suppose one difference here is that checking the checksum not only helps Jane developer.
When they are validating a checksum, they're also checking it for the rest of the community,
and they're also checking on behalf of the original distributor as well.
It's not just helping them. And so if they built all their binaries themselves, it would be,
well, it's too strong a word, but it would be sort of somewhat selfish to do that, because
only they would reap the advantages of doing that
recompilation. But if they
recompile and check against the upstream's
version of that binary,
what they're doing is helping the community
at large, saying, yeah,
I've confirmed that
this binary matches this source code at least.
And everyone else gets the benefit of that as well. So it's a bit more friendly
in that sense.
Sorry, Jared, go ahead.
I'm just trying to think of, like, ways that we could actually get people to do something on this, because I agree that, like, the ad hoc, you know, check
a binary here, check one there type of thing probably doesn't have much legs with people.
But it seems like there could be some, like, community tooling built around some sort of
reproducible build chaos monkey thing.
Sort of like how Netflix has on their internal networks, where you could
just, like, build a system that pulls a random GitHub repo.
Maybe it has to be language specific or something,
but, you know,
spins up an EC2 instance,
runs the build,
gets the checksum,
checks it against, you know,
the published one or whatever.
And then, like, a web page with red X's and green checkboxes or something,
where it's automated but accessible, and, you know, a community effort.
I don't know, Chris, is that...
Yeah.
Well, he said recompiler earlier.
That's what I was going to ask before, when I was going to interrupt you.
But you went on. It was, Chris,
you mentioned something about recompilers,
a farm or something like that, earlier in the call that we didn't go into?
That's right. Yes. I think I referred to them as rebuilders.
Rebuilders. Yes.
That sounds cool.
Yeah. I think it's a pretty cool idea. It's quite interesting philosophically as well,
because you would want as many different and diverse groups of people recompiling the software. Because if you did
have a community effort, whilst you've removed the original builder of the binaries,
if you have a central community way of doing this, you've essentially then
re-centralized the confirmation of all these checksums. So you want as diverse a group as
possible building all the software all of the time.
For example, you might have servers
in Greenpeace's data center
and the Department of Defense's.
People with rather different views of the world,
but if they can agree on the checksum,
the final checksum of a binary,
and they have different motives in this world
and things like that,
then you can start to say,
oh, I can trust this.
Yeah, cool.
It'd be kind of like SETI at home,
only the results would be actually useful.
It would be very much like SETI at home.
I never thought of it that way.
Yeah.
Rebuilder at home.
There it is.
You know, somebody go out there and build that thing
and we'll all just dedicate CPU cycles,
you know, at nighttime when we're sleeping
to making sure all of our software is secure.
That'd be amazing.
I'll get onto that.
So that's what I was getting at: this hurdle to do the thing that, you know, does what we're trying for with this conversation and this entire initiative and this best practice, that last step.
As Jared said, it's going to be less likely to be done by the general public if it's just
sort of, if I think Jared's doing it and he thinks I'm doing it, neither of us
are doing it, right?
And maybe a few, a small handful, are doing it.
Those are the people giving talks.
Those are the people running meetups.
Those are the people getting pulled every which way.
And those are the people getting burnt out.
And, you know, it doesn't scale, it doesn't
sustain, and we have a Heartbleed issue again, or we have another issue, or we've got an emissions scandal
going on, because, you know, this rebuilder process is just too hard to put on the individuals
of the world.
Indeed, yeah. And that's probably our weakest point at the moment: how we can really translate the reproducible builds effort down to the end users.
So, for example, providing end user tools to say, oh, you're about to install this particular binary, but we don't believe it's reproducible.
So what do you want to do about it?
We don't have tools for doing that yet.
We don't have these sort of automated, or at least semi-automated, rebuilders yet. It's certainly
in the pipeline; it's just that we are still not quite there as a project, in terms of we'd
like to move the reproducibility effort on a little bit further before we attack those angles.
And there's a few unresolved questions as well. I mean, just for one example, say we had 10 different builders, you know, Greenpeace, the Department
of Defense, you, me, et cetera, all publishing their checksums for their binaries. What would be the
algorithm the end user tool used? Would it say, oh, all 10 have to agree? Or do nine out of 10 agree, is that okay? What
if I'm a malicious actor and I upload 10,000 checksums that are all bogus? Would they outvote
the others? There's these difficult questions that haven't really been resolved yet in terms
of policy.
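None of this is settled policy, but to make the shape of the question concrete, here is one naive, purely illustrative sketch of an agreement rule over rebuilder-reported checksums. The threshold, the rebuilder names, and the idea of a fixed trusted set are assumptions made for the example, not anything the project has adopted.

from collections import Counter

def quorum_check(reported, trusted_rebuilders, threshold=0.9):
    """Naive agreement policy over rebuilder-reported checksums.

    reported: mapping of rebuilder name -> checksum it observed.
    trusted_rebuilders: a fixed set of rebuilders whose votes count,
        which is one (simplistic) answer to the 'malicious actor uploads
        10,000 bogus checksums' problem.
    Returns the agreed checksum if at least `threshold` of the trusted
    rebuilders report the same value, otherwise None.
    """
    votes = Counter(
        checksum for name, checksum in reported.items()
        if name in trusted_rebuilders
    )
    if not votes:
        return None
    checksum, count = votes.most_common(1)[0]
    if count / len(trusted_rebuilders) >= threshold:
        return checksum
    return None

if __name__ == "__main__":
    trusted = {"greenpeace", "dod", "you", "me"}
    reports = {"greenpeace": "abc1", "dod": "abc1", "you": "abc1", "me": "dead"}
    # 3 of 4 trusted rebuilders agree, so with a 75% threshold "abc1" wins.
    print(quorum_check(reports, trusted, threshold=0.75))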
Put it on the blockchain.
Blockchain solves all problems.
It's the new spade card or trump card,
you know, just say blockchain and then that's the answer.
Well, blockchain would be part of this thing
to ensure that someone could not
unpublish their checksums.
So yes, we are actually ironically
thinking of using blockchain-like technology.
I'm full of good ideas today, I tell you what. That's two in a row.
I feel like you should join our project.
Yeah, maybe I will.
Well, Chris, in efforts of closing up here, what closing thoughts do you have to share? This is your last chance on the show to sort of get that final person who's like, you know, I really like this idea, what's the next step? What final closing thoughts do you have on this?
I suppose, if someone's vaguely interested in the project, they should totally check out our website. It's got a bunch of talks, a bit more background information, some recent presentations with some more interesting gotchas about things that we have surfaced QA-wise in the reproducible builds effort. We also have a mailing list and, as I mentioned, the diffoscope tool.
So one thing that everyone can try right now is a website called try.diffoscope.org, where you can upload two files and it will recursively unpack them.
So if you give it two ISO files, it will unpack the ISO files and look for meaningful and human-readable differences between those two files.
That software is also available on your desktop, but this is just a web-based interface to it.
So those would be the next things to check out if someone's interested in the project.
Good deal. Well, we'll certainly leave links in the show notes to reproducible-builds.org, which is the site that
Chris is referring to. The talks, resources, tools, events, even the news stream you have there is
great, great documentation. So I highly encourage those who are listening to this and interested to check that out.
Check the show notes for that.
And, uh, Chris, thanks so much for joining us on the show today, man.
Really appreciate it.
No problem at all.
Thank you for having me on.
Thanks again to our guest this week, Chris Lamb.
Also thanks to our sponsors, GoCD, Linode, and Flatiron School,
as well as Fastly, our bandwidth partner.
Check them out at fastly.com.
Our theme music was created by Breakmaster Cylinder,
and this episode was edited by Jonathan Youngblood.
The best way to keep up with all things
open source and software development
is to subscribe to our weekly email,
Changelog Weekly.
Head to changelog.com slash weekly to subscribe.
Don't miss an issue.
And thanks for listening. Bye.