The Changelog: Software Development, Open Source - Securing GitHub (Interview)
Episode Date: June 19, 2024
Jacob DePriest, VP and Deputy Chief Security Officer at GitHub, joins the show this week to talk about securing GitHub. From Artifact Attestations, profile hardening, preventing XZ-like attacks, GitHub Advanced Security, code scanning, improving Dependabot, and more.
Transcript
What's up friends? Welcome back.
This is the ChangeLog.
I'm Adam Stacoviak, Editor-in-Chief here at ChangeLog.
Today we're talking about the most important developer platform out there.
Yeah, it's called GitHub.
We're joined by Jacob DePriest, VP and Deputy Chief Security Officer at GitHub.
And the topic is, of course, securing GitHub, securing open source
and all the things you have to do to ensure releases, profiles, GitHub at large is secure.
Now, Jacob is one of many in the line of securing GitHub.
So we dug deep, we go deep, and we ask questions about what it takes
to secure GitHub and to keep it secure.
A massive thank you to our friends
and our partners at fly.io.
That is the home of changelog.com.
And it's also the place you can launch your apps. You can launch your databases
and your AI. You can launch your AI near your users with no ops. And that's so cool.
Learn more at fly.io. Here we go. What's up friends? I'm here with a good friend of mine, Feross Aboukhadijeh. Feross is the founder
and CEO of Socket. Socket helps to protect some of the best engineering teams out there with their
developer first security platform. They protect your code from both vulnerable and malicious
dependencies. So we've known each other for a while now, Feross.
Well, let's imagine somehow I've landed myself at Vercel.
And because I'm a big fan of you, I understand what Socket is.
But I don't know how to explain it to anybody else there.
I've brought you into a meeting.
We're considering Socket because we want to secure dependencies.
We want to ship faster.
We want everything that you promise from Socket.
How do you explain Socket to my team? Yeah, Socket is a developer-first security platform
that stops vulnerable and malicious open source dependencies from infiltrating your most critical
apps. So we do that by focusing on real threats and keeping out all the types of risks that are
out there in open source dependencies.
Everything from malicious dependencies, typosquat attacks, backdoors, risky dependencies,
dependencies with hidden behavior.
There's all kinds of risks out there. A lot of reasons why a dependency might be bad news.
And Socket can help you as a developer.
Just keep all that out of your app.
Keep things nice and clean and pristine amongst your dependencies.
I saw recently Dracula. I'm a fan of Dracula. I don't know about you,
but I love that theme. Big fan of Zeno Rocha. And I saw there was like a misspelling there.
And so because Dracula is installed on VS Code and lots of different places, I saw there was a typosquat sitting there that had different intentions than obviously Dracula did. Is that
an example of what you mean? Absolutely. Yeah. Dracula, that's a perfect example. It's super
common these days to see that type of an attack where you see a common dependency that you have
an attacker just pretending to be that dependency, typoing the name of it by one letter and then
trying to get unsuspecting developers to install it. Unfortunately, we're seeing more and more of these types of attacks in the community and they're taking
advantage of the trust in open source. As developers, we need to be more aware of the
dependencies we're using and make sure that we're not pulling in anything that could risk the data
of our users or cause a big breach at our companies. And so part of that is obviously
being more careful and asking questions and looking more carefully at the dependencies we use.
But also part of that is tooling.
It's really a hard problem to solve just on your own as a single developer.
And so bringing in a tool like Socket can really help automate a lot of that work for you.
It just sort of sits there in the background.
It's really, really quiet.
It doesn't create a lot of noise. But if you were to pull in something that was backdoored or compromised in some way,
we would jump into action right in the PR or right in your editor.
Or even as early as you browse the web, we have a web extension that can actually give
you information if you're looking at a package that's dangerous or if you're browsing Stack
Overflow and you see somebody saying, hey, just install this dependency to solve your
problems.
A lot of times even that can be a way to get the attacker's code onto your machine.
So Socket jumps in at all those different places and can tell you if something is dangerous
and stop you from owning yourself.
Yes, don't get yourself owned.
Use Socket.
Check them out.
Socket.dev.
Big fan of you, Feross.
Big fan of what you're doing with Socket.
Proactive versus reactive, to me, is the ultimate shift left for developers.
It is totally developer-first.
Check it out, socket.dev.
Install the GitHub app, too easy, or book a demo.
Once again, socket.dev.
That's S-O-C-K-E-T dot dev. Well, we're here with the VP and Deputy Chief of Security.
I should say, actually, Security Officer at GitHub, Jacob DePriest.
Jacob, thank you for coming on the show.
Definitely fans of GitHub, obviously, and securing GitHub.
Can we do that, please?
Indeed. It's great to be here. Thanks for having me.
I think it is secure, though, right?
Is it anti-secure? Is it fully secure?
Our goal, that's our goal on the security team, is to secure the world's developer platform.
I think it's a good thing. That's why we want to have you on the show.
We obviously had an XZ attack or issue, I guess, a while back.
And in a conversation on this show, we talked about, we speculated, at least I did, on the role of GitHub to prevent things like that by hardening the profile of individual developer users on GitHub.
But I'm sure we can go deep.
Where should we begin when we talk about securing GitHub?
What's a good place to begin?
I mean, starting with the developer is kind of where I always like to start.
So, I mean, I think that actually sounds like a great place to start.
I think when we talk about open source security,
we talk about the supply chain, we talk about all these things.
You could start anywhere, but at GitHub,
we always like to start with the developer.
That's kind of our central ethos is how to empower the developers,
how to secure the developer workflow.
And so that's kind of our approach there.
And so last year, about a year and a half ago, we
announced and started implementing what at the time was kind of a fairly controversial initiative
to turn on mandatory 2FA for all contributors on github.com. And we've been pretty successfully
rolling that out. But there's, I think there's other things we can do in the developer account
space as well, but happy to dig into those things.
I think when we look at the XZ attack,
that's a lot of social engineering in a way, right?
But you also have sort of profiles on GitHub that may be nation-state based.
There's a lot of speculation around that particular attack
and that scenario.
Would you call it an attack, Jerod?
Was it really an attack? I guess it was. It was a takeover. It was not really an attack.
It was more like a social engineering takeover and then infiltration of...
Well, you call it a supply chain attack, but it's not via an exploit
or brute force. It was via social engineering
and takeover of a project and then the ability to
release new code
as the owner of said project without people knowing that the ownership had changed.
I guess where I'm curious is how much does GitHub take the responsibility
of securing profiles beyond simply that they're secure?
As in, there seem to be nefarious actions from the profile that don't fit some of the social constructs and contracts of being a citizen of developer land in the world.
How far do you go in terms of securing proactively, and maybe your own ideas,
and how GitHub may react in the future to these kinds of things where you have profiles?
And if there's prevention at the profile level, what can be done?
Yeah, it's an interesting question.
I think that the way I kind of have been thinking about this is,
I think it's a bit broader because when you start thinking about social engineering,
you start thinking about the techniques and approaches that individuals could use in these cases.
Even if the profile is secure, even if there's lots of investigation
and telemetry and all those kind of things,
it's still not necessarily obvious when something is nefarious
versus when something's a mistake.
In some ways, I think what we're talking about,
the analogy I've been thinking about actually is most sufficiently
sized corporations, businesses, organizations, government entities have an insider threat program.
And in some ways, this is sort of like the insider threat scenario for the world, for the larger
software ecosystem. How do we think about that? And so certainly I think developer accounts and
profiles are an aspect of it.
But I also think there's an element of the broader supply chain and things like attestations and SLSA compliance.
So the framework to help secure builds, essentially.
So what is going into a build, not just what's the piece of software that was downloaded from the Python PyPI registry, or the Go registry,
but what was used to build that software?
Where was it built?
What were the instructions that went into it?
And can we know cryptographically that the software that we're installing came from that build process and was created by that developer?
And so would that have necessarily solved
this particular challenge?
I'm not sure that it would have,
but it sure would have given
the security researchers looking into it
and trying to figure out what happened
way more tools and confidence
much more quickly
than they would have otherwise had.
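For a rough idea of what such a build attestation can carry, here is a simplified sketch of a SLSA-style provenance statement in the in-toto format that Sigstore-based tooling uses; all field values are illustrative placeholders, not GitHub's exact output:

```json
{
  "_type": "https://in-toto.io/Statement/v1",
  "subject": [
    {
      "name": "my-tool-1.0.tar.gz",
      "digest": { "sha256": "<hash of the published artifact>" }
    }
  ],
  "predicateType": "https://slsa.dev/provenance/v1",
  "predicate": {
    "buildDefinition": "…which repo, workflow, and commit defined the build…",
    "runDetails": "…which builder ran it, and when…"
  }
}
```

The key idea is that the artifact's hash is bound to machine-readable build metadata, which is what gives researchers those extra tools and confidence.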
And so I think this definitely
has an element of, as a community,
we've all got to come together
and figure out what
are the standards, what are the ways we want to distribute software, what are the trust
signals that we can all get when we're thinking about what software we're going to use in our
products. I have a good idea. I think for eight bucks a month, you can just get a verified badge
and then you're good to go, right? You just verify your account and we can trust you immediately.
That's right. Nobody ever lies on the internet.
To that point, do you think that doing GitHub's best to reach into the real world and confirm people are people,
this person is potentially suspicious, this person is okay,
this account is being run by somebody who didn't previously own it?
Let's forget about the implementation.
Cause I understand how hairy that could all be,
but do you think the idea is good or do you think it's not a place where we
should even be prying into these things?
Cause a lot of it going back to the XZ is like,
who in the world is Jia Tan?
Is it a person?
Is it a nation state?
Why were they trusted?
Et cetera,
et cetera.
And it's like,
well,
if we could know who Jia Tan is
and then track that or no more information,
then we can make wiser choices.
But do you think that's a worthwhile effort or not?
I certainly think it's something
that the broader software industry
should continue exploring in all SaaS platforms.
I also think there's a balance here
because one of the promises,
and it's more than a promise.
One of the outcomes we've seen from the open source movement in the last 20 years is people in developing countries, people all over the world from different socioeconomic backgrounds, being able to contribute, be able to get up to speed, be able to make a meaningful difference in a piece of software. And, you know, I think we have to recognize that not everybody who is contributing
is coming from a place in the world where they either have the technology or the identification
system or the infrastructure to be able to do this kind of verification, or potentially it puts them
in some sort of risk in whatever environment they're in. And so, you know, this is why when
we developed the mandatory two-factor authentication, we didn't jump straight to, well, you have to have a YubiKey or FIDO or a passkey.
We continued to leave the door open for a wide range of 2FA options because we have users who are students in schools and don't have mobile phones or can't afford mobile phones or are in an area where they can't kind of perform that two-factor authentication in a way that,
you know, the security industry might say is world-class,
but, you know, we have 100 million developers
on the platform, many of whom aren't necessarily
tech sector, financially affluent folks
who are able to do these things.
And so I think we have to balance both of those
as we think about the open source community.
And then I think that's where, to me, it's what can the rest of the community do?
Like, what are the things we can work with our partners in industry on securing builds,
securing attestations, cryptographically verifying what's going into things and attesting to
those so that the companies that are building these things have a better sense of what's
in them.
And the companies with the resources who are using them can contribute back to that and help make these more secure, versus kind of putting all that on
the individuals. What about change of ownership? It seems like that is a pretty strong signal of
potential problems, and I'm not sure if GitHub has anything built into it and the security tools around that particular thing. Like this repository is now owned by a new user or org.
I mean, it seems like a lot of times that's still not a big deal,
but as a downstream person, I'd still want to know about it
and be like, well, I went and checked it out.
It seems legit. I'm cool.
Or this doesn't seem legit.
You know, let's take action.
We have some protections in place now
in terms of the account ownership,
and particularly if, for instance,
somebody changes their username
and somebody else grabs it real quick
and things like that.
But one of the things we released recently,
which will be free for kind of public use as well,
it's not just a paid feature,
is attestations for our builds.
And so what you can do here with GitHub Actions
is let's say you're an open source developer,
you're working on something.
Normally you kind of do the build, however,
maybe use GitHub Actions, maybe use something else.
And then you push the artifact up to PyPy
or you push it up to Rust or wherever.
And then when a user goes to download that artifact
and leverage it on their systems,
they have no idea which repo it came from, what org.
They don't know.
I mean, it says the name of the repo on it,
but there's not really a way to prove that it was you
and it was this build process and it was this repo.
And so that's where things can get really wonky
if the repo changed ownership or the users
or the lead contributors all rotated out over a weekend
for some reason.
Then what do you do as an end user?
Well, you can't
really do much today or in the past. And now with attestations, what you can do is you can actually
say, I want to cryptographically verify this build. And you can even do things like, I want
to make sure that it came from this repo, this org, this branch, and you can actually attest to
that before you deploy it in your environment.
And I think that's something that has been possible through partnerships with things like Sigstore and other cryptographic means in the past.
But the accessibility of that's been hard. It's not really been built in
in a way that the average developer could take advantage of it.
And now with attestations, it's literally just add an action
that we support and
maintain to your workflow. And it produces an attestation that people can check against if
they want to have that level of rigor in their deployments and security builds. And I think,
I mean, that's step one, right? That's not going to solve everything, but I think that's the path
we have to go on as an industry, particularly with open source, is making these things more transparent, making them not just researchable-transparent to a human's eyes, but machine-readable transparent, so that we can start to make risk decisions on them in a programmatic, scalable way.
How would this idea of attestation apply to XZ in particular, given that entire scenario where you had social
engineering over a long time? This was a very patient attack. How would this apply there?
So again, I don't necessarily think this would have been a preventative thing,
but let's fast forward to a future where most open source packages on the internet have this
built in. I think it's a deterrent at that
point. And here's what I mean. If attestations were used in this case, then it would have been
a very trivial matter for any researcher to look at these packages to be able to,
within a matter of a few clicks, get to the build workflow that shows the instructions that were
happening in the actual build itself. And what went into the build? Was it just the source code?
Was there other things that went in?
Like, how do we actually backtrace this
into the visibility of not just the code,
but what went in to take that code
into the artifact that ends up getting used by end users?
And so I think as we see this adopted more and more,
the recognition from malicious actors
that this stuff is really accessible
and everybody's expecting this transparency,
and it's going to be trivial for a researcher
to go look at all the build logs
and start to build analytics and scans
and detections against not just the code,
but the builds.
I think it's going to be an important step forward
in deterring this as a space and an attack vector.
What's scary about XZ is that it was discovered by accident. It was like somebody who just happened
to have just like a millisecond too long on their hands and they found this thing. And like, so how
many of these things are happening, given now the zoom out of the patience to do the engineering,
the social engineering to get into place and the
multiple profiles and catfishing that took place to sort of wear down the maintainer,
right? Like that person was taken advantage of in terms of what a maintainer goes through to
build, run, communicate, et cetera, in an open source community, a software like XZ, for example.
The scary part is that it was discovered by accident. And I think, you know, you want to
have this attestation, this build process, look, this sort of reproducible build aspect that's
verifiable. But then you have the other side, which is like, okay, if I'm going to become a core contributor or a maintainer or have right to master or main on a given GitHub repo, at that profile level, I know that you have 100 million developers across the platform, but there's a certain level of developer that begins to become a core contributor to a key piece of software. And that person is different and more unique than everybody else on the platform
insofar that they have a level of power and control
given the prowess and usage of that software.
So they kind of elevate themselves.
And you were part of the NSA, so you get security clearances.
Not everybody can get a security clearance.
So they're set apart, right?
And so I think, I'm curious, I think this is Jerod's angle is like, how can we set apart
certain profiles to have certain levels of awareness of the personhood so that we can
have more trusted software? Yeah, I think it's a great question. I would pivot it slightly,
at least the way I think about it, is less about the profile and the human and more about the expectations of these critical pieces of software.
And here's what I mean.
I think there's kind of two elements to this.
I think one is, what is the responsibility and expectations for the organizations, corporations, companies that are using this critical software? Do they have a responsibility to look into and ensure the security of these core fundamental building blocks that
really power a lot of the internet and a lot of these companies, right? And so I think today we've
seen this, I mean, this is a few years old now, but we saw this in Log4J, there was this outcry of
like, well, how are we going to hold these developers accountable? And it was like a handful of folks over in their spare time building this stuff,
right? They weren't resourced to secure, build, look at these things. And then, so in many ways,
I don't view the alleged malicious intent in the XZ case
as any different than an accidental or poor programming practice, or just not securing things. I mean,
to a certain degree, the outcome is the same in the sense that there's insecure software that is
being included in core functionality across a lot of platforms. And so I think some of this is,
I think we have to, as a community, take more responsibility for the open source software
we're using. And I think on the platform side, I think there has to grow an expectation
of the security tooling and expectations
of the code that we're using.
And so this is where things like GitHub Advanced Security,
code scanning, secret scanning,
and there's plenty of other tools out there too,
but I think we have to elevate the expectation
that these core pieces of software
are going to have those things turned on.
They're going to have security scanning with the results made available
or at least something that's consumable as an artifact there
so that we're kind of hitting this from multiple angles
to really level up the security.
The challenge of the defender is that you must secure the entire thing, right?
You've got to fortify the entire house.
And the advantage of the attacker
is they only have to find one way in.
That's right.
Doesn't that seem futile?
Like, I don't know.
I'm just getting a little bit worn down perhaps
because it's like, I'm just thinking about
how many lines of code are in, for instance,
the Debian distribution.
You know, because XZ is low level software
and certainly widely deployed,
but mostly invisible. And would we have considered it critical software? I mean,
maybe some people would have, but for the most part, it's just down there, it's utility software.
And how much of that is there millions upon millions upon millions of lines of code,
of course, of course.
It just seems like we have to
overhaul. You're talking about
new-ish
best practices around writing
and deploying secure code,
but it almost requires an entire
industry
come-to-Jesus moment with regard to
these practices before it's ever going to actually help us.
Yeah, I mean, I think if we were just looking at the code bases and assumed that all of it was sitting unprotected on the internet, it would feel and likely be futile, to be honest.
And I think this is where the rest of mature security programs come into place and things that we can do
as an industry. So, you know, I think that that's where zero trust and identity as a perimeter and
those kinds of concepts, secure by design, come into play. So like, if we have strong authentication
in front of access to a lot of these systems, if we have network isolation between key systems,
if we have, you know, role-based access control. So we kind of assume that parts of
these systems will eventually
experience some sort of security issue. How do we firewall those off from other parts of the system?
And so I think this is where the rest of that comes into play. And I also think the other element to
this is I think the industry does need to level this up. So the CISA Secure by Design pledge
that was announced, and many companies, including GitHub, signed at RSA a few weeks ago, talks about this, right? It talks about needing increased commitment from key players in
the industry to implement secure by design principles as part of, not just as part of
their internal programs, but as part of the products they offer to the world and to users,
so that the settings that make things more secure are on by default,
even if it causes a little more friction or one more click for a user. And I think
that's really an important part of this that, you know, honestly, we do have to progress as
an industry here. And I think it's critical that that's the other element of responsibility that
companies, corporations and organizations take in this space. Yeah, well said. I think it's tough when you get to the individual org
or individual developer, and we're relying on them
to also do their due diligence and their best practices.
Because A, education is a problem, like a lot of us don't know.
And then B, the constant pressure and stress
to be shipping more features and code faster, stronger, cheaper, et cetera,
with tools now helping us write code
that we may not exactly vet.
That just makes the problem even more massive
because we need the big players to adopt
and to sign pledges and to push out secure best practices
and suites of tools and everything.
And then we also need the awareness.
And we need to equip the everyday developer
with the ability to also do these things,
use these things, and really just have their wits about them
despite all the pressures pushing them away
because of that push and pull between convenience and security
and that relationship is just so fraught.
Yeah, I totally agree. I'll give you a concrete example.
I'm obviously more familiar with what we're doing at GitHub than other places, but
we have had a feature for a while for enterprise customers
and public repos that was opt-in. It's called
push protection for secrets. So we have this thing called secret scanning,
that if we detect a secret in someone's code, so like a structured AWS token or Azure token or
something like that, we'll alert, it becomes a security alert, but it had to be turned on.
And recently we enabled push protection, which stops those secrets from getting
to the public repo before the commit happens.
And so you're a developer,
you're working on your laptop,
you go to push up a change to GitHub.
If we detect it, we'll stop it
before it gets to the public repo
and send an error back and say,
hey, we detected a secret.
That's push protection.
We turned that on for all public repos recently,
all public repos on github.com.
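For a concrete picture, a blocked push looks roughly like this; the remote messages below are paraphrased rather than verbatim GitHub output, and the file name is made up:

```shell
$ git push origin main
# remote: error: GH013: Repository rule violations found
# remote: - Push cannot contain secrets
# remote:   (an AWS-style access key was detected in config.py)
# Fix: remove the secret, rewrite the offending commit, then push again.
```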
And that increases friction to a certain degree. There are going to be developers out there who are just pushing test
secrets up to their repos to try things out. And they're going to get frustrated and they're going
to have to go, you know, search and figure out what setting to turn off or whatever.
But it's a secure by design principle that we believe strongly in is that source code is not
the right place to store secrets. And we continue to see issues in the news and industry
where things have gone really wrong for companies
where an innocuous, probably well-intentioned secret
was put in code.
Somehow that code gets leaked and there's a phishing incident.
And then all of a sudden that secret is used to pivot
way further into the infrastructure
and cause a lot more damage.
And so we believe this is a core tenet of secure software development.
And so we turned it on by default for all public repos. And I think that's an example
of the types of things I think we need every company, every organization
who's shipping capabilities to developers, users, to think about,
what can I turn on by default? What can I just take away as a choice
or an education opportunity for
someone? We're just going to do this. And sure, you've got options to turn it off if you need to,
but this is the way it's going to ship. How has that been received? Well, so far, honestly, I think it
was, it's one of those things where thankfully most developers aren't doing this every day.
They're not pushing secrets to code. And I think it's very likely that many who were didn't really take the
time to step back and think like, oh yeah, maybe I shouldn't do that. Or maybe there's another way
to do this. And you can still override those alerts as they come in. But as far as I'm aware,
we've had generally positive reception. Same thing with mandatory 2FA. We've seen a significant
drop in support tickets since we've rolled out the
requirement to make 2FA mandatory. And then we've seen a 95% opt-in rate across contributors
who've received those requirements. It was a day of great joy when I 2FA'd myself on GitHub,
so I was happy about that. Same. What is behind the scenes of this scanning process? Like how
did you have to re-architect git push, essentially, to GitHub?
Did it have to be a sandbox of sorts that gets pushed to,
then scanned, and then kicked back?
What's the process?
And even what's the cost center?
Is this a cost center for GitHub to have to pay for
all of this source code to be scanned?
What's the architecture?
What's the cost?
What's all the things?
So I'm not going to butcher the architecture
by trying to explain it in detail.
Give us the high level overview.
But in general, yes, that's essentially the gist
is there is a sandbox space where we do the scanning
and it's all encrypted.
So it's not like we're punching out of that,
but it hits the GitHub side of it.
And before we put it into the Git
commit, Git pull request, whatever that is on the actual github.com platform, we scan it in a
sandbox first. And if we find something, we kick back an alert that says, hey, we found this,
you should, we highly recommend you deal with this, you know, clean your history out, remove
it from code, get everything clean, and then push up again. And that's kind of the gist of it there. In terms of how we structured it architecturally,
we actually partner with industry partners across kind of every sector here that does
structured secrets. I think we have over 300, 350, 400 partners that we have essentially the
ability to scan for their secrets and people can just register for the program.
And in a couple of cases, we've actually gone a step further
where we can show enterprise customers whether it's still valid or not.
So it's not only a secret we found, but it's an active secret in code,
slightly different than push protection; that's after the fact, if we found it in your code.
So that's actually a huge benefit as well to developers.
From a cost center perspective,
supporting the open source community
has and always will be one of our kind of core spaces
that we invest in and that we support.
And so, you know, we essentially support
most of the GitHub Advanced Security features
that enterprise customers pay for,
for all public repos. That
includes the compute behind it, that includes the scanning, that includes, you know, all those things.
The things that you can get on a free account on github.com are incredible. Codespaces, so many
minutes a month for Codespaces usage, which if you don't have a developer laptop
is a game changer. And even for me personally, if I'm just going to tinker around with something
on the weekend, the last thing I want to do is spend the first five
hours getting my laptop patched up to date, whatever developer tool I need installed. I
don't worry about any of that anymore. I fire up a Codespace, which is just a remote development
environment that we host, and I get to work. And, you know, free Actions minutes, stuff like that.
And that's, that's part of our mission to accelerate human progress through software development.
And so I think it does cost money for us to run those things for the public repos,
and I think that's okay. Is this the first time you're hearing about GitHub's Advanced
Security features, Jerod? Is it just me? First time for me, yeah. Can you explain
that then, Jacob? Because I kind of get what it is.
I Googled it, landed on a page, but it seems enterprise-focused.
But then, as you just mentioned, some of this is public repo blessed.
Can you give us a rundown?
Sure.
I mean, the gist of it is GitHub Advanced Security is our static analysis capability for software.
And that's based on CodeQL.
And it also includes our dependency scanning ability.
So I'll give you an example here.
If you've got a repository on GitHub.com,
if you turn this on,
if we see a dependency in your source code
that's out of date, we'll just send you an alert.
And if you have it turned on,
we'll actually open a pull request on your behalf,
on your code and say,
hey, we found a dependency that's out of date. And also here's the updated
version that we recommend. And if it fits, you can just merge it and move on. I do this all the
time on a handful of projects that I do personally on github.com. I just kind of like go in every
month or so. And I look at all the pull requests Dependabot's opened for me. I merge them and
I'm happy and I move on. And then the last
bit is secret scanning. And so those three together are GitHub
Advanced Security. And then for our enterprise customers, we also have things like a security
overview and trending and charts and things like that, that will help enterprise administrators
and security teams to administer this across their environments. So that is an enterprise
offering that we do offer to our enterprise customers
as a security suite for their source code.
But then we offer most of that available for free
to public repos that are hosted on github.com.
So a lot of the open source community
can take advantage of that,
those that are hosted on GitHub. Well, friends, I have something special for you because I made a new friend.
Tamar Ben Sharkar, Senior Engineering Manager over at Retool.
Okay, so our sponsor for this episode is Neon.
And as you know, we use Neon, but we don't use Neon like Retool uses Neon.
Retool needed to stand up a service called RetoolDB.
Tamar can explain it better in this conversation, but RetoolDB is powered by Neon. Okay, they have
a service called Fleets. It is a service that manages enterprise-level fleets of Postgres, serverless managed fleets of Postgres.
And RetoolDB by Retool is powered by Neon Fleets.
Okay, Tamar, take us into how Retool is using Neon for RetoolDB at large.
So one big problem we had with Retool, we wanted users to have value,
production value as soon as possible.
And connecting to a prod DB in a new tool is not something that people will do lightly.
But they're much more likely to then dump a CSV into Retool.
And so because of that, we said, OK, well, what if we just host databases on behalf of users?
And then they can get spun up really fast.
And we really saw that take off.
The problem we had is we didn't have a big team.
We couldn't spin up a new team to support this feature.
So what do we do?
And so we were looking at what are the options out there?
And, you know, we found Neon.
Neon is a serverless platform that manages Postgres DBs.
And so like, okay, that's interesting.
Let's kind of look in further.
What's kind of really unique about them is you really only pay for what you use,
which is exactly the case that we have, right?
Because we want to provide this to everybody.
Not everyone uses it.
Not everyone uses it all the time.
And so if we had to manage a bunch of RDS instances ourselves,
for example, right?
Basically, we'd have a whole infra team to support,
figure out, okay, what are they on?
How do we do it?
Try to have some kind of greedy algorithm
to get all the data in the fewest instances possible.
This is now a hard problem.
That's not kind of a core value, right?
A core value is kind of providing that database.
And we don't want to kind of go in and take that on.
We know we're not like an infra team.
We don't want to kind of get in that game.
I think what's really great is that,
okay, well, one big kind of risk
when you think of going with a third party is A, the cost.
We're giving this free to all users.
We have 300,000 databases right now, right?
Like we can't, especially as we
were rolling this out to begin with, right? We didn't know for sure how
people would respond, right? And, you know, we can't all of a sudden have like a couple million dollars in
the bank for this without kind of seeing the activation that it has on our
users. So it's kind of obvious, but what was the appeal of Neon?
What was really appealing about Neon: it spins down to zero.
And so because of that, right, it really kind of reduces the cost.
And so really, we pay exactly only for what we use.
And there's really not a way to spend less money,
even if we managed it all ourselves.
So you can remove all the people cost, right?
Because let's say we used something like RDS,
we'd have to figure out ourselves, right,
basically what Neon is doing.
Right. How to bucket all the instances together, how to bucket the usages to have as few instances as possible.
Right. To scale up and down depending on what's going on.
And now we sort of don't have to worry about any of that part, but still get kind of the cost benefit.
And so really it was out, you know, it's a win-win.
OK. Win-win. Always a good thing. I like win-win-wins, but okay, fine. Win-win.
If it were not for Neon and their offering of fleets of Postgres and how they're essentially
your serverless Postgres platform, where would Retool be at with RetoolDB without Neon?
Well, we would have to have at least a fully staffed team. The on-call burden would be a challenge.
You know, I think we have to spend a lot of time on, you know, making it sustainable. And that's,
you know, a whole, you know, other sets of concerns that are, that we don't ever think
about. First of all, like, you know, it's a team of engineers, right? Which is not free.
So it's everyone's salaries, right? So let's say probably a team, let's say, you know,
eight to 10 people, you know, easily only focus on this. And then it's like, well,
does the revenue of RetoolDB offset that cost, even if just the engineers? So, you know, that's step one. But I think even before
then, right, like, you'd have to set up this team before you even had the product. You know,
databases and, you know, having them the way that Neon has them, right, like, suspend to zero,
having, you know, warm spares that they're, you know, ready instantaneously when you, like,
log on to Retool. Those things aren't free. And even if we tried to do an MVP,
there's a basic functionality that needs to exist
that we would have to build from scratch. And that would be a huge
commitment to this. And I think it would
have come out a year later
because we'd have to do a lot more validation
to know that it would have been worth it right before we started.
Here, we were able to quickly try it out,
see that it was effective, and then grow
from there because the cost was very low.
And that really gave us a lot of flexibility of also testing out different features and different flavors of it.
Okay, so RetoolDB is fully powered by, backed by, managed by Neon.
Neon Fleets, neon.tech slash enterprise.
Learn more.
We love Neon here at ChangeLog.
We use Neon for our Postgres database.
We enjoy every single feature that Tamar mentioned for RetoolDB,
but we use it at a small scale, a single database for our application.
They use it at scale.
One single engineer propped it up, manages it.
That's insane.
They would have never been able to do this without Neon.
RetoolDB would have cost more and may not exist without Neon.
Okay, go to neon.tech, learn more, or neon.tech slash enterprise. Enterprise.
Has Dependabot gotten any better about not warning on latent code or being able to detect actually used code versus code that happens to be in a dependency that's never
executed in the run of a program or dev dependencies only. Because it seems like in the past, it's had
a lot of false positives for me. And so I just, you know, I jumped ship. I'm just like, well,
I'm kind of done with you. Because 90% of these aren't actually my problem, but you're making
them my problem. And I'm assuming that that's something that y'all work on, because I'm probably not unique in that way. And I wonder. It's unclear if they're used or not.
And being able to trace that through the code
and figure that out in a scalable way
is a difficult challenge.
So it's definitely something that teams are tracking
and working on and always trying to improve.
Fair. I wouldn't want to work on that problem.
I can understand that it's a hard problem,
but I also want somebody to work on it.
Can we go back to attestations?
Because it seems like it could be a good step forward
in the right direction.
And obviously it's out there and ready to use and stuff.
You gave the workflow a little bit
from the end user perspective.
Like you have GitHub actions if you're using them,
toggle on a thing,
you probably have to decide what happens
if an attestation fails,
and that's roughly the workflow.
But what about from the maintainer perspective?
What do I
have to do to have my code attested to as I'm deploying it out to people? So the main thing
today, and again, this is a very early capability that we're shipping. So I expect this to continue
to just improve as more of our partners adopt it, and this becomes sort of ingrained in the developer ecosystem.
But today, it's as simple as adding a specific GitHub action to the workflow.
So a lot of open source projects
do their builds on GitHub actions.
And in the workflow itself,
you can specify different segments of it.
And often, you would include an action
for checking out the code.
In fact, that's one of the main ones pretty much every workflow on GitHub.com includes.
You might include an action for deploying your artifact to AWS or to PyPI or to Azure.
And then we've written and released an action that will let you attest to the code. And essentially all it does is, as the build's happening,
once that artifact is produced that you're either going to deploy somewhere,
upload to PyPI or Rust or wherever,
it will sign it using our Sigstore cryptographic kind of root of trust.
And then it will store that attestation in the same repository
on which the action was run on.
And so there's essentially
repo name slash attestations, the attestations are there, and it's available for download and use
for anybody to verify against cryptographically through the GitHub command line tool.
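A minimal sketch of what that looks like in a workflow, assuming the actions/attest-build-provenance action; the repo layout, artifact name, and build command here are hypothetical:

```yaml
name: release
on:
  push:
    tags: ["v*"]
permissions:
  contents: read
  id-token: write      # lets the job request a Sigstore signing identity
  attestations: write  # lets the job store the attestation with the repo
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: make dist                      # placeholder: produces dist/my-tool.tar.gz
      - uses: actions/attest-build-provenance@v1
        with:
          subject-path: dist/my-tool.tar.gz # the artifact to attest to
```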
And on the receiving end, obviously you're still in GitHub Actions. So now you're just using
whatever you guys built to go ahead and do that process during your own deployment, you're
saying. So you don't have to be in the GitHub ecosystem at all to use this. That's the great
part about it. So let's say the artifact is built and it's uploaded to PyPI. And then a developer
who's sitting in another company using a completely different tech stack, but still uses that PyPI
repo, they can download the PyPI artifact onto their local machine. And then they can use
the GitHub command line to go check the attestation to see if it's the same one that they think it
should be built on that repo, that org, that flow, that branch, whatever their criteria is.
And this is where you can use things like policy enforcement software. Open Policy Agent is a popular one. So you can
write policies and say, before I deploy, I want to make sure that everything I'm deploying came from
one of these three organizations on github.com, the source code. It was created and built there,
but where it was downloaded from, and whether it was from a local, you know, Artifactory instance or a public, you know, artifact store like NPM
doesn't matter because the attestation is cryptographic and can happen out of band.
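On the consumer side, a sketch of that out-of-band check with the GitHub CLI's attestation subcommand; the artifact, org, and repo names are placeholders:

```shell
# Download the artifact from wherever it's published, then verify it
# against the attestation stored on GitHub.
gh attestation verify my-tool.tar.gz --owner my-org

# Or pin the check to one repository instead of a whole org:
gh attestation verify my-tool.tar.gz --repo my-org/my-tool
```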
Does that allow you to actually track the binary in the case of a binary
deploy to the source code commit?
Like, what do you know about the source code? The exact.
You can go all the way down to the commit. You can go to the commit.
You can go to the actual workflow that built that binary, which is
fantastic. And this is kind of why, when we were talking about XZ, I think this could end up being
a helpful deterrent. Again, I don't think it's a one-size-solves-everything situation, but
you can go from essentially a binary you found laying around on your computer to knowing which repo
built it, which workflow, the build instructions that went into it, which commit went into it.
And it gives you this ability that is incredibly difficult to do now at scale. And that's really
how we see this going from an industry perspective is more and more tools like this that help us do
this at scale to give that essentially
unfalsifiable paper trail. That's really interesting to like just find a binary and
test it. Like how does that work? Literally, how does it work? Underneath the covers? Yeah,
like how does that work? How's the GitHub command line doing it? Yeah. Tell us how it works, Jacob.
See, now you're double-clicking past my ability to be incredibly useful here.
Well, my understanding is,
and we'll follow up if I totally get this wrong,
but generally speaking,
we are looking at essentially
the cryptographic hash of the binary.
Right.
And then looking up the attestation.
You have to kind of know the attestation
that you're going against.
So knowing which org you think it came from on the internet or knowing
like, hey, I think the GitHub Actions org built this.
So you have to know where to go ask for the attestation generally,
but hopefully corporations deploying in a highly sensitive environment
are going to have that knowledge. But you don't necessarily have to say like, oh, I have to go find
the attestation file myself. You just have to roughly know where it's at.
You can point the command line there and then it's going to go grab the attestation
and compare the cryptographic hashes based on the signature and the attestation signing
to be able to tell you if it's the same one or not. And then once you have that,
then you can display all the information about the build that went into that binary. That makes sense. Is that a GitHub thing? It's obviously GitHub-specific
in your implementation, but is this something that other platforms could also do attestations
and just follow the same? You have a spec or something where we could just get it to be
generally useful? Our approach is based on the Sigstore approach. So it's
essentially a scaled version of what we released last year with NPM and Sigstore.
So for public repos, there's still that kind of normal flow.
And then enterprise customers have the opportunity
to use a private implementation that's inside GitHub
so that their attestations don't show up on the internet.
They may not want them to.
This is like the second time I've heard the phrase double click in the last week.
I haven't heard it too frequently, but now I've heard it twice in one week.
So good job, Jacob.
You need one more.
Everything comes in threes.
Yeah, for sure.
Well, I was going to ask you what else you're working on.
What else is cool in this space that is burgeoning or in development
or the next attestation that's going to add another layer to our defense in depth.
Yeah, I think this is a space I'm excited about
because I want to see more and more parts
of not just the GitHub product,
but software development products make use of these things.
And I think that's where we're going to be heading as an industry.
So right now we talk about supply chain
and table stakes that are necessary for secure development. I think there's,
there's some things that are becoming common in the industry, whether everybody's doing them or
not, everybody at least acknowledges they're necessary. And so that's things like don't put
secrets in code, keep your dependencies up to date, you know, things like that. I think that
the next step is for us to, as an industry, say the same about
including attestations and a full paper trail of what is going into the software that
we're all using. And this is, you know, attestations are a step beyond SBOMs, right? So SBOMs is, here
are the ingredients that are going into the recipe I made. But attestations gives you that receipt from where those ingredients came from.
So you know which grocery store they came from.
You know which shelf they came from.
You know which manufacturer made that and shipped it to that shelf.
And so it gives you that next level down.
And so I think that's where we're headed as an industry, and where I think we should head as an industry: not only making those things available, but making it just standard as part of the build workflow.
That's just, everybody expects it.
It's just, we have tools that show it.
It's very easy and built into the artifact repositories,
the developer workflows, CICD flows, things like that.
I think we had to build the scaffolding and framework first,
but now that that's coming along,
I think this is where we're headed.
How well received, I suppose, has the SBOM been?
The software bill of materials, is it widely used?
Is it generally adopted?
I remember talking about this and hearing about this,
but I'm not in this world to even write one, build one, care about one.
But how well received have they been?
I think there's a lot of people interested in it.
I think everybody acknowledges that it's part of the solution
we need to adopt as an industry.
But it's also, I think, acknowledged that it's not going to solve everything.
It's just one part of kind of this broader trust flow that I was talking about.
And so it's, you know, it's something we support on GitHub.
It's something that a lot of companies are making standard as part of their build and deployment practices.
But I don't think it by itself is necessarily going to be the solution
to the supply chain challenges we all face.
When you talk about broader adoption of these practices and tools,
what are you and your teams doing in order to get that done?
Obviously, you put the features out there, you make them usable,
and then you blog about them, and then you use GitHub's channels.
But then is there conference talks?
Is there training?
Is there tutorials?
Because really a lot of this has to be known
before it's going to be adopted.
What are you doing there?
The short answer is yes.
Our teams are going out to conferences and talking about this.
We're putting together documentation and training on these things.
I think there is an awareness here that is part of it,
and we're absolutely doing that. At a broader level, you ask what else we were working on and we made it almost 45
minutes without talking about AI. All right, you're allowed. We waited a while. But I think
this is the other part of where I think we can make things easier for adoption across the developer ecosystem. So I'll
give you an example here. We've had this capability for a while called GitHub code scanning, CodeQL,
the one I mentioned earlier from GitHub Advanced Security. And it's great because it's
got a really powerful engine that essentially models a piece of software. And then it will
trace the sources and sinks, the inputs and outputs
of functions, and it'll trace the data through the source code to find out where there's a
potential vulnerability. It's fantastic. In the past, what would happen is this would run on
accounts that had this enabled, and it would show developers like, hey, we think we found a
vulnerability here. Here's why. Here's some documentation you can read about it. But then
it was sort of up to the developer to go figure out what to do about it. And so they had to pivot
out of their workflow. They had to go to a search engine or wherever on the internet and go figure
out like, okay, well, that's great that you found this for me, but how do I fix it? And so with AI,
what we're adding in is the ability, we're calling it code scanning autofix. And so what will happen
is in the pull request where it traditionally shows you what's wrong or what we believe to be a vulnerability, it will now also open a second
part of the pull request with a suggested fix in it. And so we're using Copilot AI to be able to
do this. And we'll say like, hey, we found this thing. We think this is going to fix it for you.
If you agree, just hit accept and move on. And the ability to kind of get that in front of a
developer and make it part of their
flow, I think is really, really important. And then also, you know, we're early days here,
but we're seeing folks use Copilot Chat in the IDE, the interactive chat capability we
have to ask Copilot about these things. Hey, is there anything insecure about my code? Can I make
my code more secure? You know, what would make this more resilient to an attack?
And, you know, that interaction with Copilot, it's going to look at it and say, well, hey, like if you structured your function input this way, it would be safer.
Do you want to do that? Just click yes. And it copies it over and you're off to the races.
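To make the source-to-sink idea concrete, here's an illustrative Python example, hand-written rather than actual CodeQL or autofix output, of the kind of flow a scanner flags and the kind of fix that gets suggested:

```python
import sqlite3

def find_user_unsafe(db: sqlite3.Connection, name: str):
    # 'name' is attacker-controlled input (the source) flowing straight
    # into the query string (the sink): a classic SQL injection pattern.
    return db.execute(f"SELECT * FROM users WHERE name = '{name}'").fetchall()

def find_user_safe(db: sqlite3.Connection, name: str):
    # The suggested fix: a parameterized query breaks the tainted flow,
    # so user input is treated as data, never as SQL.
    return db.execute("SELECT * FROM users WHERE name = ?", (name,)).fetchall()
```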
And so I really think this is where, you know, we've kind of known as an industry that things like static analysis are
important, and we've all worked with our teams to enable it, and everybody's on a different level
of maturity on that journey. But generally, helping developers keep up with the pace by which
vulnerabilities are found and CVEs sent out has been, I won't say a losing battle, but it's been
challenging. It doesn't feel like we've been making ground. And I think AI and things like
autofix are going to allow our developers and our security teams to make up ground in a way we
haven't been able to do before. Well, now that we've broken the seal, I thought this was going
to be your answer to Dependabot getting better. I thought you were going to be like, well, we're
throwing AI at it and it's getting better at detecting
hot code paths. But I wasn't going to bring it up
in the moment, so I was waiting for you to bring it up
before I loop back around to that.
I think Autofix sounds really cool.
Didn't we see that demoed, Adam, recently, I think?
And it worked great in the demo.
I'm not sure about real life.
Exact same.
Yeah, it works exactly like that.
Demos always work the same in real life, right?
Well, I wonder, is this the real world yet?
Is this the promised future world?
Are people using autofix today?
We're getting great feedback.
I mean, it's showing right now that the suggestions
are remediating more than two-thirds of vulnerabilities
with little to no editing.
And we have bigger plans for this, too,
to be able to really, I mean, our goal is to make it easy for developers to build secure software.
And so how do we think about this at scale?
What are ways we can reduce that friction for developers finding and fixing vulnerabilities in code?
And the interesting thing with something like Autofix and CodeQL is even if it's clean today and everything's fixed today, it may not be tomorrow because security researchers are finding new things every day.
CVEs are getting released every day.
And so how do we make this an ongoing practice that is low friction and low pain for developers?
And I think that's just really part of how we're trying to design and think about integration of AI capabilities across the board is how to accelerate developers and get them focused on the things they want to be focused on. And frankly, pretty much every organization and company wants their
developers to be focused on. They don't want them clicking through a bunch of menus and searching
how to fix something that they fixed yesterday, but just can't remember what it is. They want
them moving on to the value add work. And so do we. What about proactive versus reactive? Because
CVEs are very reactive when it comes to security.
It's like it's a known thing.
And it's obviously an awareness thing once it's known, because it's still burgeoning and more people are still being made aware of it.
But how about proactive things?
I know you mentioned scanning.
Obviously, attestation is part of the proactiveness, to some degree. What other plans or ideas do you personally have
or does your organization have
around being proactive about typosquatting, things like that? Where it's like, you know, I didn't mean to type in React spelled incorrectly, or with a plural, you know, that kind of thing.
Like how, what are the proactive ways
you're securing things?
I'll kind of work my way into it a bit.
So at a high level, like with our SAST capabilities, CodeQL. I actually am a huge fan of CodeQL just as a security practitioner,
whether I worked at GitHub or not,
because the way it works underneath the hood
is about variant analysis and modeling,
not about trying to pattern match on a specific CVE.
Now, obviously our security team and researchers are informed by the types of bugs and vulnerabilities
that are being found by the research community. And we have a great research team inside of GitHub as well.
But the way it works is it's modeling and looking for patterns and known insecure patterns
versus a specific like, oh, we know that function in this thing is broken or vulnerable.
And so I think that's kind of step one.
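CodeQL queries are written in its own QL language, but the underlying idea, flagging a known-insecure shape rather than one specific vulnerable function, can be sketched in a few lines of Python with the standard library's ast module. The shell=True check below is a toy stand-in, not an actual CodeQL query.

```python
import ast

# Toy "variant analysis": flag every call that passes shell=True,
# an insecure *pattern*, rather than matching one specific CVE or
# one known-broken function.
def find_shell_true(source: str) -> list[int]:
    findings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            for kw in node.keywords:
                if (kw.arg == "shell"
                        and isinstance(kw.value, ast.Constant)
                        and kw.value.value is True):
                    findings.append(node.lineno)
    return findings

# The sample code is only parsed, never executed.
code = "import subprocess\nsubprocess.run(cmd, shell=True)"
print(find_shell_true(code))  # -> [2]
```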
I think the other part of this is I think this is where we're going to see significant advances,
and we're already seeing advances now in editor copilots. So we build and deliver GitHub Copilot, but being able to have that AI assistant in your code
looking for things that are typos,
that are, hey, we saw you typed it this way,
but did you mean this?
Like, we actually think it would be much faster
if you did it this way.
And like, I think we're going to see a lot of advances
in the proactive space as those filters
and as the models and as things like fine tuning
get better and better in what we're shipping. I think that's a good place for AI, obviously the pattern matching and being that
buddy next to you. So you don't always have to be on pins and needles of, am I making a secure
choice or this dependency? Is it, you know, is it really secure? Has there been a maintainer swap
recently? Was it the right name? Did I typo squat or typo butcher this and I
actually installed the wrong thing? We've seen that even with ChatGPT, where people will ask for things and it'll give back something fake. You know this better than I do, Jared, because you run news, but ChatGPT will give back fake information. It'll hallucinate a package name.
Yeah. And then someone will go register that package.
Precisely. Thank you, Jared. It hallucinates a package name, and it's not real, but now it's sort of in the zeitgeist of hallucinations, and then people think it's a real thing. And now a squatter will sit on that and do something nefarious. Which I think is a great place for AI to pattern match, because as a human, I am generally going to be lazy, or potentially just not as good every single day, all day, every day. I'm distracted. I'm going to mess up. You know, flawed.
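A rough sketch of that kind of name check, comparing a requested package against a known dependency set with only the standard library. The package list and similarity threshold here are invented for illustration; real detection is far more involved.

```python
from difflib import SequenceMatcher

# Hypothetical allowlist of packages a project actually depends on.
KNOWN_PACKAGES = {"react", "requests", "lodash", "express"}

def looks_like_typosquat(name: str, threshold: float = 0.85) -> str | None:
    """Return the known package `name` suspiciously resembles, if any."""
    if name in KNOWN_PACKAGES:
        return None  # exact match is fine
    for known in KNOWN_PACKAGES:
        if SequenceMatcher(None, name, known).ratio() >= threshold:
            return known
    return None

print(looks_like_typosquat("reacts"))    # -> "react" (the plural typo)
print(looks_like_typosquat("requests"))  # -> None
```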
Yeah, I agree. I think there's a productivity gain here too that shouldn't be overlooked, which is that there are a lot of things technical people sometimes just don't go do because they're like, well, that's going to take me forever. And I'm a good example of this. So
I grew up as a developer. I've been a developer most of my career, but I don't develop
every day anymore. And so the idea of like getting a dev environment spun up and going and doing
something productive is usually like not worth the effort and not really what I'm paid to do.
But, you know, a few months ago we were working with a Jupyter notebook that helped
analyze statistics and
heuristics for the workforce.
And it was something I wanted to use as a people manager running an organization to
like get insight into my workforce.
And it wasn't doing what I needed to do.
And I was like, ah, I haven't done a Jupyter notebook in like eight years.
This is going to take me forever.
And then I was like, I wonder if Copilot can help.
And so I literally pulled it up and started asking Copilot some questions and started doing some autocomplete stuff.
And I had it sorted out in like 20 minutes, had it done what I needed to do, got my answer, was able to get back to what I really needed to be doing.
Versus that probably would have taken me six, seven hours without an AI assistant being able to do those things.
And I just think of that times 10, times 1,000, times 100,000 for corporations and organizations. And I think that's just going to get more powerful as we go.
It's time to monitor your crons.
Simple monitoring for every application.
That is what my friends over at Cronitor do for you.
Performance insights and uptime monitoring for cron jobs, websites, APIs, status pages,
heartbeats, analytics checks, and so much more.
And you can start for free today.
Cronitor.io.
Check them out.
Join 50,000 developers worldwide from Square, Cisco, Johnson & Johnson, Monday.com, Reddit, Monzo, and so many
more. And guess what? I monitor my cron jobs with Cronitor and you should too. And here's how easy
it is to install and use Cronitor to start monitoring your crons. They have a Linux package,
a macOS package, a Windows package that you can install. And the first thing you do is you run Cronitor Discover when you have this installed.
It discovers all of your crons.
And from there, your crons will be monitored inside of Cronitor's dashboard.
You have a jobs tab.
You can easily see execution time, all the events, the latest activity, the health status, the success range,
all the details, when it should run.
Everything is captured in this dashboard.
And it's so easy to use.
Okay, check them out at cronitor.io.
Once again, cronitor.io.
I think AI red teams make a ton of sense.
I mean, there's probably startups doing this.
I don't know if you all are thinking about it or doing it,
but I've done some penetration testing,
especially after I got out of college.
And it's very common for an enterprise
to hire a security team, an outside consultant,
to come in and pen test their system. And they're
very expensive and they're very good at what they do oftentimes. But a lot of that work is just
grueling and fuzzing and like running this against that and doing this. And like they have their set
of things they do, you know? And then of course there are the expert hackers where the AI is never going to be as good as this guy or this gal, because they know whatever, whatever.
That's real.
It's real rare, but it's real.
But for the 80% of orgs that can't afford red teaming or auditing at all,
but could probably just send a bunch of computers
to do non-deterministic fuzzing against their systems,
that seems like it'd be a win in the security world.
Is that going on?
Is that something GitHub's thinking about?
I mean, it's definitely something that's going on.
I'm closely following several startups
that are putting some time and energy into this space.
I think it's going to be another powerful tool
in the tool bag very soon.
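For a sense of what that commodity work looks like, here is a minimal non-deterministic fuzzing loop of the kind Jared describes: random bytes thrown at a target until something breaks. `parse_record` is a hypothetical stand-in for whatever code you want to exercise.

```python
import random
import traceback

def parse_record(data: bytes) -> dict:
    # Hypothetical target: any parser you want to harden.
    key, _, value = data.partition(b"=")
    return {key.decode("utf-8"): value.decode("utf-8")}

def fuzz(target, iterations: int = 100_000) -> None:
    rng = random.Random(1234)  # fixed seed so crashes are reproducible
    for _ in range(iterations):
        blob = bytes(rng.randrange(256) for _ in range(rng.randrange(64)))
        try:
            target(blob)
        except Exception:
            print(f"crashing input: {blob!r}")
            traceback.print_exc()
            return
    print("no crashes found")

fuzz(parse_record)  # random bytes quickly trigger a UnicodeDecodeError
```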
I think generally in the security space,
I think there's a lot of things that fall into that category
where the first stage of something that we are paying for a very advanced,
highly educated user to go do is often repetitive, is often like, I'm going to go query this database
to figure this out. And then I'm going to go to do this Splunk query. And then I'm going to take
that and export everything into an Excel spreadsheet so I can do a pivot table with
these other IP logs that came from over here, or whatever the case may be. And I think that's an area, too, where we're going to see AI help response and kind of SOC work, you know, incident response, like the early triage stage, gain a lot of speed. Because, you know,
I think what we really want is we don't want less people on our teams. We want those people doing
the things that they're trained to do and the things that really, truly add value on top of that, of being able to use
their intuition and experience to find the signals that don't look right, to know how to go triage
and figure out how to deal with a potential security incident or rule them out. But there's
often so much time spent before the experience can kick in.
I think this is another area we're going to see some pretty amazing work done. I know there's a
lot of companies doing that in the space right now. How do you see GitHub's position in that
world? Where do you decide what GitHub should invest in? I'm sure you're not the only one
deciding, but what's the decision-making process of what is worthwhile for GitHub to be doing
versus, well, that's something that some startups can do,
but we're not going to do that?
It's definitely not my decision.
I run the day-to-day of our internal security team.
But I can share, our focus is developers.
Our focus is accelerating human progress
through software development and enabling open source.
And so we tend to focus on the things
that we can bring a lot of value to in that space.
And that's why we're so excited
about some of the AI capabilities
because I think we all see the news articles every day.
There's a million new models and apps coming out every day.
I would say it's probably, if not accepted,
at least talked about a lot,
that a lot of those are cool technologies looking for a fit, or they're solutions looking for a problem in some cases. I think with software development, it's just clear how powerful this is.
And that's why we're so excited about incorporating that into the editor,
into things like, I mean, how often do developers
work on code all day long? They finally get to the time they're ready to open the pull request, and
they're like, ah, do I really want to spend an hour writing a really great set of documentation
on my pull request? Well, what if we had AI be able to scan all the changes that they made and
write 80% of it for them before they did that? And so I think it's been clear to us for a while.
I mean, we released Copilot Tech Preview in late 2021,
well before the current kind of wave of things was out.
It's been clear to us for a while, this is a huge win.
And I think it will come in other parts of the industry,
by the way.
I don't think, I'm not in any way saying
it's not going to be helpful in other parts of our lives,
but I think software development space,
given the structure,
given the modeling that's existing
and given the tooling and the work that's already going
into the industry the last 20 years,
we're just seeing huge wins and huge gains already.
So you're in charge of actually
GitHub.com operational security
as well, or yes?
Yes. You got any cool stories you can tell
us? Like any long nights?
Any rough weekends? You know, DDoS?
I mean, everybody in the security world's had long nights and rough weekends.
Tell us some horror stories.
Come on, man.
You've survived them.
You survived it.
Surviving, maybe.
Yeah, so internally, real quick.
So our security team is great.
We basically, like the short version is, we protect the company.
So we protect, you know, we call ourselves hubbers. We protect our hubbers, the laptops and the data and the access of our internal systems. We protect the
product. So, you know, operationally and github.com and our products, but also working closely with
our engineering partners to make sure that what we build and ship is secure and safe. And then
we also help secure the community. So we have a research team
that's out looking for vulnerabilities in open source software. They're helping educate the
open source community and researchers on how to use things like CodeQL and secret scanning and
how to incorporate AI into their secure development practices. So that's kind of the three-pillar
remit that we have. You know, in terms of war stories, I think, interestingly, what's been
on my mind a little bit more lately
has been, I mentioned earlier
that we offer
a lot of amazing
features for free to
public repos, so Codespaces and
Actions Minutes and free repos
and most of the
capabilities that enterprises have.
I know where you're going with this one.
It turns out that threat actors also have figured that out. And when they hear things like free compute,
they go right after it for, you know, pick your abuse factor.
We see campaigns that will try and escalate the number of stars
that a particular repo has to increase its popularity, right?
We see people trying to mine crypto using free compute.
I mean, we see a lot.
Hosting files on github.com that we'll just say
don't have anything to do with software development.
And so it's a challenge.
And it's something that we have a fantastic team working on.
We're employing, we have been employing machine learning
and AI for a while in this space.
We'll continue to do that.
But it's a really complex, challenging problem.
And the balance we have to strike here is because we serve so much of the world software development ecosystem,
we can't turn the dials too strict because then we start locking out hundreds, thousands of users
that have legitimate use cases or security research.
And certainly those things happen. You can't fine tune every filter to be perfect, but we really
try and strike that balance of how we do that. And so this is where we're also working with our
product counterparts to understand what are things we can make changes to in the actual product or
maybe a signup flow, or things like that that will decrease the likelihood of abuse
or impose a bit higher cost on actors who want to do that.
And so that's certainly something, you know,
there's new campaigns every day
that come out in the space that our teams are firefighting.
And they're doing a great job at it.
And it's something that is just top of mind
because it's something we don't see getting less of.
We're definitely seeing an increase.
And AI is a tool that those actors are using as well.
So they're using AI to generate fake issues and pull requests or whatever the content is.
They're using it to create fake profile pictures, all sorts of stuff.
Wow. Never a boring day, I'm sure.
So what happens in your life, in the life of Jacob,
when a DDoS hits or something?
You on pager duty?
Are you above that?
Do you rush into the hospital? Or the office?
Hopefully not the hospital.
No, rush into the office?
Or are you working from home?
Do you get situations where like,
hey, we're getting DDoSed.
What are we going to do?
And then what do you do in those circumstances?
Yeah, I mean, there's always,
I think that's always true in security teams.
We're a fully remote company, so there's no rushing into the office.
It's usually rushing to my office if needed.
But we have a fantastic team.
I kind of jokingly have told the team,
if you need me to log into production and do anything during an incident,
we've probably already gotten to a state where things are pretty bad.
Yeah, bigger problems.
That's not what I should be doing.
So no, my goal is to support the teams, understand what they need,
and figure out do we need to page more people in or the right people in?
How can I support the great leaders that we already have?
And then there's a comms element to this as well.
So depending on what the issue is,
we had to rotate one of our public keys last year.
And we believe very deeply in transparency
in what happens on the platform. We're trusted in the community and that trust is
only maintained through transparency, I think. And so, you know, a lot of it is, you know,
how do we want to get these comms out as quickly as possible? How can we be as transparent as
possible, you know, sticking to the facts and sharing as much as we can, particularly actionable information.
So that's a lot of what we do.
Thankfully, our engineering operations team is world class and handles a lot of the DDoS attacks.
And they're very, very good at it.
So on that particular one, we have a great engineering team on that.
No hospital runs for you then, I guess.
Yeah.
Let's hope not.
Keep you out of the hospital.
Oh, that's good.
I don't want to go to the hospital,
especially for DDoS.
That's a different kind of DDoS,
you know what I'm saying?
Indeed.
You show up and you tell the doctor,
I got a DDoS.
He's not going to know what to do, you know?
Well, get out of here.
Don't, go deal with that.
When you look at the open source supply chain,
I really don't even like to call it
the open source supply chain,
but it's the industry accepted term.
So bear with me.
When you look at it and you realize that open source is obviously one and you realize how
important a role it plays in just obviously software at large but innovation new startups
new side projects join an individual developer's life the freedoms that a person can have to create software
and just share it simply when you look at that entire ecosystem as a security expert what do you
wish would be there that's not there today to secure it what is it like if you had a magic wand
and you just somehow wave it and a couple new things appear, what would those things be and what role could you personally play in making them possible?
That's a great question.
I think it goes back to what we were talking about earlier, to be honest.
I think that today there is a lot of variation and freedom, which is a good thing, to be clear.
I'm not suggesting we take that away.
But there's not necessarily clear paved paths for open source developers, hobbyists, and even more corporation-backed open source efforts to know what the best practices are for building, securing, deploying, attesting, signing.
It's complicated, right? And, you know, I've been
in this space for a really long time. And so, you know, when I rattle some of these things off,
it may feel like, oh, yeah, like, okay, cool. That's easy, quote unquote. But it's not. It's,
you know, we didn't have frameworks like SLSA, you know, 10, 15 years ago. Like, the frameworks
and the thinking are there, but I don't know that we as an ecosystem, and this is beyond GitHub, GitHub's part of that ecosystem, make it really easy for people to do the right thing.
So build the right way, secure it, update it, patch it, deploy it, sign it.
Like that end-to-end flow is still complicated.
People use, maybe they store their source code on GitHub, but they build it
somewhere else. And then when they build it somewhere else, they're not scanning it or that
place isn't secure. And then when they upload it, we don't have any way to see where it came from.
And then when we download it on the other side, we don't really have a way to automatically get
a sense of the risk because it's difficult to tie all those things together. And so I think
if I could wave a magic wand, it would be essentially to have us partners in industry,
you know, I think GitHub's part of this,
to make those things easier for developers
to just do the right thing out of the box.
And then, of course, have the freedoms
if they need to do something more complex or different,
that's totally fine.
But I think a lot of use cases just want to know,
okay, how do I build this thing and deploy it to this cloud provider? How do I build this thing and make it show up in PyPI
and have it trusted with a little badge on it?
And I think to do that well takes a significantly higher amount of work
and expertise than would be optimal if we really want to scale this.
It sounds like that world that you just painted is a world where GitHub accepts more
and more responsibility as a security center point.
You've already accepted the responsibility of hosting the open source code, right?
You've already accepted the responsibility of supporting on all the ways open source
at large.
So now the final layer might be, more and more over time if not just now, responsibility on the security front. Typosquatting, like you'd mentioned. On Maven, if I can download something from there, or PyPI or wherever, I can have that attestation back. You're already doing some of the proving grounds for this, but it sounds like GitHub is accepting more and more responsibility from a security standpoint.
I mean, I'll say that we as a company take that responsibility already
very seriously, and we talk about it a lot internally.
And I think at a broader level, I think each industry player in this space,
we all have to take more responsibility for this. I don't think it's
just GitHub. I think it's all of the corporations that are not just investing in open source,
because there's many that do that. They pay developers to work on open source projects.
I think that's great. But I think it's also the organizations and companies that use open source
prolifically need to take more responsibility in this space too. And I think it's all of us
together. I think GitHub's already taking these strides and will continue to do that.
And that's why we've released things
like advanced security for public repos.
That doesn't necessarily,
that's not a free thing for us to do,
but it's an important thing for us to do, right?
And so I think as far as I'm aware
that that vision and direction
is not going to change for us.
We're going to continue to invest in those things.
Well, let's imagine then, because this is probably pretty close to true.
There's a lot of people listening to this podcast with us, and they're an hourish in, and they're like, man, this is awesome.
But they hear you say that, and they say, wow, I would love to find a way into – I'm at one of those organizations that could partner with GitHub to bolster the security model of open source, the open source
supply chain. In what ways can they reach into
GitHub, talk to you, talk to others to create that bridge, to create
that partnership? What are some of those paths and methods? Yeah, that's a good question.
I think from a practical perspective, without having to reach out, I think
there's some simple steps that a lot of organizations can take, which is go play with the new attestation capability and use it.
Start signing artifacts and making it part of your build workflows and then talk about it.
Tell people how you're doing it.
Give us feedback on what would make it be better.
Because I think those key scaffolding building blocks are so important to the industry right now.
Turn on things like secret scanning and push protection. Show by example, lead by example on how to do these practices internally.
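As a toy version of the push-protection idea, a check like the following could run in a pre-push hook. The patterns are illustrative only; GitHub's actual secret scanning uses a much larger set of provider-specific, validated detectors.

```python
import re

# Illustrative credential shapes, not GitHub's real detector list.
SECRET_PATTERNS = {
    "github_token": re.compile(r"ghp_[A-Za-z0-9]{36}"),
    "aws_access_key": re.compile(r"AKIA[0-9A-Z]{16}"),
    "private_key": re.compile(r"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----"),
}

def scan_for_secrets(text: str) -> list[str]:
    """Name every pattern found, so a hook can block the push and
    show the developer exactly what tripped."""
    return [name for name, pattern in SECRET_PATTERNS.items()
            if pattern.search(text)]

print(scan_for_secrets("token = 'ghp_" + "a" * 36 + "'"))  # ['github_token']
```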
And I think in terms of the partnership angle, we have a fantastic OSPO, open source program office at GitHub that does some of these partnerships.
The security research team that I mentioned earlier is always out talking to the security community about how
to do these things and level this up and make it better. And then there's other kind of external
entities. So there's the Alpha-Omega project as part of the OpenSSF, the Open Source Security Foundation, if I got those acronyms right, that's looking at ways that some of the
bigger corporations like Microsoft and Amazon and others have invested money into on how to level up the entire open source ecosystem security space.
And what are the programs and possibilities that they can do to help do that?
And so I think there's opportunities there for corporations to invest financially if they so choose to be able to do that.
And then, you know, at a very practical level, like go sponsor your favorite open source project.
Like I use Homebrew like crazy.
Homebrew is awesome.
Go sponsor it.
That kind of stuff.
Dig it.
How about the maintainers themselves?
Like give me some nuggets
for specifically open source software maintainers
who are either burdened, tired, excited.
Pick any adjective you want to describe a maintainer.
What can they do to personally bolster their GitHub profile?
What things should they do?
What are specific things they could do on their repositories, etc.?
Even their organization, if they have an org for their repo.
What are some things for them?
Yeah, I think open source maintainers are amazing, first of all.
I'm so thrilled that that's part of the community we're able to support every day.
I do think that the adjectives you mentioned probably at some points describe every maintainer,
maybe all at once, maybe four times differently a day, maybe over their journey.
Because I think it can be overwhelming, right?
Some maintainers don't have a robust set of contributors that are helping.
And it's, you know,
a one or two person effort. Our hope is to be able to give them the security tools built into GitHub
that get their level of security up to something that is significant. And so, you know, this is
where things like just, you know, go ahead and turn on code scanning if you haven't done that yet and experiment with it
and see if it can help secure the product.
And, you know, things like attestation,
even if you don't, as a maintainer,
use or care about attestation,
make it available to your developers and to your users, because it's part of the repo.
Turn it on and include it in there
once we've kind of gone full GA with that.
I think there's things like that,
that developers can do and maintainers can do. And then I think there are other things that we
are continuing to try and make more accessible and easier to maintainers as well. So things like
we've got some scanning tools we released open source to help make GitHub Actions workflows more secure and detect insecure or overprivileged requests in GitHub Actions.
So there's things like that as well to just kind of be aware of. And, you know, always reach out to the OSPO and other places in GitHub and the community for help on those things if folks need some additional guidance.
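As a sketch of the kind of check such tooling performs, not GitHub's actual open source tool, the snippet below flags workflows that never declare an explicit permissions block or that ask for write-all. It assumes PyYAML is installed.

```python
import yaml  # pip install pyyaml

def audit_workflow(path: str) -> list[str]:
    """Flag a GitHub Actions workflow whose GITHUB_TOKEN permissions
    are broader than anything it explicitly declares."""
    with open(path) as f:
        workflow = yaml.safe_load(f)
    problems = []
    perms = workflow.get("permissions")
    if perms is None:
        problems.append("no top-level permissions block: repo defaults apply")
    elif perms == "write-all":
        problems.append("permissions: write-all is almost never necessary")
    for name, job in workflow.get("jobs", {}).items():
        if perms is None and "permissions" not in job:
            problems.append(f"job '{name}' inherits the unrestricted default")
    return problems

for problem in audit_workflow(".github/workflows/ci.yml"):
    print(problem)
```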
I was thinking about something as part of this conversation, and I'm going to share
an idea with you.
Maybe it's a, in quotes, feature request.
Maybe it's not.
Maybe it's already there, and I don't know.
But what if there was this idea of consensus for when you add a new maintainer to a repository, a toggle that says, okay. Every other repo out there, the default is: I have powers. I can give power. I don't need consensus.
But if there's one or more maintainers,
you have to sort of have somebody give somebody access to become a maintainer.
But then the other people who are part of the organization have to do it as well.
And maybe some sort of like personal attestation,
which is like I, Adam Stachowiak,
agree that Jared Santo can give XYZ access to maintainership and control of this repository.
Something that's like, because you can do that personally, right?
But is there a way to bake that in with software?
Just a simple thing like that.
Does that add more configuration?
Does that add more burden to the process?
I kind of feel like consensus is a natural thing to ask for.
And why not bake it into the blessing of one more maintainer to the project.
Yeah. I mean, whether we should or not, I'm not going to touch that one because there's been many
books, research papers, and blog posts written on that. The Cathedral and the Bazaar is still one of
my favorites on the topic of open source maintainership and kind of thinking about
these communities and systems. In terms of the technical side of it, that's actually what we do internally at GitHub for
entitlements.
So our internal access system is done essentially the way you described it.
So if somebody wants access to one of our tools or a third party capability that we
have inside, or they are a developer and they're new and they're like, oh, I need production
access to do this thing, or I need that kind of access for this engineering system.
They actually open a pull request.
And depending on the sensitivity of that entitlement, different people get tagged in to be able to approve that.
And if it's a very sensitive one, it's going to go all the way up to a VP.
And if it's extremely sensitive, we have to renew it every six months.
And so the ability to do that in some ways is just baked into pull
requests and the Git workflow already, which I think is really fantastic. So I think, you know,
from an open source perspective, I think developers could and maintainers could absolutely do something
like that, have like a maintainer file, and use a community driven pull request approach to be able
to do it. Whether we need to build something in addition on top of that, I think that's a great question.
And I would love folks smarter than me
about open source maintainership
and the socio kind of dynamics at play there
to weigh in more than I would
from a security perspective.
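A minimal sketch of the consensus rule being discussed, assuming a hypothetical MAINTAINERS file and a CI step that can see which existing maintainers approved the pull request. All names and the majority threshold are invented for illustration.

```python
def maintainer_change_approved(
    current_maintainers: set[str],
    approvers: set[str],
) -> bool:
    """Require a majority of *existing* maintainers to approve any
    pull request that adds or removes a maintainer."""
    votes = len(current_maintainers & approvers)
    needed = len(current_maintainers) // 2 + 1
    return votes >= needed

# A CI job could parse MAINTAINERS on the base branch, collect the
# PR's approved reviews, and fail the check if consensus isn't met.
current = {"adam", "jared", "jacob"}
print(maintainer_change_approved(current, {"adam", "jared"}))  # True (2 of 3)
print(maintainer_change_approved(current, {"adam"}))           # False
```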
The maintainer file is a good start, I think.
And you already have pull requests,
so there's no software needed to be written, really.
It's just sort of a text file at large in a repository that everybody touches. What do you think, Jared? Is it a good start to something like that? Do you agree with that?
There are people smarter than me to answer the question. No, uh, I have only thought about it for a few minutes. It certainly makes sense in the context of people who agree that they want to do that, you know? Yeah. But I think there's a lot of people that won't want to do that.
And so like, what do we do for them? Make them do it? Probably not.
Yeah.
Give them tooling to do it. Maybe. But yeah, I mean,
effectively you're vouching, you know,
you're putting your name online for somebody else.
And so at least then we have culpability in the case of bad vouch, good
vouch. You know, at least we know how it went down and it wasn't just like usurped authority.
It was actually provided authority. So I could see some positives. It definitely is the way,
like Jacob said, they do it internally at GitHub. And so like it can work inside of
chains of authority, but open source projects and chains of authority are often at odds with each other.
We use pull requests for everything
inside GitHub. That's how we do decision
documents. That's how we
do all sorts of things through
pull requests, which is nice because we have
the ability to kind of see the changes
and trace the approvals
and it's even how
we do security exceptions.
Alright, cool. Plus, you're GitHub, so pull requests are really cheap.
You guys are getting cheap over there.
That's right, employee discount.
Employee discount on pull requests.
That's right.
This one's free.
Use them if you got them.
Everybody probably thinks my developer green-square chart is wild, like that person develops all the time.
This is just the way we work at GitHub,
so most people's GitHub activity chart looks that way.
What's left in terms of securing GitHub or keeping it secure,
whichever you want to phrase it,
you'd probably say keeping it secure versus securing it.
But what else can we talk about that makes sense
before we call this show done?
What's on your mind? What have we not asked you?
That's a good question.
I mean, we've covered a lot of the topics that I think are near and dear to my heart, certainly. I mean, I can probably talk about
the work we do on the security team for another eight hours and not run out of things to talk
about. But, you know, at a high level, we take that responsibility we talked about very seriously, being at the center of the developer ecosystem. It's embedded into everything we do
inside GitHub. It's really great to see the partnership between the security team and
the engineering teams and the product teams. It happens every day, all day. We're, you know,
side by side with our engineering teams, helping to build in security across the board.
And, you know, I think part of what we're also excited about is the integration of
AI into those capabilities and what it's going to do for not just being able to kind of have that
there for the sake of it, but truly being able to make life easier for the developer and remove some
of that security toil and just regular toil from their plate so they can focus on things they want
to focus on, things that their teams and businesses want them to focus on.
And so at a high level,
those are the things that we're really focused on
as a team, as a business,
and I think make a lot of sense.
Appreciate it, Jacob.
It's been a lot of interesting conversation.
I definitely am with you.
I'm bullish on this attestation thing.
I'm not bullish on how hard it is to say the word,
but I do think it is a feature that should be highly leveraged to much success.
It's not easy to spell either, but I agree.
Was there not a synonym? I mean, where's the thesaurus? Can we pick something a little bit
easier? A test attestation.
This is not a test. Oh, it is.
Oh, it was a test. Yeah. You failed it, then you passed it.
Jacob, thank you so much for taking time out of your day
to just spit some security knowledge with us,
take us through the ropes of what you're doing there at GitHub.
We obviously are massive fans of the platform
and all the developers on there doing what they do.
We appreciate you sharing your time.
Thank you.
Thanks so much for having me.
This was a great conversation.
So as you would expect, it takes a lot to secure GitHub. Of course it does, right? It's the
largest developer platform on planet earth. It's probably the largest target for all the things
basically on earth. And so Jacob and the many teams that support Jacob
and their cause have their work cut out for them.
So give them grace and maybe a vote of confidence
and some ideas on securing GitHub.
Attestation seems kind of cool to me.
We talked to Daniel Stenberg about this, about Curl.
That is the upcoming episode on Friday on Changelog & Friends, deep into the world of Curl. And I mentioned attestation and this episode on that show.
So there you go.
Stay tuned to that on Friday.
Of course, a massive thank you to our friends over at Neon.
They power our tiny little, in comparison to the fleets of databases they manage, serverless managed Neon Postgres database.
Yeah, they power our database.
That's kind of cool.
But they also power fleets, RetoolDB, and many, many others.
Check them out at neon.tech slash enterprise or just neon.tech.
And of course, to our friends over at Cronitor, I love Cronitor.
I monitor all my crons with Cronitor, cronitor.io, and you should too.
And of course, to our friends over at socket.dev, Proactively securing open source,
shifting left in a dev tool.
The best.
That's the best.
Check them out, socket.dev.
And of course, to our friends, our partners,
our home, fly.io.
That's the home of changelog.com.
Launch your apps, your databases,
and of course, now your AI
near your users, no ops, fly.io.
And to the beat freak in residence, Breakmaster Cylinder, oh my gosh.
The beats are banging.
I love them.
The beats get me going.
That Take On Me riff.
Oh my gosh.
That's fire.
That's fire.
Okay.
I'm done.
You're done.
This show's done.
We'll see you on Friday. Bye.