The Changelog: Software Development, Open Source - Securing the open source supply chain (Interview)
Episode Date: March 1, 2022
This week we're joined by the "mad scientist" himself, Feross Aboukhadijeh...and we're talking about the launch of Socket, the next big thing in the fight to secure and protect the open source supply chain. While working on the frontlines of open source, Feross and team have witnessed firsthand how supply chain attacks have swept across the software community and have damaged the trust in open source. Socket turns the problem of securing open source software on its head, and asks..."What if we assume all open source may be malicious?" So, they built a system that proactively detects indicators of compromised open source packages and brings awareness to teams in real-time. We cover the whys, the hows, and what's next for this ambitious and very much needed project.
Transcript
What's going on, friends? This is The Changelog. Thanks for tuning in. If this is your first time
here, subscribe at changelog.fm. If this is your millionth time listening, hey, we love you. Thank
you so much for tuning in all these years. If you haven't yet, check out Changelog++. That is
our membership. You can skip the ads and support us directly. Check it out at
changelog.com slash plus plus. Today, we are joined once again by the mad scientist himself,
Feross Aboukhadijeh. You may know Feross from JS Party and other parts of the internet, but
today we're talking with him about his newest project called Socket, the next big thing in
the fight to secure and protect the open source supply chain.
While working on the front lines of open source,
Feross and team have witnessed firsthand how supply chain attacks
have swept across the software community and have damaged the trust in open source.
Socket turns the problem of securing open source software on its head and asks,
what if we assume all open source software may be malicious?
So they built a system that proactively detects indicators of compromised open source packages
and brings that awareness to teams in real time.
Today with Feross, we cover the whys, the hows, and what's next for this very ambitious and very much needed project.
Big thanks to our friends and our partners at Fastly for having our back, our CDN back, that is.
Our pods, our assets,
everything is fast globally.
Fast is in their name,
and that's what they do.
Check them out at Fastly.com.
This episode is brought to you by our friends at Sentry.
Build better software faster. Diagnose, fix, and optimize the performance of your code.
Over 1 million developers and 68,000 organizations already use Sentry.
That number includes us.
Here's the absolute easiest way to try Sentry right now.
You don't have to do anything.
Just go to try.sentry-demo.com.
That is an open sandbox
with data that refreshes every time you refresh, or every 10 minutes, something like that. But
long story short, that's the easiest way to try Sentry right now. No installation,
no setup whatsoever. That dashboard is the exact dashboard we see every time we log into Sentry.
And of course, our listeners get a deal. They get the team plan for free for three months.
All you got to do is go to Sentry.io and use the code changelog when you sign up.
Again, Sentry.io and use the code changelog.
All right, we are joined by JS Party regular, the mad scientist himself, Feross. What's up, Feross?
Thanks for joining us on The Changelog.
Hey, guys. It's great to be here.
It is good to have you here, Feross. I've enjoyed, I guess, knowing you and paying attention to your mad science over the years.
Thanks.
Yeah, I know.
I love coming on The Changelog and JS Party and sharing whatever I'm working on.
I always launch my things by talking to you guys, it seems like.
It's a good way to do it.
We appreciate that.
Yeah, we like it.
You like it.
I think we met you all the way back in 2016 if this episode is to be believed.
It's called Mad Science WebTorrent and WebRTC.
That means we've known each other for, gosh, six years.
Yeah, it's been fun being on JS Party.
I don't remember when I joined,
but whenever we do those episodes,
it's always a good time,
and there's always stuff to talk about in JavaScript.
Totally.
Wow, that was a long time ago.
2016.
Wow.
Mm-hmm.
Changelog 227, which is also a 1980s, I think, TV show.
227.
Oh, really?
Yeah.
I haven't heard that show.
Yeah, that's a long time ago.
Who's Googling furiously to confirm?
Yes.
May 6th, 19... Well, no, it was a long time ago. It was a cool show.
I watched that. 1985 sitcom, five seasons, it's called 227.
What did that refer to?
Was it referring to Feross's episode of The Changelog?
I think it was basically like, they say it follows the lives
of a group of middle-class people living in an apartment building.
Was that their apartment number? Was it 227?
Yeah. Yeah. So it was like a lot of personalities in there.
It's cool.
Do you believe in numerology?
The belief that numbers have
you know
Significant meanings?
Significant meanings.
Yeah.
I'm not sure how much
I believe in it
but I believe it's a thing.
This episode is off
to a great start.
I said I believed it
until I saw Lost.
Because they had the 248 thing
and then it ended up meaning nothing.
I can't remember.
I don't know.
Lost me.
Good first two seasons, and then just complete,
utter disappointment from then on out.
Well, there you go.
Totally.
It's true.
And now we're just lost in this conversation.
So let's loop back to Feross.
Episode 227.
Go check it out, though.
Let's leave that there at least.
You mentioned that you come on and you
talk to us about your
creations. So we had an episode
of JS Party about BitMidi
back in the day. That was just you and I
on that show. And then we also had you
on JS Party way more recently.
That was in July
of last year. Talking about
Wormhole. Today we're here
to talk about Socket,
which grew out of Wormhole.
So let's start with Wormhole, just a brief explainer
of what that is, what it does, why you built it,
and then you can tell us the story of how
this new thing came about.
Yeah, totally.
So Wormhole is a project to help you send files securely with end-to-end encryption.
So it came out of our desire to build a file-sharing, a file-sending service that didn't need to see your files.
And Firefox actually had a thing called Firefox Send for a while.
I don't know if you remember that.
And they shut it down at some point in 2020.
And so when we wanted to do this, we realized we would just have to build our own version of it.
And at the time, we wanted to build it into a bigger idea around end-to-end encrypting everything, like all the things like documents,
notes, you know, collaborative, you know, things like chat. And we wanted to start with files
because files are the easiest, you know, they're static content. So why not start with
that? You know, and so we built out this thing. And, you know, tried as much as we could to make
it really, really secure. And we started off with a kind of a fork of the Firefox send code base. And we were inspired by like the encryption system that they used.
And we added actually a bunch of improvements to it and additional like web security features.
So we, you know, we did a bunch of things like, you know, there are no third-party scripts
on the site at all. You know, we have a really strong content security policy to make sure that if something gets in there
that we can block those kinds of cross-site scripting attacks and stuff like that.
And we also added stuff around peer-to-peer to make it actually faster to send files.
So we wanted to kind of really make the user experience really good with Wormhole as well.
So we do this thing where you can actually start downloading a file before it's
fully uploaded. So if the sender and the receiver are online at the same time, you don't actually
have to wait for the file to fully upload. It'll actually just, the downloader will start to be
able to stream it directly from your browser using WebRTC. So we did a whole bunch of cool
things like that in Wormhole.
It's a really cool project. It's online at wormhole.app. You can check it out.
And you can check out that episode for a long conversation about some of those cool tricks and hacks that you all did.
It was a JS Party episode, but it had Adam on it,
so it has very much a changelog feel.
That's episode 185 of JS Party.
Nick Nisi also joined us on that one.
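The trick Feross described, where the receiver starts downloading before the sender has finished uploading, boils down to a relay that hands each chunk onward as soon as it lands. Here is a minimal Python sketch of that idea only, assuming an in-process `queue.Queue` stands in for the network; Wormhole's actual WebRTC transport and its encryption are not modeled, and `upload`, `download`, and `transfer` are hypothetical names.

```python
import queue
import threading

END_OF_FILE = None  # hypothetical sentinel marking the end of the upload


def upload(data: bytes, chunk_size: int, relay: queue.Queue) -> None:
    """Sender: push the file into the relay one chunk at a time."""
    for start in range(0, len(data), chunk_size):
        relay.put(data[start:start + chunk_size])  # blocks while the buffer is full
    relay.put(END_OF_FILE)


def download(relay: queue.Queue, received: list) -> None:
    """Receiver: consume chunks as they arrive, before the upload has finished."""
    while True:
        chunk = relay.get()
        if chunk is END_OF_FILE:
            break
        received.append(chunk)


def transfer(data: bytes, chunk_size: int = 4) -> bytes:
    # queue.Queue stands in for the network here; no WebRTC or encryption.
    # A tiny buffer forces sender and receiver to overlap in time instead of
    # the whole file being staged before anyone reads it.
    relay: queue.Queue = queue.Queue(maxsize=2)
    received: list = []
    receiver = threading.Thread(target=download, args=(relay, received))
    receiver.start()
    upload(data, chunk_size, relay)
    receiver.join()
    return b"".join(received)
```

Calling `transfer(b"hello wormhole", chunk_size=3)` reassembles the original bytes, even though the sender can never be more than two chunks ahead of the receiver.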
So you're building Wormhole and doing cool stuff and being very security conscious,
conscientious, whichever word is correct, or both. And you said one of the security things
you wish you could have done better was vetting of your dependencies. And it seems like what
you're up to now, in addition to Wormhole, or kind of this new thing that you're doing
because of Wormhole, or out of Wormhole,
is a kind of a shift or a change or a new direction towards solving not
just file encryption and sending problems, but this is like every developer's problem
on the internet now, which is the whole dependency supply chain security problem that nobody really has good answers for at the moment.
Exactly. Yeah. I think everybody feels the pain around dealing with their dependencies. And that's
definitely something we experienced when we were building Wormhole. You know, there's so many
dependencies in the average JavaScript application. You know, the average dependency has 79 other transitive dependencies.
So you install one package, you get 79 other packages.
That's the average on NPM right now, according to this paper that came out last year that analyzed the registry.
And there's also 39 additional maintainers that you trust when you trust a single package.
So it's a pretty large attack surface.
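To make the "install one package, trust 79 more packages and 39 more maintainers" point concrete, here is a small Python sketch that walks a dependency graph breadth-first and collects every package and maintainer a single install pulls in. The registry is a made-up toy graph, not real npm data; a real tool would read a lockfile or query the registry instead.

```python
from collections import deque

# Toy, made-up registry: package -> (direct dependencies, maintainers).
REGISTRY = {
    "direct-dep": (["http-helper", "log-helper"], ["alice"]),
    "http-helper": (["url-helper", "debug-helper"], ["bob", "carol"]),
    "log-helper": (["debug-helper", "color-helper"], ["dave"]),
    "url-helper": ([], ["erin"]),
    "debug-helper": (["time-helper"], ["frank"]),
    "color-helper": ([], ["grace"]),
    "time-helper": ([], ["frank"]),
}


def trust_footprint(root: str, registry: dict = REGISTRY) -> tuple:
    """Return (transitive packages, maintainers you implicitly trust) for one install."""
    seen = {root}
    maintainers = set(registry[root][1])
    todo = deque([root])
    while todo:
        deps, _ = registry[todo.popleft()]
        for dep in deps:
            if dep not in seen:  # shared dependencies are only counted once
                seen.add(dep)
                maintainers.update(registry[dep][1])
                todo.append(dep)
    seen.discard(root)  # count only what comes along with the root package
    return seen, maintainers
```

In this toy graph, installing `direct-dep` brings in six other packages and seven maintainers (counting the direct dependency's own), even though you only asked for one package.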
And for the most part, it's fine. I
mean, most maintainers are good. And, you know, this system mostly works. I
mean, I'm a maintainer and a lot of my friends are open source maintainers, and, you know,
the NPM ecosystem is a wonderful place, and it's, you know, full of amazing packages.
And it's really awesome, you know, that everyone can just install this stuff
and build apps really fast. It's the reason why, you know,
open source is the reason why
you can build an app with like two people
that would have taken like teams
and teams of people before to do.
And so I'm not trying to,
you know, me of all people,
I wouldn't be complaining about open source.
But when you're trying to build an app
for like maximum security,
you're faced with this kind of really difficult
trade-off, where really the safest thing to do, like if you had to just pick the
most impractical position you could have, it would be: we're going to read every line of code of our
dependencies, and we're not going to use anything that we haven't vetted ourselves. And that's
actually appropriate for some applications. Like if you're building,
you know, Signal, like the secure, encrypted messaging app, you probably
want to know, they probably want to know everything that's in every single one of their
dependencies, right? Big companies, Google, right? They're actually an example of this as well. They
check in all their open source code into their own internal repository, they vendor it,
and they consider it their own code. And they have a security team that actually vets it first.
And they also vet updates. So if you want to update to a new version, the security team is
actually involved in that process. They look at the diff and they kind of decide when or if
Google's going to apply that update. So it's a very manual process. But obviously for most teams,
both of these approaches of fully vetting a package are totally impractical,
both because of time and resources. But beyond that, even just the skill to
understand all these various projects, it's not necessarily the case that every team can even
do that. I mean, part of the reason why you use dependencies is because, you know, you're trying to like stand on the shoulders of giants and you don't want to be
an expert in all the stuff that you're depending on. So to then require every single team out there
to like fully understand every line of code of their dependencies is really asking a lot.
But the thing is that if you don't do that, you know, then there is a small chance that you're
going to install a package
which contains malware, a package that's just been hijacked and that doesn't do what it
says on the box anymore.
And everyone's seeing more and more examples of this in the news.
There was an incident literally last January where a maintainer compromised his own package,
and we did an episode about that on JS Party.
I'm sure Jared's looking up the number right now.
That's the faker.js and the colors.js.
I believe it's 211, but it might be 210.
Still looking, but we'll get it in the show notes.
That's one example, but there was, you know, another example
before that in November, and another one in October of last year. This seems to happen
nearly monthly now. You know, in October and November it was coa, rc, and ua-parser-js that
were compromised. They had a Bitcoin, or not a Bitcoin, a Monero miner added to the package, as
well as some malware that stole passwords on
Windows from over 100 different programs on Windows, and also the Windows Credential Manager.
So if you were unlucky enough to install one of those packages during the period where the bad
version was on NPM, then, you know, you would be compromised. You'd have lost all your
passwords. You'd need to reset everything, reset all your keys and everything like that. And, you know, ua-parser-js is depended upon by
React Native. So these packages had a lot of dependents. Like, you know,
each of those three I mentioned had 30 million downloads a month. So we're talking
seriously popular packages that were compromised. And, you know, the ua-parser-js one is just
an example. You know, that one has 7
million downloads a week and has 3 million GitHub repos that depend on it. Wow. Just like that one
project, 3 million GitHub repos. And, you know, the way that one was compromised was, I believe,
the maintainer's account password was just sort of sold on a Russian hacking forum. So there's this
post...
Wow.
Yeah, it's not 100% certain that this is, you know, the
cause of it, but two weeks before ua-parser-js was compromised, there was a post on a notorious
Russian hacking forum. So October 5th was when the post was made, and then on October
22nd, 2021 was when ua-parser-js was compromised.
And the post on the 5th said that,
so this guy was basically selling an NPM account
with more than 7 million weekly downloads,
which is a suspicious number
because that's exactly the number that UAParser.js has.
And it said there's no 2FA on the account
and I have the password.
So that's enough for you to log in
and change the email address.
And they were selling this for $20,000 to the highest,
to the first bidder.
$20K.
$20K.
$20K for access to everyone's builds.
Exactly.
Well, I mean, if the commons keeps growing,
which we want it to, right?
Like you had just said, you can produce some amazing software, if not a full-fledged company that could be worth billions with two people or less than 10.
I think, who is it that, I have to look up my notes, but somebody is like super bullish on like with less than 10 people, you can build a billion-dollar company.
And that's totally possible.
I think it's challenging because that's a lot of revenue for 10 people to manage
all accounts and whatnot.
But I digress.
The tech, sure.
The company may be harder
to actually run with 10 people.
I think WhatsApp famously did
what they did with like
less than 50 engineers.
And that was,
how long ago was that?
WhatsApp's acquisition,
because that was a billion dollar
plus acquisition,
had to be seven, eight, nine years ago.
And that was 50 people.
50 engineers.
So just, I think we could trend down towards 10.
Just adding some evidence to that statement, you know?
And we're not going back.
Yeah, we're not going back, are we?
Well, okay, sustaining a billion dollar company
could be more challenging.
Sure.
The point being, though,
is that you want the commons to grow, right?
You want the commons to be there because it's obviously super valuable.
And we're all humans.
And so you can only harden security to a certain point before just somebody's 2FA is not enabled.
Their password is less secure.
They get socially engineered somehow.
I mean, it's the last four of their phone number.
You get their phone number.
I mean, just like there's just unique ways you can get into different things with people. So how do you really,
you can't, since you can't solve for the human problem, you have to solve for it a different way.
How do you do that?
Yeah. I mean, I think it's worth looking at kind of why is this
becoming a problem now, and why has it not been a problem up until now, to kind
of understand, like, you know, maybe that
gives us ideas of how we need to solve this. And I think what's
changed is kind of the way we write software. So, you know, with the emergence of NPM and
newer ecosystems, even Rust I would include in this, the way that we write software's changed. So the
number of dependencies
that we have in an average application
is just off the charts, and it's part of the reason
why people always make fun of JavaScript
and say that the JavaScript programmers
forgot how to program. They need to install
a five-line
package. Their node_modules
is heavier than the universe, and all those kinds of
jokes.
Exactly.
That whole thing is like there's some truth to that.
There's some truth to the, you know, it doesn't seem ideal that we have so many trivial packages
and that, you know, the standard library is the way that it is in JavaScript.
But on the other hand, you know, in the aftermath of left-pad, there were all these people posting,
you know, oh, I can implement left-pad in one line.
Here it is.
And almost every single one of those implementations had bugs.
Like the actual left-pad package actually did it correctly.
And so even for something as trivial as that, getting bug fixes and having the code be centralized in a package and having it improve over time actually has a lot of benefits.
So I don't think we're going back even in that regard.
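Which bugs those one-line left-pad rewrites had isn't spelled out in the conversation, but a short Python sketch shows the kind of edge cases even a trivial padding function has to decide on: non-string input, a pad character of `0` that shouldn't be treated as "missing", and a target length shorter than the input. The function name and every choice below are illustrative assumptions, not the npm package's code.

```python
def left_pad(value, length: int, pad=" ") -> str:
    """Left-pad `value` to `length` characters with a single pad character.

    Illustrative choices only; not the left-pad package's source.
    """
    text = str(value)  # accept numbers as well as strings
    # Only None or "" fall back to a space; a pad of 0 is a real character.
    pad = " " if pad is None or pad == "" else str(pad)
    if len(pad) != 1:
        raise ValueError("pad must be a single character")
    if len(text) >= length:  # already long enough: return unchanged, never truncate
        return text
    return pad * (length - len(text)) + text
```

For example, `left_pad(5, 3, 0)` gives `"005"` and `left_pad("abc", 2)` returns `"abc"` untouched. In Python itself, `str(value).rjust(length, pad)` covers most of this, which is the "let a shared, tested implementation handle it" argument in miniature.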
But I think the other thing that's changed is,
so because of that, no one is reading the code because there's just too many dependencies
and no one is actually looking at what these packages do.
And they're relying on tools like these tools that look for vulnerabilities
and sort of just calling it a day and saying,
we installed Dependabot or
something like that. And so we're safe, our open source is safe. But the thing is that
a vulnerability scanner can only tell you so much about a package. It can only tell you if there's
a known vulnerability. And a known vulnerability is something that a security researcher has found
in a package. They found this problem, they've reported it to the maintainer, and they've published a CVE. And this tool can now tell you, okay, this
package is this particular version is vulnerable, and you should update to such and such version to
get a fix. But, you know, that that that doesn't actually stop the type of attack we're talking
about here, where a package is taken over by a maintainer and malware is inserted. That isn't going to be in a CVE database. That isn't going to be a vulnerability
that is a known vulnerability. That is, by definition, an unexpected occurrence, right?
So what you really need, in my opinion, is to actually look at the contents of the package
to figure out what's inside the package. What is it doing? What capabilities is it using?
Does it talk to the network?
Does it read files on your file system?
Does it run an install script?
Was a new maintainer added recently to this package?
That's the kind of thing that if you had that intelligence,
then you could have caught all the supply chain attacks
of the last year.
You can catch all of those pretty early.
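The questions Feross lists (does it talk to the network, read files, run an install script, gain a new maintainer) can be approximated with simple static checks. This Python sketch is only an illustration under stated assumptions: the regexes for Node built-in modules are crude, hypothetical heuristics that miss ES `import` syntax and obfuscated code, the 30-day window is arbitrary, and maintainer dates are passed in rather than fetched. It is not a description of Socket's actual analysis.

```python
import json
import re
from datetime import date, timedelta

# npm runs these lifecycle scripts automatically when a package is installed.
INSTALL_HOOKS = ("preinstall", "install", "postinstall")

# Hypothetical heuristics: crude patterns for CommonJS requires of Node built-ins.
CAPABILITIES = {
    "network": re.compile(r"""require\(\s*['"](?:node:)?(?:https?|net|dgram|dns)['"]\s*\)"""),
    "filesystem": re.compile(r"""require\(\s*['"](?:node:)?fs(?:/promises)?['"]\s*\)"""),
    "shell": re.compile(r"""require\(\s*['"](?:node:)?child_process['"]\s*\)"""),
}


def package_signals(manifest_json: str, sources: list, maintainer_added: list,
                    today: date, window_days: int = 30) -> set:
    """Return a set of risk signals for one package version."""
    signals = set()
    scripts = json.loads(manifest_json).get("scripts", {})
    if any(hook in scripts for hook in INSTALL_HOOKS):
        signals.add("install-script")
    for source in sources:
        for name, pattern in CAPABILITIES.items():
            if pattern.search(source):
                signals.add(name)
    # A recently added maintainer is a reason to look closer, not proof of compromise.
    if any(today - added <= timedelta(days=window_days) for added in maintainer_added):
        signals.add("new-maintainer")
    return signals
```

A package with a `postinstall` script, a `require("https")`, and a maintainer added nine days ago would come back with four signals, which is exactly the "good candidate to vet further" profile, while a one-line math helper comes back empty.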
Because if you had two new maintainers in the last month
and it
installs... you know, it's a good candidate to check, to vet further.
Yeah, rather than the ones who don't.
Of course.
Just for those who may be uninitiated, what's a CVE, Feross?
So a CVE is a, um,
what does this even stand for?
Common Vulnerabilities and Exposures.
Yeah, it's basically a number that is assigned by this organization called MITRE.
And when a security researcher finds a vulnerability in a software package,
they report on that to the maintainer,
and then they can get this sort of standardized number issued.
And what's cool about that is it feeds into all this tooling.
And so there's a common way to refer to these vulnerabilities with a common identifier.
And when tooling reports about it, you know, there's like a thing you can look up and know
what vulnerability exactly we're talking about, because there's so many, we need to like number
them. And so this is kind of just like a central organization that coordinates this
numbering. It's also actually a government effort. So the U.S. Department of Homeland Security
and CISA, the Cybersecurity and Infrastructure Security Agency, are the ones that run it.
And all this goes into this thing called the NVD or the National Vulnerability Database.
So it's actually a government, a U.S. government effort. And that's the thing that all the different
tooling basically just reports on right now. It's all like, you know, and this is actually what NPM
audit does.
The thing that, you know, that you see every time you run NPM install and it tells you
you have, you know, 10 packages with vulnerabilities.
It's really just doing a lookup into this database and telling you how many CVEs are
known for, you know, for the versions you're installing.
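Here is a minimal Python sketch of the kind of lookup Feross describes, assuming a tiny hand-written advisory table in place of a real vulnerability database; the package name `example-parser` and the ID `CVE-2099-0001` are placeholders. It also shows the limitation being discussed: a version that was hijacked and published yesterday has no entry yet, so the lookup reports nothing.

```python
import re

# CVE IDs have the form CVE-<year>-<sequence>, with a sequence of at least
# four digits, e.g. CVE-2021-44228.
CVE_ID = re.compile(r"CVE-\d{4}-\d{4,}")

# Hypothetical advisory table with placeholder entries, standing in for a real
# database: (package, version) -> CVE IDs already published for it.
ADVISORIES = {
    ("example-parser", "1.0.0"): ["CVE-2099-0001"],
}


def is_cve_id(text: str) -> bool:
    return CVE_ID.fullmatch(text) is not None


def audit(installed: list) -> dict:
    """Report installed (package, version) pairs that have known CVEs."""
    findings = {}
    for pkg_version in installed:
        ids = ADVISORIES.get(pkg_version, [])
        if ids:
            findings[pkg_version] = ids
    return findings
```

`audit([("example-parser", "1.0.0"), ("example-parser", "1.0.2")])` flags only `1.0.0`. If `1.0.2` were a freshly hijacked release, this style of check would stay silent until someone reported it and an advisory was published.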
What you're saying, though, is you can't simply rely on that.
Like, it's a good practice, of course, right?
But if it's the only line of defense, that's where you have a problem with it.
Yeah. I mean, take what we were doing with Wormhole, right? I mean, we were looking for this.
I mean, we had Dependabot installed, and, you know, Dependabot will go ahead and
send us a pull request if we have a, you know, a vulnerability in our project, and we would update
that and get that out as quickly as we could. And that's a great thing to do. But the thing we were specifically worried about was
the kind of attack that would target Wormhole specifically
or one of these supply chain attacks that would just start doing random things on our server like mining
cryptocurrency or talking to random servers. There was that eslint-scope attack,
another one that happened, where the malware actually stole
your .npmrc file so it could publish packages as you. And then there was, I mean,
the list goes on, the things that can go wrong. So it's like you have this
small probability of a really bad thing happening because when you run NPM install, you're basically
just trusting that the code, you know, is going to do what it says. And you're running this on
your laptop, right? With like
all your personal files, your photos, all your documents, your social security number,
all that stuff is on there, right? And then you're running it on your production server
with all your user data. Now, in the case of Wormhole, we didn't store any user data,
but our app needed to be secure to not serve compromised JavaScript code down to our visitors.
And so that was our concern.
That was what we were worried about.
And once Wormhole got popular and we started getting over 100,000 people using it every month
and we started looking at the really basic statistics that we gather about the users,
we saw like a third of our visitors come from China.
And they're using it to send files there securely.
And so then I started to get pretty worried.
I was like, well, we're not the biggest fish in the pond, but there is a chance that, you know, we could be
the target of state sponsored attackers, you know, who would want to know what what's going on in
those files. And, you know, I, you know, so it seemed like a thing that we should at least have
a plan for how we're going to deal with our dependencies and how we're going to vet these
things. And when we looked around for tools for this, basically I didn't find anything.
I mean, one option is,
I guess you could check in your dependencies
into your repo,
but then you're still not going to be vetting them, right?
I think basically what everybody does
is they just kind of hope for the best
and they don't look at their code.
This is what I call "trust the system," Feross.
I made this up a long time ago.
I went to Jamaica with my wife.
We got married in Jamaica.
And this is actually when I coined the phrase. "Trust the system, babe," I said to her and my sister-in-law, future
sister-in-law, because we weren't married yet. "Hey, what are you doing with your bag?" "I just trust the
system." You know, it's a different country. It's not the USA. Things get handled differently.
Not that you can't trust them, but different things just happen in different countries
differently than they do here. And so, long story short, my sister-in-law did not have her bag.
And thankfully she checked one because then she actually had clothes
and she could actually go to our wedding and do all the fun things we had planned.
But I came up with this term, trust the system.
And that's what you do.
When you NPM install, you trust the system.
Yeah, but I like trust but verify better.
There you go.
Okay, trust but verify.
Touche.
We also have Dependabot installed and have long been a user of
intrusion detection systems in the past
and monitoring systems
and these are vulnerability scanners,
and they're technology solutions.
Ultimately, for me, they've come up short
because of the noise,
because of the false positives,
because they're not smart enough to know that actually that vulnerable package
never runs in a production context.
It's part of my build process.
There's seven reasons from Sunday why it doesn't apply.
And when I hear your solution of,
well, what we need to do is like be looking for these
other things, such as this project has two new maintainers. Well, that's a, that's an alert I'm
going to ignore immediately, right? Because aren't we just adding more noise to the potential signal
that we're trying to get out? Or maybe there's a different way that you're going about this that
makes it special. Yeah, that's a great question. I'm very sensitive to the noise issue,
so I definitely wouldn't want to use a tool that added noise.
And so that's definitely not what Socket's going to do.
So we're being very careful with the types of things
that we will alert you on.
So if you install the GitHub app for Socket,
it will currently just warn you about typosquatting,
which is the number one supply chain attack
happening right now on NPM.
Explain typosquatting real quick.
Yeah, so typosquatting is when somebody registers
a package name that is one or two characters off
from a popular package,
and they hope that people will accidentally install
the typoed name, and get the
attacker's code, you know, so think about, you know, think of like something like, you know,
say like you registered the name, you know, react, but you swap the location of the like R and the E
right in the name react. And you just hope that like, I don't know, a couple hundred people are
going to make that mistake and install that typoed version. And then once that runs, I don't know, a couple hundred people are going to make that mistake and install that typoed version.
And then once that runs, then, you know, the attacker's code is in there and it can do whatever it wants.
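One common way to flag names that are "one or two characters off" is an edit distance that also counts a swap of two adjacent letters as a single edit, so the swapped-R-and-E version of `react` sits at distance 1. The Python sketch below uses optimal string alignment distance as an assumed stand-in; the conversation doesn't say which measure Socket uses, and `possible_typosquats` is a hypothetical helper.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance: insertions, deletions, substitutions,
    and swaps of two adjacent characters each cost 1.

    Assumed stand-in metric; not necessarily what Socket uses.
    """
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent swap
    return d[-1][-1]


def possible_typosquats(name: str, popular_names: list, max_distance: int = 2) -> list:
    """Popular names that `name` could plausibly be a typo of."""
    return [p for p in popular_names
            if p != name and osa_distance(name, p) <= max_distance]
```

With this measure, `osa_distance("react", "raect")` is 1, and `possible_typosquats("browserlist", ["browserslist", "react", "browserify"])` returns only `["browserslist"]`.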
And we see that, you know, like 60% of malware on NPM uses an install script.
So that means that just you hitting enter on that is like enough to compromise you.
So you're going to type NPM install typoed version of React.
You hit enter, and then
even before you've imported it or whatever,
it runs code on your machine.
And that's the install script
attack vector that you see
most malware use.
And then what is it going to do? Of course it can do
whatever it wants at that point.
But they're just hoping, they're basically just
riding on this popularity.
The thing that's crazy is there's so many of these that even, like, you know, I found, for example, there's this package called browserslist.
Have you heard of it before? Um, you probably use it, though. It's this thing
that lets you kind of define the browsers that you want to support in your application. So you can
say, like,
I support IE 11 or, oh God forbid, hopefully you don't have to support IE 11, but you know,
I support, you know, the last two versions of Chrome or whatever. And you can put this into a standardized file in your repo. And then all the different kinds of tooling that you use
can refer to that file and, you know, use that for automatically generating polyfills
and stuff like that. So all the different tools can rely on one place
where you define the browsers that you support for your whole app.
But it's a weird name, right? It's called browserslist.
And so I constantly make the typo, I type browserlist
instead of browserslist.
It's a very easy typo to make.
And if you look up that typoed package,
you'll see that the oldest version that's been published in there is a security holding package from NPM, which is usually you'll see that when a package used to be malware at some point.
And then it got taken over, you know, it got kind of removed.
And then they put up a kind of a blank package to kind of hold the name so that bad guys can't register it in the future.
So I don't know for sure whether that used to be malware in the past,
but it seems like it may have been.
And if you look at it now, the current owner of it has published
kind of a one-line package that just throws an exception and says,
hey, you shouldn't install this.
You really meant to install the other one,
which is a nice service that they're providing to people.
But if you look at that, it's still installed 700,000 times a year,
the typoed version. And so like, of course, yeah. Yeah. There's no tool that's telling people
that they're installing the wrong version and that this is a source of unnecessary risk. And so we
found this and we looked at what packages were doing this and we found, you know, the popular
Preact library was installing browserlist, the wrong package. So Preact was vulnerable. So we
went... I mean, it wasn't, you know, an actual vulnerability, but it was, you know, this unnecessary dependency that
is just adding risk for no benefit. And, you know, we found other stuff. Like, I mean,
there's all kinds of, like, you know, browserify, right? There's a typo called bowserify. It's
literally browserify, but with extra...
For Nintendo fans?
Extra, yeah, for Nintendo fans, but also
with extra code injected into your bundle.
Oh, that's all they added? Sure, why not? A little bonus.
It's the same thing, but with extra code.
It's like a power-up.
Yeah.
I don't know what it's doing.
I mean, maybe it's fine,
but it's just like these weird things out there.
And remember, NPM is kind of a wiki.
I mean, anyone can publish anything.
Sure.
So there's no guarantee that these random typos
that only have like 100 downloads
have ever been vetted by anybody.
So you really don't want to run that stuff on your computer.
So anyway, to answer your question, Jared, that's an example where we
would tell you in a pull request, hey, it appears that you installed
a typo, because we found this other package that's one letter different and has a million
times more downloads than the one you chose. And it just asks you to double-check, like,
hey, are you sure you meant to install this one and not this other one?
We see that as, like,
probably not going to be a noisy alert.
Like, it's probably not going to happen that often.
But if it even happens, like, once every three months
that it catches a typo,
like, you're just really happy that it did that, right?
You're not going to complain that this bot, like,
warned you about this typo.
So, like, that's the kind of level of things
we're trying to catch with the bot,
and if anything is noisy,
we're not going to include it.
It's just not worth it.
Isn't there a thing too where when you read a sentence, you can remove the vowels.
You can like rearrange the first and last letter.
I'm totally making this up, but there's something that's like you can read a full on sentence
that's totally jacked up from a character organizational standpoint.
Like, it doesn't have to be spelled correctly and you can still read it, because that's the way the human mind works. It completes itself,
so to speak. I don't know the exact study; if y'all know it, then for sure share it. But while you
were talking there about browserslist, plural, I googled it and landed on browserslist.dev, and I
was thinking, how do I even know if this is the right site? Because you could totally be the misspelled version of it and put a very similar site up, and it can look very good. It can
look just like browserslist should. And, you know, especially since the source code for
browserslist.dev is probably open source on GitHub, you just copy theirs, make your changes, and republish,
and it looks almost like the real thing.
Yeah, I mean, the thing that makes this hard too is like that a lot of these names are,
it's not clear, like, you know,
there's like things like, you know,
blah, blah, blah, dash proxy,
or, you know, blah, blah, blah, dash proxied, right?
It's like, what tense of the word are you supposed to use?
There's a lot of this kind of stuff
that you see in these typosquatting attacks.
Another common one is the JS suffix, right? Or in other ecosystems, there's the .py suffix.
Is the library called, is it called standard or is it called standard JS? Well, it turns out
both of those are available on NPM. And if you install the wrong one, right, it's some random
garbage that someone published that is nothing to do with the original project.
Sometimes the JS version is the right one
and sometimes the one that's missing it is the right one.
So it's hard to know a priori what's the correct one.
This episode is brought to you by our friends at Square.
Millions of Square sellers use the Square app marketplace to discover and install apps they rely on daily to run their businesses.
And the way you get your app there is by becoming a Square app partner.
Let me tell you how this works.
As a Square app partner, you can offer and monetize your apps directly to Square sellers in the app
marketplace to millions of sellers. You can leverage the Square platform to build robust
e-commerce websites, smart payment integrations, and custom solutions for millions of businesses.
And here's the best part. You get to keep 100% of revenue while you grow. Square collects a 0%
cut from your sales for the first year or your first 100 Square referred sellers.
That way you can focus on building and growing your Square customer base
and you get to set your own pricing models.
You also get a ton of support from Square.
You get access to Square's technical team using Slack.
You get insights into the performance of your app on the app marketplace.
And of course, you get direct access to new product launches.
And all this begins at changelog.com slash square.
Again, changelog.com slash square.
So if we were to focus in on your typosquatting detection algorithms, are you using other heuristics
besides download counts? Like, how do you decide, all right, we're gonna go ahead and open that PR
because we think this is a typo? You gotta be putting some work into that whole aspect of it.
Help us through the thought process.
Yeah, so for typosquatting,
we basically say,
okay, any package that has at least 50,000 downloads
is probably not a typo.
It's gotten popular enough that,
I think it's actually,
we're still tweaking it.
So it could be 100,000.
It could be some amount of weekly downloads
where we say,
this package has reached a critical mass of people downloading it, so we don't think it's all typos, and so we're just gonna, like, never consider those to be typos. But then if
the package is less popular than that threshold, then we say, okay, does it have a name
that's similar to the name of any other package in npm? And we do this thing called Levenshtein distance,
which is an algorithm for basically just counting up
the number of characters that have to be added, removed,
or replaced to turn one string into the other.
So it's like a way to sort of describe the distance
between two strings.
You can assign a number to it,
like how far are these from each other.
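For readers who haven't seen it, Levenshtein distance has a standard dynamic-programming formulation. The following is a minimal Python sketch of that textbook algorithm; the function name is ours, and this is not Socket's code.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn `a` into `b`."""
    # prev[j] holds the distance between a[:i-1] and b[:j]
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into "" takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute ca -> cb (free if equal)
            ))
        prev = curr
    return prev[-1]


print(levenshtein("browserlist", "browserslist"))  # 1 (one missing "s")
print(levenshtein("bowserify", "browserify"))      # 1 (one missing "r")
print(levenshtein("kitten", "sitting"))            # 3
```

Only two rows of the table are kept at a time, so memory grows with the length of one name rather than the product of both.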
And so what we do is,
if the number is too small,
the distance is too close,
and the threshold we set, I think, depends on the length of the package name.
The longer the name,
the more chances for typos there are,
so we scale it a little bit
by the length of the package name.
And then we also take into account
common endings and things like .js
or .py or whatever,
or swapping orders of things.
Because sometimes there's, you know, there's like a package called node canvas, but is it canvas node?
Like, you know, so things like that where you swap the order of words, we consider that to be, you know, that's like one change. Even though technically a ton of letters are moving around, we consider that to be like an easy mistake to make.
So that's like a one letter or like a one cost change.
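Putting the rules from this answer together, a typosquat check might look roughly like the sketch below. Every constant is an assumption: the 50,000 weekly-download floor is one of the figures mentioned above, the 1,000x popularity ratio is the one described in the next few sentences, and the suffix list and `len(name) // 6` length scaling are hypothetical placeholders for whatever Socket actually tunes.

```python
import re

# Illustrative assumptions, not Socket's actual values.
POPULAR_FLOOR = 50_000     # weekly downloads above which we stop suspecting typos
POPULARITY_RATIO = 1_000   # the look-alike must be this many times more popular
SUFFIXES = ("-js", ".js", "js", "-py", ".py")


def levenshtein(a: str, b: str) -> int:
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def strip_suffix(name: str) -> str:
    for suffix in SUFFIXES:
        if name.endswith(suffix) and len(name) > len(suffix):
            return name[: -len(suffix)]
    return name


def name_distance(a: str, b: str) -> int:
    """Edit distance, except a dropped suffix or a swapped word order costs 1."""
    if a == b:
        return 0
    if strip_suffix(a) == strip_suffix(b):
        return 1  # e.g. "standard" vs "standard-js"
    words_a, words_b = re.split(r"[-_.]", a), re.split(r"[-_.]", b)
    if len(words_a) > 1 and sorted(words_a) == sorted(words_b):
        return 1  # e.g. "node-canvas" vs "canvas-node"
    return levenshtein(a, b)


def max_distance(name: str) -> int:
    # Longer names leave room for more typos, so allow a larger distance.
    return max(1, len(name) // 6)


def likely_typo_of(installed: str, weekly: dict[str, int]) -> list[str]:
    """Return packages the user probably meant instead of `installed`."""
    if weekly.get(installed, 0) >= POPULAR_FLOOR:
        return []  # popular enough that its downloads aren't just typos
    base = max(weekly.get(installed, 0), 1)
    return [
        other for other, count in weekly.items()
        if other != installed
        and name_distance(installed, other) <= max_distance(installed)
        and count >= POPULARITY_RATIO * base
    ]


# Hypothetical weekly download counts, for illustration only.
weekly = {"browserslist": 30_000_000, "browserlist": 10_000, "preact": 2_000_000}
print(likely_typo_of("browserlist", weekly))   # ['browserslist']
print(likely_typo_of("browserslist", weekly))  # [] (above the popularity floor)
```

A production version would index names (for example by length buckets) instead of comparing against every package in the registry, which is roughly two million names.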
And then, so then once we do that and we figure out, okay, these are all the packages that have similar names,
then we say, all right, are any of these a thousand times more popular than the one that
you installed? So it has to be, you know, like vastly more popular. That's kind of the current
algorithm we use now. We're still tweaking it and improving it, but that seems to kind of work so
far, and it catches all the stuff that we know for sure are typos. So we're sort of just going with that and tweaking it as we
notice more cases where it triggers false positives or false negatives, but that approach seems to work pretty well right
now.
How are you QAing that? Are you just doing a bunch of typos yourselves and trying
to see how it works? Or, I guess, is Socket being used by anybody where you start to
get real human feedback, like, "this is not a typo, sorry,"
and you can work that into your sample set, or what?
Yeah, we're working with design partners,
so early customers that are using this on their repos.
So we have a bunch of different people using it.
Brave Browser, Expo, that's the React Native tool.
Replit and Passfolio and a couple of other
ones that I can't mention. Let's say
one of them is an end-to-end
encrypted messaging app that you may have heard of.
It's not Wormhole.
It's not Wormhole, exactly.
I can put the message in a file
and send the file and now it's a messaging app.
Boom. Oh no, no, it's not
Wormhole. Yeah, yeah, yeah.
You can't mention it, but it happens to be Wormhole.
Yeah.
There are people that are using it and so they're giving us feedback.
I mean, it's a thing we're trying to improve
and get feedback on because it is early.
I mean, this is a new thing we've built
and we're just trying to see what people think of it
and what the right thresholds are for all this stuff.
And there's no tool out there that currently does this, right?
Like looking at the author changes or the contributor changes,
looking at typosquatting scenarios
where you have, sure, a thousand ways to want to do it,
but I would imagine, similar to the way
Richard Hipp might answer it with SQLite,
that their sweet spot really is their test suite.
So SQLite is open source, but the test suite is not.
So I'd imagine that over time, this test suite you have to test this algorithm
for the typosquatting will probably be behind the scenes.
Yeah, yeah. I mean, right now we're testing this through
spot checking it and through really, really basic tests and then just
reports from users.
But building out our super thorough test suite
is definitely on our roadmap.
The thing is, we don't want to ever block a developer.
Even in the worst case where you do get a false alert from this,
it's a comment on a GitHub PR,
so the goal is not to stop developers from doing their jobs.
The goal is to just give you information
that can help you make a better decision.
Nobody wants to have a tool that stops them
from doing their job and stops them from shipping code.
We've got to keep things moving,
got to move quickly.
Nobody wants to install a tool that stops them
from getting work done.
That's why it's important to keep the bot
really high signal as well.
Because we don't want people to get annoyed with it
and turn it off.
So it's currently just stuff that's really high, high signal.
So typosquatting I can see being high signal, low noise.
Permissions creep is another thing you mentioned.
You know, new maintainers
or new permissions on maintainers for a project.
That's the one that I brought up because, to me,
it seems like I couldn't possibly care less until I do.
So I'm curious, just the implementation of that one, right?
Because that one seems like it's even more tenuous to get right.
Yeah, so we have, first of all, I should mention,
we have a website that you can go to
to look up the results of the analyses that we have.
So we have like 70 of these analyses on there. And you can see, you know, exactly what they are. If you go
to socket.dev and click on the issues tab at the top, you can see the list of things that we can
find in packages. So we're actually running analysis of every NPM package that's published
and looking for all these things. You can think of it like a linter kind of like, it's
sort of just like hunting down these issues. And then when we find stuff, we put it onto the page
for that package. And then separately we have this GitHub app. And there's a question about like,
what things do people want to know about while they're on GitHub? And so we don't necessarily
take all the 70 things we find and put that into the app because that would be a little too noisy.
And so, you know, this example you have about like a new maintainer isn't by itself an interesting
thing. But if you combine that with, well, also it seems obfuscated code was added in this version,
right? So maybe that, maybe obfuscated code being added plus a new maintainer, right? Plus like
eval being used. Maybe the three of those things rises to the level of like,
you know, this is noteworthy
and this version may require some further investigation.
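The idea that weak signals only matter in combination can be modeled in many ways. A minimal sketch, assuming a weighted sum compared against a threshold (our choice, not Socket's documented scoring), with hypothetical signal names and weights:

```python
from dataclasses import dataclass, field

# Hypothetical signal names and weights. A new maintainer alone scores below
# the threshold; it only matters alongside other suspicious changes.
WEIGHTS = {
    "new_maintainer": 1,
    "uses_eval": 1,
    "obfuscated_code": 2,
    "install_script": 2,
}
ALERT_THRESHOLD = 3


@dataclass
class VersionReport:
    package: str
    version: str
    signals: set[str] = field(default_factory=set)

    def risk_score(self) -> int:
        # Unknown signal names contribute nothing.
        return sum(WEIGHTS.get(signal, 0) for signal in self.signals)

    def needs_review(self) -> bool:
        return self.risk_score() >= ALERT_THRESHOLD


quiet = VersionReport("some-lib", "2.0.1", {"new_maintainer"})
loud = VersionReport("some-lib", "2.0.2", {"new_maintainer", "obfuscated_code", "uses_eval"})
print(quiet.risk_score(), quiet.needs_review())  # 1 False
print(loud.risk_score(), loud.needs_review())    # 4 True
```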
And it kind of goes back to what Adam said earlier,
where, you know, you have a limited amount of time
to spend on this kind of, you know,
on vetting your dependencies.
Right now people spend zero time
on vetting their dependencies.
And then the big companies like Google spend like,
you know, a ton of time and money vetting their dependencies. But for everyone else who's in
between the two or wants to do a little bit more than nothing, but doesn't want to quite go to like
the level of Google, then having a tool that can point you to when a particular package has changed
in a way that is suspicious and potentially malicious so that you can spend your time
vetting that one dependency and looking at the diff for that one update. That's a good use of
time. And then you can ignore the rest. You can say the rest, you know, no new maintainers,
nothing interesting happened. The code hasn't even changed in any
significant way. So what's the big deal? You know, let's just update, right? Yeah. So that's kind of
the idea. It's a balance. Factors like adding a
new maintainer are not, by themselves, suspicious enough, I think, to warrant most people's attention.
Although, you know, again, for certain projects, I could see them actually caring about that
a lot. So this needs to be configurable. Right. So we're working on a, like a dot socket dot YAML
file that you can use to configure exactly what things you want to get alerted about.
But we want to have really sensible defaults for that.
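The schema of the YAML config file isn't described in the conversation, so the following is only a sketch of the "sensible defaults plus per-repo overrides" pattern. The alert names and the `alerts` key are hypothetical, and the config is shown as a dict, as if already parsed from YAML.

```python
# Hypothetical alert names and defaults; the real file's schema is not described here.
DEFAULT_ALERTS = {
    "typosquat": True,          # rare and almost always worth a comment
    "install_script": True,
    "new_maintainer": False,    # too noisy on its own for most projects
    "non_osi_license": False,
}


def effective_alerts(repo_config: dict) -> dict[str, bool]:
    """Overlay a repository's overrides on the defaults, ignoring unknown alert names."""
    merged = dict(DEFAULT_ALERTS)
    for name, enabled in repo_config.get("alerts", {}).items():
        if name in merged:
            merged[name] = bool(enabled)
    return merged


def alerts_to_post(found: set[str], repo_config: dict) -> list[str]:
    """Keep only the alerts this repository has enabled, in a stable order."""
    enabled = effective_alerts(repo_config)
    return sorted(alert for alert in found if enabled.get(alert, False))


# A repo that does care about maintainer changes.
strict = {"alerts": {"new_maintainer": True}}
found = {"new_maintainer", "non_osi_license"}
print(alerts_to_post(found, {}))       # []
print(alerts_to_post(found, strict))   # ['new_maintainer']
```

Ignoring unknown keys keeps a typo in the config from silently enabling nothing; a real tool would more likely warn about them.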
So to give the listener an idea,
if they're not going to socket.dev,
if you search for a package that you use in any of your projects,
it's pretty cool.
It'll give you the readme for that project
as well as an overview of what it is.
And then you all provide this kind of scoring system
of 1 to 100 for supply chain security,
how it rates for quality, maintenance, vulnerabilities,
and licensing as well.
I plugged in Umbrella.js, which we have on our site,
which is like a lightweight,
it's like a three kilobyte jQuery kind of thing
for those who still like jQuery style DOM manipulation, but in a light sense.
It has a 76 for supply chain security, not great. Quality is high, 95.
Maintenance is 50. I think it's probably kind of a done package
and has one maintainer. So it's definitely a place where you can
start if you're just vetting dependencies that you're considering, right?
You don't have to use it in the GitHub app,
alert me and open PRs kind of style.
You can just use it as a source of information.
I'm looking at left-pad.
Ooh, package is deprecated.
Package has a non-OSI approved license.
Package has not been updated in more than a year.
Yeah, you see the kind of alerts we find?
So yeah, it's apparently using a WTFPL license.
Low score.
Yeah, it gives it a low license score, and that's a big red flag.
And then, yeah, it's also deprecated and hasn't been updated in more than a year, so those all
show up as big alerts at the top of the page.
Yeah, that's pretty cool. There's a particular
recipe for nefarious activity, and you're detecting it. Like Jared had said, he's like, well, I don't really care if a maintainer changed necessarily.
But like you had said, Feross, if you combine that with a recipe of potential nefarious activity, then you do care.
Exactly.
A license change or, as you mentioned, maybe there's more permissions this time.
Yeah, the permissions one is really big. Yeah, I mean, if you look at a lot of these supply chain attacks, you know, like, you have a package like
ua-parser-js, which is literally a user agent string parser, right? It doesn't need the file system. It
doesn't need to talk to the network, right? It's a completely self-contained package, right?
And then suddenly this new version came out that was downloading an executable file and then running chmod to make it executable, right?
Wow.
That's a shell command, right?
And then it runs the file and then that talks to the network, right?
So you have all these things like shell, file system, and network that are in a user agent parser.
That's like a pretty dead giveaway that something's changed here.
Maybe you don't update to this version quite so eagerly.
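Spotting "a user agent parser that suddenly uses the shell, file system, and network" comes down to comparing capabilities across versions. A crude sketch of that comparison, assuming regex matching on CommonJS `require` calls only. Real tools parse the code into an AST, and this misses ES `import`, dynamic requires, and obfuscation.

```python
import re

# Crude, illustrative patterns; they only catch literal CommonJS requires.
CAPABILITY_PATTERNS = {
    "shell": re.compile(r"""require\(\s*['"](?:node:)?child_process['"]\s*\)"""),
    "filesystem": re.compile(r"""require\(\s*['"](?:node:)?fs(?:/promises)?['"]\s*\)"""),
    "network": re.compile(r"""require\(\s*['"](?:node:)?(?:https?|net|dgram)['"]\s*\)"""),
}


def capabilities(source: str) -> set[str]:
    """Capabilities whose modules this source text requires."""
    return {name for name, pattern in CAPABILITY_PATTERNS.items() if pattern.search(source)}


def new_capabilities(old_source: str, new_source: str) -> set[str]:
    """Capabilities the new version uses that the old version did not."""
    return capabilities(new_source) - capabilities(old_source)


old = "module.exports = (ua) => ({ browser: ua.split('/')[0] });"
new = old + (
    "\nconst { execSync } = require('child_process');"
    "\nrequire('https').get('https://example.invalid/payload');"
)
print(sorted(new_capabilities(old, new)))  # ['network', 'shell']
```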
Yeah, so that's the kind of thing we want to catch
and highlight and draw people's attention to.
This is probably jumping the gun a bit,
but how then do you scan all this open source?
It must be an expensive activity.
How do you do it?
What's the process?
Give us a walkthrough of how you pull down new code,
pay attention to new code,
pay attention to, are you only paying attention to master or main?
Is it simply a pull from GitHub?
What's the mechanics?
Yeah, so we have a pipeline that can do analysis tasks,
and we feed in npm packages. We do this on every
package publish that happens on npm. So we have a program that's tailing npm, following all the
publishes in real time, and whenever a new package is published, we download the tarball,
we save the metadata about it, and then we kick off our analysis
job to basically run all these tasks. We've written, like I said,
about 70 of these analyses for a package.
And so we kick that off, it runs,
and usually it actually takes,
well, we're actually quite efficient.
Each one takes us, like,
five or 10 seconds for a package.
And then packages might have transitive dependencies,
so we might be doing this on,
however many things
they depend on, we're gonna have to analyze those as well,
because we don't wanna just analyze the top level package,
that's gonna miss anything nefarious that's added
in a dependency of a dependency.
So we run this on all of them,
and then we save the result, and that's pretty much it.
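Analyzing "a dependency of a dependency" means walking the whole dependency graph, visiting each resolved package once even when several parents share it. A minimal breadth-first sketch, assuming a hypothetical `deps_of` lookup that returns already-resolved `(name, version)` pairs; version resolution itself is out of scope here.

```python
from collections import deque
from typing import Callable, Iterable

Package = tuple[str, str]  # (name, resolved version)


def transitive_closure(root: Package,
                       deps_of: Callable[[Package], Iterable[Package]]) -> list[Package]:
    """Breadth-first walk: the root plus every package it pulls in, each listed once."""
    seen = {root}
    order = [root]
    queue = deque([root])
    while queue:
        for dep in deps_of(queue.popleft()):
            if dep not in seen:  # shared or circular deps are visited once
                seen.add(dep)
                order.append(dep)
                queue.append(dep)
    return order


def analyze_tree(root: Package,
                 deps_of: Callable[[Package], Iterable[Package]],
                 analyze: Callable[[Package], dict]) -> dict[Package, dict]:
    """Run `analyze` on every package in the tree, not just the top level."""
    return {pkg: analyze(pkg) for pkg in transitive_closure(root, deps_of)}


# Hypothetical graph: "c" is reachable twice but analyzed once.
graph = {
    ("app", "1.0.0"): [("a", "1.0.0"), ("b", "2.0.0")],
    ("a", "1.0.0"): [("c", "3.0.0")],
    ("b", "2.0.0"): [("c", "3.0.0")],
}
print(transitive_closure(("app", "1.0.0"), lambda p: graph.get(p, [])))
# [('app', '1.0.0'), ('a', '1.0.0'), ('b', '2.0.0'), ('c', '3.0.0')]
```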
And we designed it in a way where
we can also do this lazily,
so we didn't need to sort of sit down
and crunch through the
entire registry in a batch job. So we actually have the capability to wait until a user visits the page for a package
or requests the score for a package or, you know, does a lookup, to actually then run that analysis.
And so we can actually do it lazily if we need to. So we're doing that now in some cases for the really, really unpopular stuff.
We're not going to necessarily go and run an expensive analysis on all that stuff until
someone looks it up.
Then we'll do that in real time for that package.
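The eager-for-new-publishes, lazy-for-everything-else split can be sketched as a store that computes a report on first lookup and keeps it. This is a hypothetical sketch; `analyze` stands in for the whole analysis pipeline, and a real system would persist reports rather than hold them in memory.

```python
from typing import Callable


class ReportStore:
    """Hypothetical report cache: compute a package's analysis on first lookup.

    Keys are (name, version). Published npm versions are immutable, so a
    stored report never goes stale because the package changed underneath it.
    """

    def __init__(self, analyze: Callable[[str, str], dict]):
        self._analyze = analyze  # stand-in for the full analysis pipeline
        self._reports: dict[tuple[str, str], dict] = {}

    def on_publish(self, name: str, version: str) -> None:
        """Eager path: analyze a new publish as soon as the registry feed reports it."""
        self.get(name, version)

    def get(self, name: str, version: str) -> dict:
        """Lazy path: anything not analyzed yet is analyzed when someone asks for it."""
        key = (name, version)
        if key not in self._reports:
            self._reports[key] = self._analyze(name, version)
        return self._reports[key]


calls = []


def fake_analyze(name: str, version: str) -> dict:
    calls.append((name, version))
    return {"package": name, "version": version, "alerts": []}


store = ReportStore(fake_analyze)
store.get("left-pad", "1.3.0")  # first visit to the page: runs the analysis
store.get("left-pad", "1.3.0")  # second visit: served from the stored report
print(len(calls))               # 1
```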
We've done it in a really cool way where we actually built it as, we did our own custom
pipeline system.
We didn't want to use something like Apache Spark or whatever, which requires you to use Java and is kind of slow and clunky and has a little bit of latency for running these jobs.
We did it with our own JavaScript pipeline that we wrote, and it actually can cache the intermediate results of these analysis tasks so that you can build a task that depends on other tasks.
So say you have one task that downloads the code for this package, right? You can cache that forever. Once you've done it,
there's no need to download that tarball again. It's not going to change. You know, a version is
immutable. And then you can have a task above that that takes in the tarball and
untars it, right? And then you can cache that forever. And then you can have a task above that
that takes in the result of that and parses it into an AST, you know, for the JavaScript,
and then you can cache that. And so you can construct these tasks so that they call into other tasks.
But then when one subtask is done, it may never need to be run again,
right? Whereas maybe the top-level analysis we're doing might change more often.
And so we can change that freely without having to worry about recomputing or redoing all that work below, if that makes sense.
You can think of it like a tree structure, basically, so it's kind of nice. And then we can
store these immutable blobs in a storage system that can store them forever.
I don't know if that was too much information, but I think it's cool.
No, it is cool.
It's a cool advantage that we have with our design,
and it helps us run this stuff lazily in real time
and not have it be too slow.
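The layered caching described here can be sketched as memoized tasks keyed by task name, a task version string, and the package version. Bumping only the top-level task's version forces it to rerun while the download, untar, and parse results below it are reused. All names are hypothetical, and the task bodies are placeholders rather than real registry or parser calls. Tasks must not depend on each other in a cycle, or `run` would recurse forever.

```python
from typing import Callable


class Pipeline:
    """Memoized analysis tasks keyed by (task, task version, package, package version)."""

    def __init__(self):
        self.tasks: dict[str, tuple[str, Callable]] = {}
        self.cache: dict[tuple, object] = {}
        self.runs: list[str] = []  # records which tasks actually executed

    def task(self, name: str, version: str):
        def register(fn):
            self.tasks[name] = (version, fn)
            return fn
        return register

    def run(self, name: str, pkg: str, pkg_version: str):
        task_version, fn = self.tasks[name]
        key = (name, task_version, pkg, pkg_version)
        if key not in self.cache:
            self.runs.append(name)
            # The task gets a callback for requesting other tasks' (cached) results.
            self.cache[key] = fn(lambda dep: self.run(dep, pkg, pkg_version), pkg, pkg_version)
        return self.cache[key]


pipeline = Pipeline()


@pipeline.task("download", "v1")
def download(need, pkg, ver):
    return f"<tarball bytes for {pkg}@{ver}>"  # placeholder for a registry fetch


@pipeline.task("untar", "v1")
def untar(need, pkg, ver):
    return {"index.js": need("download")}  # placeholder extraction


@pipeline.task("parse", "v1")
def parse(need, pkg, ver):
    return {path: f"AST({len(src)} chars)" for path, src in need("untar").items()}


@pipeline.task("analyze", "v1")
def analyze(need, pkg, ver):
    return {"files": len(need("parse"))}


pipeline.run("analyze", "left-pad", "1.3.0")
print(pipeline.runs)  # ['analyze', 'parse', 'untar', 'download']


# Change only the top-level analysis by registering a new version of it.
@pipeline.task("analyze", "v2")
def analyze_v2(need, pkg, ver):
    return {"files": len(need("parse")), "extra_check": True}


pipeline.runs.clear()
pipeline.run("analyze", "left-pad", "1.3.0")
print(pipeline.runs)  # ['analyze']
```

Keying on a task version rather than deleting cache entries means old results stay available while a new analysis rolls out.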
I'd imagine you want to do it lazily until,
once people see Socket as a proactive security source,
you may not want to lean on lazily.
So maybe while you're proving the model,
startup mode, let's just say that's okay. But, like, in the future, once you become the beacon of light for security,
which I believe will happen, because I believe in you, Feross, and I like what you're
doing here, then maybe you throw lazily away. But then maybe that's more
venture capital. Maybe it's, you know, a larger user base. Maybe it's an acquisition, who the heck knows.
But, you know, this is,
you're doing something that hasn't been done before,
you know, and that to me,
like I think the recipe for nefarious activity
is uniquely, uniquely done here.
And the way you think about end-to-end security
and the way you think about certain bits is just unique.
So I love how you just, like, mapped it all out.
That totally didn't make sense,
but it also made lots of sense.
I didn't completely track you,
but I'm also over here just typing in package names,
trying to find out how good my stuff is.
Maybe my explanation wasn't very good.
No, no, no.
I mean, I was half in and half out.
So that's my excuse for not following.
This is super cool.
One thing I did notice is when you get to a package,
it's socket.dev slash npm slash package.
So as an information architecture nerd,
I'm noticing there's this npm subdirectory feel going on,
and I'm hoping that means you have future plans
to expand beyond the JavaScript and npm world
and provide similar services for other ecosystems.
Go, Rust, Ruby. Is that on the
roadmap too?
Totally. Yeah, I mean, the problem of supply chain attacks is not JavaScript-specific.
It happens in all the ecosystems. It's just that usually JavaScript experiences the
problems that other ecosystems experience, but like a couple years earlier, because
JavaScript is just a little bit bigger and a little bit
more chaotic, and it has, you know, more beginners in it than other ecosystems,
because there are so many newbies always learning and joining. And so it ends up hitting the
breaking points a little bit sooner than other ecosystems. So that's kind of why we started there.
Plus, I just, you know, I like JavaScript, so I wanted to start there. But no, we're gonna do all of
them eventually. And honestly, Python and Rust
are at the top of my personal list.
But there's a lot of people asking us already
if they could use it for Java at work, you know, or Go.
So we'll see what we prioritize.
I guess if you're interested in using this
for a different language and you want to reach out
and let me know which one you really want to see,
we can maybe use that as an input to decide what to prioritize.
But yeah, a lot of the stuff around the maintainer behavior,
that stuff we can apply pretty directly to other ecosystems.
But the specific static analysis for each language
is going to need to be redone.
So there's a little bit of work there, but we can do it.
Yeah, I imagine each new language will be a separate lift
with some separate tooling and analyzing.
And usually in tools and languages that support that given ecosystem.
So we'll probably have a diverse set of skills and/or engineers
by the time this thing is, you know, worldwide and global.
Yeah, yeah.
We're going to need, there's a lot of work to do, that's for sure.
Are you up for that?
I mean, that sounds like a huge undertaking.
Yeah, but I think it meaningfully makes security better.
I mean, I think that, like, you know, people need to, we need a mindset shift in the industry for how people think about their dependencies.
Dependencies are not this magical thing
that you can just show up and use as many as you want
and there's no downsides and there's these magical...
It's not this infinite buffet of open source
that you can just take and then there's no costs to it.
Eventually, you will pay a cost for it.
It's not a matter of if, it's a matter of when. And I think that automatically looking for nefarious changes in
a package is a very low-cost, no-brainer thing to do to help with the
problem. And so, yeah, I don't think there's really anything better than this
that I could be working on to improve the security of the ecosystem.
I mean, what else?
Yeah, I just want to make a difference and make this stuff more secure.
And I'm trying to think, I wish that we could all just agree
on the best packages to use and we could vet those
and we could bless them and call them a standard library that we just use.
But with 1.8 million packages on NPM
and millions more on all these other languages,
it's just too much.
We're not going to ever be able to read it all.
So we need tooling to help us.
Well, too, when you involve a human in this scan analysis,
it's fraught with error.
It's going to happen.
So like we said, with reading a certain sentence,
you can read it without the vowels in it
and things that happen when an individual sits down
and reads a bunch of code.
It's just too large an undertaking to read the code
that was produced today,
let alone tomorrow and the next day and the next day.
It's just not going to happen from a human perspective.
What I think is quite beautiful though,
is how this came out of wormhole.
You know, where you had this mission to be security-minded.
You vetted your packages, or lack thereof, very aware of your security footprint.
And to desire a tool like this, and for it to not be in place, for you to then build it.
And I don't know which one will be bigger.
I mean, I have some bullish ideas about Wormhole itself.
I'd love a better Dropbox.
I love Dropbox.
I think they're great, but I think that they've sort of gotten, I don't even want to call
them lazy by any means.
I think there's a bunch of great people working there, so I don't want to belittle their work
by any means.
But I feel like there could be some good directions for the product, and they've kind of lost
their way.
And this idea of end-to-end security and what you could do with Wormhole really impressed
me and piqued my interest.
But I'm wondering if, like, because of just simply the size of Snyk, for example,
hundreds of millions of dollars in venture capital raised, you know, a billion-dollar valuation
when I last checked. I mean, that's the kind of potential that you have here
with Socket. Is that, I think when Jared said, are you ready for that? Are you ready for that too?
Because I mean, that's, if you keep going this direction, that's what's going to happen.
Yeah, it's a huge opportunity for sure. I mean, I don't think Snyk solves this problem today.
They just do vulnerabilities and, you know, we need to actually look at what's inside the packages
and go beyond that. So yeah, I mean, I would love for this to be the next Snyk, you know.
I think we're on the track to do that.
I think if people want to be early adopters and use us in our current form,
then we'll grow into a complete solution eventually that will compete with Snyk and do the job better.
Right now, it's very focused on this thing that we do differently that's uniquely differentiated.
We actually analyze the package and look for these issues in the code of the dependency and tell you whether, you know,
you need to worry about the dependency or not. So, but eventually I think we'll grow into a full
kind of thing and do all the stuff that a solution like Snyk does. But yeah, I know it's
going to be a journey, you know, for sure. And I think we'll keep working on Wormhole as well, but I definitely think Socket, this security thing, is actually a
bigger solution, because it's a problem every company has.
The world at large has.
Yeah, and it's something that I feel like we're uniquely suited to do as open source maintainers
and developers ourselves. It takes a rarer set of understanding and skills
to build something like this well.
I think a lot of like the tools that you see in this space
are like made by these kind of outsiders
that kind of come in like,
oh, we're gonna tell everybody how to do security.
And then they kind of impose this like top down tool
on everybody, on all the developers
and kind of tell them how it's gonna be.
And like, no one likes to use it and it's annoying.
And there's all these false alerts and stuff.
And we as maintainers ourselves building this
kind of understand the burden of all this stuff,
this tooling and these false reports and all this stuff.
And we know what developers want to use.
We're developers ourselves.
And so I think we can really build something that's good here
and really meaningfully improve security for people.
Touch on that then. What's the adoption story? If someone's listening to this and they're like,
I want to, okay, fine, Feross. I believe in you. I believe in what's going on here.
And maybe it's just JavaScript at the current time point. So NPM packages. What's the adoption
story? Walk us through that. Yeah. I mean, so Socket's super easy to use right now. You go to
the website, you click install GitHub app,
you select the organization or the repo
you want to install it on,
you click install, you're done, that's it.
So it's really easy.
There's no configuration files.
You know, I made standard JS,
so I'm a fan of no configuration by default.
So it's really easy to get started.
And once you install the GitHub app,
it will monitor your
package.json file for changes, and any pull request that adds a dependency or updates a dependency,
we'll analyze that. We'll figure out what exactly is changing, you know, and we'll tell you in a
comment on that pull request anything you need to know. Primarily, we're starting with these
typosquats today, but we're adding more and more of the 70 detections that I mentioned earlier into the bot as we're confident
that they're not going to be noisy. So we're starting with typosquats because those are rare
and they're always important to see. There's really never a case where
you don't want to see that you may be installing a typo. But some of these other things
we've talked about, we're integrating into the GitHub bot over time.
So if you install this today, over the coming weeks,
it will continue to grow and support more and more
types of supply chain attacks that it can detect and stop.
So that's kind of the plan there.
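The first step for a bot like this is working out which dependencies a pull request adds or updates. A minimal sketch, assuming the base and head versions of package.json are already available as strings (fetching them from GitHub is not shown); the package names and version ranges in the example are hypothetical.

```python
import json

DEP_FIELDS = ("dependencies", "devDependencies", "optionalDependencies")


def dependency_map(package_json_text: str) -> dict[str, str]:
    """Flatten the dependency fields of a package.json into {name: version range}."""
    data = json.loads(package_json_text)
    deps: dict[str, str] = {}
    for dep_field in DEP_FIELDS:
        deps.update(data.get(dep_field, {}))
    return deps


def diff_dependencies(base_text: str, head_text: str) -> dict[str, list]:
    """Compare the base branch's package.json with the PR head's."""
    base, head = dependency_map(base_text), dependency_map(head_text)
    return {
        "added": sorted(name for name in head if name not in base),
        "removed": sorted(name for name in base if name not in head),
        "updated": sorted(
            (name, base[name], head[name])
            for name in head if name in base and base[name] != head[name]
        ),
    }


base = '{"dependencies": {"preact": "^10.6.0"}, "devDependencies": {"browserslist": "^4.19.0"}}'
head = ('{"dependencies": {"preact": "^10.6.6", "some-new-lib": "^1.0.0"},'
        ' "devDependencies": {"browserslist": "^4.19.0"}}')
print(diff_dependencies(base, head))
# {'added': ['some-new-lib'], 'removed': [], 'updated': [('preact', '^10.6.0', '^10.6.6')]}
```

Diffing version ranges only shows what the author asked for. Resolving the exact versions that will install would mean diffing the lockfile instead.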
What about price point?
So it's totally free for open source repos.
And then for private repos, it's also free right now.
But I think eventually we're definitely charging for that
because that's sort of the model
that a lot of these kind of tools use is,
you know, if you have a private repo,
we want to charge you for that.
And so, you know, we want it to be like something
that everybody can use and be affordable.
I think we'll probably even do a thing where,
you know, like if you have five users or less,
then you just also get to use it for free, even if it's a private repo.
Just because we want small teams to be able to use it.
But if you have like 30 people working on a private repo together,
or hundreds or thousands or whatever,
then you'll definitely have to pay for it eventually.
That's kind of the plan.
So business model pending, basically.
There's some thoughts and inroads,
but business model pending.
No, I mean, I think we like the model of,
do you remember Travis CI?
Everyone's kind of switched to GitHub Actions these days,
but I really like this model of free for open source
and then paid for private
because it encourages people to open source stuff
as much as they can.
And it really charges the people who can kind of afford it.
And it gives it away to the community.
So I really like that model for pricing and I think that's kind of what we'd want to do.
It makes sense because if you're going to be working on something privately,
it's probably proprietary software that you're making money off of.
And so you can pay some money to make some money,
but if you're willing to open source it and let it be in the world,
or if it's for other people as well as yourself,
then it's open source already and it's free in that sense.
So I think it scales alongside
the way people usually make money with software.
So I think it's a good model.
I should clarify "pending," then,
because instead of "pending," I'd probably say "still in the works."
You're still working out how it will actually play out
in terms of the metrics and heuristics.
It's mostly in place, so not so much pending.
So I take that back.
Thought through, but more work is happening.
Any active business is probably working
on their pricing at all times, right?
Yeah, I feel like early startups
basically always change their pricing
every three to six months in the beginning
because they're just trying to figure it out. Yeah.
This episode is brought to you by our friends at Retool.
Retool is the low-code platform for developers to build internal tools.
Some of the best teams out there trust Retool.
Brex, Coinbase, Plaid, DoorDash, LegalGenius, Amazon, Allbirds, Peloton, and so many more.
The developers at these teams trust Retool as the
platform to build their internal tools, and that means you can too. It's free to try, so head to
retool.com slash changelog. Again, retool.com slash changelog. And also by our friends at WorkOS.
WorkOS is a platform that gives developers a set of building blocks for quickly adding enterprise-ready features to their applications.
Add single sign-on with Okta, Azure, and more.
Sync users from any SCIM directory.
HRIS integration with Bamboo HR, Rippling, and more.
Audit trails.
Free Google and Microsoft OAuth.
Free Magic Link sign-in.
WorkOS is designed for developers and offers a single elegant interface.
It abstracts dozens of enterprise integrations.
This means you're up and running 10 times faster.
So you can focus on building unique features for users.
Instead of debugging legacy protocols and fragmented IT systems,
you get RESTful endpoints,
JSON responses,
normalized objects,
real-time webhooks, a developer dashboard,
framework-native SDKs. And even if your team is not focused on enterprise right now,
you can still leverage WorkOS so you're not turning enterprise away. Learn more and get
started at WorkOS.com. They have a single pay-as-you-grow pricing that scales with your usage and your needs.
No credit card required.
Again, WorkOS.com.
They also have an awesome podcast called Crossing the Enterprise Chasm, and that is hosted by Michael Grinich, the founder of WorkOS.
Check it out at WorkOS.com slash podcast.
So up until Wormhole, all of your projects your entire life have been open source projects.
Wormhole was when you shifted strategies a little bit.
You talked about that on JS Party.
I'm assuming since Socket's a startup as well, that you're keeping this one closer to your chest.
Or is Socket going to be open source?
What's the story with Socket itself?
Yeah, we're going to open source as much of the application as we can, but I think some part of this needs to be a server-side
component, because we're doing this analysis on the full NPM data set, and it's like
15 terabytes of metadata. In order to actually look at the maintainer behavior and figure out what's going on across all this
metadata, we need to do some of this on the server with access to the full data set. So we're
going to make APIs for all this stuff and make it available, but there's just no way you could do
that locally. There's not really an easy way to open source
that and make it actually useful to people. But there's other stuff we can open source,
analyses
that you can run on a package
locally, and we'll try to open source
as much of that as we can. But yeah, for now
it's primarily
all this processing happening on the server side, and so
it's an API we'll provide
for free to people, but yeah,
the code is proprietary.
Are you hiring?
Yeah, yeah, definitely hiring. Always. We have a team of five right now.
It's really cool. It's all open source maintainers. We're working with really
cool people. It's awesome. We have different Node maintainers that you might know:
Mikola Lysenko, Bret Comnes, Alex Morais, who was a co-maintainer with me on WebTorrent, and
then John Hiesey, who also did WebTorrent and Browserify.
So yeah, it's just like
pretty cool crew and
yeah, I think collectively we have like a billion
NPM downloads a month or something crazy
like that.
But you know, NPM downloads are all inflated anyway.
They're all just like CI bots.
CI bots, yeah, pretty much.
It still feels good.
It sounds good, yeah, it feels cool.
It feels cool, doesn't it?
That's the JavaScript developer's badge of honor:
their NPM download number.
Yeah, it's super funny
how many of those are downloads from CI bots.
I would say it's like 100x inflated probably.
What do you think?
1000x maybe.
I think 100x is fair.
I wouldn't be surprised if it's 1000,
but I think 100 is definitely in the order of magnitude.
10 would be.
10x is not enough.
10x doesn't do it.
But still.
Yeah, I think even if you just publish a package
and no one downloads it,
you automatically get like 500 downloads
or like something just from like all the...
NPM just hands them out like Oprah Winfrey.
You get 500 downloads.
You get a download.
You get a download. You get 500 downloads. You get a download.
Just publish a package.
You just get 500 downloads for free.
Hilarious.
No, I mean, part of that is us, right?
Like we're downloading every package.
So like we're adding to that number, right?
So yeah, exactly.
So our analysis engine, I mean, it's actually a lot of worker servers that are running now
that are like chugging through all these packages.
We're trying to do what Adam said
and pre-process as much as possible
so we can proactively catch issues.
And so we're still scaling that up right now.
So it's like right now it's hybrid.
Some of it is done ahead of time and some of it is done lazily.
So do you have a stash of cash
that you're just burning through
as you run these EC2 instances or whatever,
wherever you're running your backend? Are you just spending money right now or what?
It's not too bad right now. We're spending, it's in the thousands of dollars
per month on hosting. But yeah, it's not
totally going to bankrupt us
in the short term.
Could you futurecast for us a bit? What would...
I was going to ask that.
What would happen?
Let's see if our questions
are the same then, Jared,
so you can have your own futurecast
if it's not.
Okay.
When you move from JavaScript
to other ecosystems,
what would happen
to make you feel like
you're going the right direction
to take on Rust, Go, etc.?
Python.
Python.
What do you mean by that?
Well, what would happen
between now and then?
Like,
so you've got some,
some assurance that you're going the right direction.
What would happen with the platform to make you feel like,
okay,
now is the right time to take on the next ecosystem.
What would have happened?
Yeah.
I mean,
I think I'd rather focus on
doing one ecosystem super well,
and trying to be the best at JavaScript, before we go on and
try to do breadth for breadth's sake. You know what I mean? Right
now we're doing all this stuff in JavaScript that no one else is doing. We can catch all these
issues. We're looking at all this stuff in a way that none of the other tools are, but I think
there's even more we could do in JavaScript land, even before we move
outwards to other languages.
There's so many things you can do with taint analysis and analyzing data flows through these modules.
There's a lot more complicated analyses that you could do.
Some of this stuff is really going to help unlock
catching even more issues in the future.
Well, all software has licenses though, right?
You have licenses, you have maintainers,
you got certain things that are sort of like
at large open source, regardless of ecosystem, right?
Maybe there's a way you can sort of
carry a certain feature set for everyone,
and maybe you go deep on JavaScript,
but stay on the surface for the majority.
Yeah, there's definitely a lot that is an overlap.
I mean, a lot of the stuff around repository health is reusable.
Stuff like, is the package maintained?
Does it have a security policy?
How are the maintainers doing?
Are they active?
Are they inactive?
What's the health of this thing?
Is it published by a trusted source?
Is it a typo?
All that stuff is pretty reusable.
That's for sure.
Yeah, that's true.
That's why I asked that question, because I see this
as being highly useful, and
it does give me pause to hear you say
you want to go deep on one particular ecosystem
right now. I can understand
why. I can understand your desires for it, but
that almost reminds me of perfection
versus progress kind of thing.
It's like progress is sort of like a
base layer for all open source, and
perfection would be going
deep on JavaScript. I could be wrong.
I guess what I mean is I want to make sure that it's
useful to people before we move on and try to boil the ocean. So it doesn't necessarily mean
that we need to have the same feature set in all languages. You're totally right, it's probably
better to provide some value to people who write Python or Rust or whatever language, rather than just
telling them, sorry, come back later. But I want to make sure that what we've shipped in JavaScript
is really solid first. So that's kind of what I meant. Like, we're still taking feedback from
people who are using it and, you know, making sure that the signal to noise ratio is right.
And, you know, we're still adding these new detections to the GitHub bot. So we still have
a little ways to go in JavaScript
before we get into those other languages later this year.
I think if nothing else,
it will be a useful operational structure of your business
as your engineering team grows
to break out based on language support.
And so at a certain point,
I mean, you have all JavaScript devs now.
And so at a certain point,
you could easily say, well, here's our team
that works on the NPM ecosystem. Here's our team that
works on the Python ecosystem.
Did you say that that feature where you're detecting
like, hey, this package now has IO
that it didn't have or now has network requests that it
didn't have, is that done and baked? It's in
there or is that something you're working on? That's something that
we're working on for the GitHub app. So that's
not quite ready yet. I mean, like the stuff
like that, I don't think is going for perfection.
I think that's going for like,
this is how we are different than other people.
That's super high value.
And I think that's like depth first versus breadth first
as opposed to trying to make it perfect.
So I think you're on the right track there.
Because I think the other stuff is nice
and I like it especially,
I really like just browsing the website and just vetting different packages
because it just makes me feel smarter than I used to be.
To be like, wow, now I know about this.
It's kind of like when maybe your son or daughter brings home a potential suitor
and you're thinking, okay, what's up with this fella or this young woman?
Maybe I'll ask their parents.
Maybe I will run a background check.
Maybe I'll see what their stance is on some things.
It just feels like you can just vet a package
and have a better idea in an instant.
It's super cool.
But I think that ongoing stuff,
as long as it's not too noisy,
of letting me know, hey, you have this charting
library in your package, and all of a sudden it's calling out to a third party. It didn't
yesterday, but now it does. I mean, that's the kind of stuff that really saves your tail.
Yeah, and we've found surprising instances like that already. There's
a package called angular-calendar, which is a calendar widget for, you know,
your website to pick a date.
And it does like a bunch of stuff you wouldn't expect,
like shell scripts, file system, network access,
install scripts.
And when I saw that, I was like, what is going on?
This is a web component.
Like, why does it need to like run some shell commands
on my computer when I install it? And so I got really suspicious about it.
And the cool thing is when we find these alerts, we actually link you directly to the line
inside the package that triggered it so you can see what exactly it's doing.
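The idea of flagging a calendar widget that suddenly runs shell commands or touches the network can be illustrated with a crude version diff. This Python sketch is an assumption-laden illustration, not how Socket works: it matches simple regular expressions against source text (a real analyzer would parse the code and follow data flow), treats any `preinstall`, `install`, or `postinstall` script in `package.json` as an install-script capability, and uses hypothetical capability labels.

```python
import json
import re

# Hypothetical, deliberately crude detectors keyed by capability label.
# Matching raw text like this is easy to evade; it only illustrates the diffing idea.
CAPABILITY_PATTERNS = {
    "shell": re.compile(r"""require\(\s*['"](?:node:)?child_process['"]\s*\)"""),
    "filesystem": re.compile(r"""require\(\s*['"](?:node:)?fs(?:/promises)?['"]\s*\)"""),
    "network": re.compile(
        r"""require\(\s*['"](?:node:)?(?:http|https|net)['"]\s*\)|\bfetch\("""
    ),
}

# npm runs these lifecycle scripts automatically during `npm install`.
INSTALL_HOOKS = ("preinstall", "install", "postinstall")


def capabilities(package_json: str, sources: list[str]) -> set[str]:
    """Return the capability labels detected in one package version."""
    caps = set()
    scripts = json.loads(package_json).get("scripts", {})
    if any(hook in scripts for hook in INSTALL_HOOKS):
        caps.add("install-scripts")
    for src in sources:
        for label, pattern in CAPABILITY_PATTERNS.items():
            if pattern.search(src):
                caps.add(label)
    return caps


def new_capabilities(old: tuple[str, list[str]], new: tuple[str, list[str]]) -> set[str]:
    """Capabilities present in the new version but absent from the old one."""
    return capabilities(*new) - capabilities(*old)
```

Each version is passed as a `(package_json_text, [source_file_text, ...])` pair. An update that adds a `postinstall` script and a `require('https')` call would yield `{"install-scripts", "network"}`, while an update with no behavioral change yields an empty set. Reporting only the delta is what keeps a pull-request comment quiet for packages that haven't changed what they do.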
Was it legit?
It wasn't outright malicious, but it was definitely worth pointing out what it was doing.
It wasn't the best idea ever, that kind of thing.
Yeah, it's...
It can reduce your trust in the package, really.
More than the security issue.
I'm really conflicted about it because it was...
So it uses this dependency that does analytics
for the maintainer to figure out who is using their package.
And they're gathering kind of information
about the environment that the package is running in,
basic information about the...
Controversial take.
You know, Homebrew went through a big ordeal when they added that kind of a feature.
Exactly, exactly.
So it's a useful feature for the maintainer.
Yeah.
But I would say it's a little bit invasive,
and I could see how some people wouldn't want to do that.
And so they would want to know that there's a way to opt out,
or they kind of would want to know that the package may reach out to the network and do this kind of network request.
So I would say our tool served its purpose.
It pointed out this to us and we could look at it and decide for ourselves if this calendar
widget is something we want to use in our app or not.
But I feel conflicted because I also want maintainers to kind of get paid and to kind
of know who's using their package so they can know who to reach out to.
And so it's one of these things where every company is going to want to make their own decision about, do we allow this in our company?
Do we want to allow packages to do telemetry or not?
And so that's something actually down the line that I think we could support as an additional feature is setting a policy for your organization.
Like, do you allow packages to do telemetry? Do you allow
packages to use install scripts or not?
And enforcing that so that a random developer at your company can't
necessarily go in and just add something that's against the policy of the company.
So it's like a linter for your
dependencies, basically.
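The organization-wide policy idea (do you allow telemetry, do you allow install scripts?) works like a linter for dependencies. A minimal Python sketch of that idea follows; the `Policy` fields, the capability labels, and the allowlist mechanism are all hypothetical, and the per-dependency capability sets are assumed to come from some separate analysis step.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    """Hypothetical organization-wide rules for third-party dependencies."""
    allow_install_scripts: bool = False
    allow_telemetry: bool = False
    # Packages a human has reviewed and exempted from the rules above.
    allowlist: frozenset[str] = frozenset()


def violations(policy: Policy, deps: dict[str, set[str]]) -> list[str]:
    """Lint each dependency's detected capabilities against the policy.

    `deps` maps package name -> capability labels such as "install-scripts"
    or "telemetry" (labels assumed to be produced by a separate analyzer).
    """
    problems = []
    for name, caps in sorted(deps.items()):
        if name in policy.allowlist:
            continue
        if "install-scripts" in caps and not policy.allow_install_scripts:
            problems.append(f"{name}: install scripts are not allowed")
        if "telemetry" in caps and not policy.allow_telemetry:
            problems.append(f"{name}: telemetry is not allowed")
    return problems
```

A CI job could fail the build whenever `violations(...)` returns a non-empty list, which is what would stop an individual developer from merging a dependency that goes against company policy; the allowlist gives reviewers an explicit, auditable way to grant exceptions.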
Well, can I do my futurecast now?
Yeah. Was mine the same as yours, Jared, or what?
No, slightly different. So I'll ask it.
Okay.
So walk five years down the road, turn back and look. You got Wormhole and you got Socket.
Which one's bigger? Which one's more successful?
I think Socket.
Why?
Well, I think because it's enterprise software,
it's a lot more straightforward what to build next, because
you have customers who are telling you what they want. And so I just think that's a really
nice thing about building something for paying customers instead of, you know, consumers. I guess consumers
can be paying, but it's a little different when you're going for scale and you're going for mostly free users.
So I think I just have a little bit more, not a little bit, a lot more confidence that Socket's
going to be, you know, the big hit. So let's imagine a world where you're correct and Socket's
a big hit. Does Wormhole just, can you kill your darling? Does it go by the wayside? Do you open
source it? What do you do in that world? Do you keep a team working on it for the love?
Yeah, I think we're definitely gonna keep
working on it. I mean, it has, like I said, it has hundreds of thousands of people using it every
month. So there's no reason that we would shut it down. It's not that expensive to run because most
of the data is peer-to-peer. We store the files for only 24 hours. You know, the end-to-end encrypted
files, we store them for 24 hours. So the data cost is pretty low, and there's
no reason to not continue running it. So yeah, I mean, if we were to consider shutting it down or whatever,
we'd definitely open source it before we did that. And we might even open source it anyway,
just proactively, because it's useful for people to be able to
run their own instance of it or whatever. So we're thinking about it. I mean, it's just
a matter of time. Like someone has to go and do that. And then there's the burden of pull requests and issues,
and then you've got to run a community and stuff.
Yeah, totally.
That's kind of the other thing in the back of my mind is like,
I don't necessarily want another thing to maintain.
The other part of open source.
It's about how much attention you can give to things happening in your life.
At some point, you've got a certain amount of RAM to devote,
and focus is a superpower.
By the way, the other thing I wanted to mention,
if you're poking around the site,
another thing you might want to look at is,
if you go to the footer, we have this cool page called Removed Packages.
I don't know if you saw that.
I did. I tried to look at it.
Yeah, the UI is a little iffy on it.
But what it does that's really interesting is,
so remember how I said we save every package that's published to NPM
as soon as it's published?
Well, what you can do if you do that is you can actually then see
when NPM takes down a package,
it actually gives you a nice way to highlight the sketchy packages.
So you can just see what are they taking down,
and then you've already saved a copy of it.
And so if you poke through our site there,
you can actually see examples of malicious packages
that NPM's taken down.
So this is stuff that could have been reported by anybody
or maybe they take it down proactively.
But you just see the amount of stuff that's on there.
It's like thousands of things.
Some of it is spam.
Some of it is malware. Some of it is pen testers. Some of it is just completely obfuscated blobs of code.
It's like, who knows what it does? Like it's just all kinds of, you can find all kinds of interesting
stuff poking around through there. So yeah, if you're curious, it's really interesting to just
click around and see what's in there.
I'm doing it right now. Sorry, I'm distracted
looking at this feature. This is pretty cool. I mean, especially for people who are just curious
what's going on. Airbnb-Feejax has a bunch of versions.
Now they're all gone.
It's like a package JSON and a distribution folder.
And yeah, some real shady code in there, it looks like.
If you go into the dist folder, you can see what it's doing.
Yeah, sometimes you see these ones that have company
names in their package names, and that's a dependency confusion attack. That's
where a company has their own internal NPM packages that they publish to their own private
NPM registry inside their company, but then if they don't register that same name
in the public registry, an attacker can go and publish something there. And then if their
tools aren't careful, they may accidentally install the public version instead of the private
version. And so the attacker can use that to basically get code into,
like, the Airbnb app in this case. I don't know if that's exactly what happened here, but usually these company-name ones
are something to do with that.
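The dependency confusion scenario described above can be turned into a simple audit of a company's internal package names. This Python sketch is a hypothetical illustration: `public_names` stands in for real lookups against the public registry, and it assumes the common mitigation of publishing internal packages under an npm scope the organization owns on the public registry, since only the scope's owner can publish names under it there.

```python
def confusion_risks(internal: set[str], public_names: set[str],
                    owned_scopes: set[str]) -> dict[str, str]:
    """Classify internal package names by dependency-confusion exposure.

    internal:     names published to the company's private registry
    public_names: names known to exist on the public registry (stand-in for lookups)
    owned_scopes: npm scopes (e.g. "@acme") the company controls publicly
    """
    risks = {}
    for name in sorted(internal):
        scope = name.split("/", 1)[0] if name.startswith("@") else None
        if scope is not None and scope in owned_scopes:
            continue  # outsiders cannot publish under a scope the company owns
        if name in public_names:
            risks[name] = "exists publicly: confirm the public copy is yours, not an attacker's"
        else:
            risks[name] = "unclaimed publicly: an attacker could register it"
    return risks
```

Both outcomes deserve a look: an unclaimed name is an open door, and an existing public name is only safe if the company itself published it.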
Yeah, this one is certainly some kind of
Ajax wrapper library that allows you to
do some sort of Ajax calls.
I don't see anything immediately that's like,
oh, and now it's phoning home here or doing anything,
but I'm also just scrolling through the code.
So some of these
things are harder to see, but it definitely seems like it's attacking what is a real Airbnb
library.
Oh, I already see the part where it phones home. It's kind
of obfuscated. At the bottom of the file there's a this.fetch line, and you can
see XMLHttpRequest. I don't know if you clicked the same version as me, because there are multiple versions of this, but it's sending some data.
Yeah, I see a few fetch calls
there. I do see the one on line 278, but now we're getting way deep into the weeds on this file.
Let me pull this out of the weeds a little bit, then. So what you're doing with Socket is not
prevention, it's awareness, right? Because I'm looking at the posts you shared with us as part of your thesis for pitching the show to us.
And it's like colors and faker breaking thousands of apps, library hijacked,
stole user passwords, crypto miner installed.
So you're not going to prevent those things.
You're going to make the open source users, the devs aware of what's happening in their repos. You're not,
at this point in time at least, preventing. That's NPM's job.
No, I don't think so. I mean, obviously it is NPM's job. I'm not disagreeing with that part.
No, but I think that NPM, you know, historically hasn't been able to stop these things before they happen.
So all these attacks, October, November, January, the ones we've talked about,
they were all on NPM for hours before someone caught them and took them down.
And if you look at research that was published last year in various security conferences, they find the average malware is on NPM for like 300 days before it's taken down.
So for whatever reason,
NPM is not taking this stuff down quickly.
In some examples, they do take it down within hours
if it's like a really big package
and people find it and all this stuff.
But there's so many instances
where it lasts on NPM for a little bit longer than hours.
And so I would love for them to be able
to do this stuff faster and I'd love to work with them.
If we find stuff, we're definitely reporting it to NPM
and getting it taken down for everybody.
But I think there's definitely a space for Socket
to actually prevent this stuff because if you're getting
a Dependabot pull request, let's say, right?
Dependabot's trying to update you to some new version
that just came out yesterday, right? It's very easy to look at that PR and say, okay, my tests pass and, you know,
the change log looks pretty good. It looks like they fixed some bugs. Let me go ahead and just
click merge, right? That's what I do. That's pretty much what I've always done is I just kind
of, you know, hope for the, you know, it's probably fine. Like the change log looks good.
The tests pass. Just trust the system. Yeah. So that's the system. Click the green button, right?
Sure. And what Socket could do in that instance is it could tell you
before you click that green button, there's a comment
here we posted that says, hey, this
package is now doing X, Y, or Z.
And then that might make you think
twice before you click the button if it's something...
But that's still awareness, though.
It still lives on NPM. That's what I mean by
awareness. Because my original question before I even asked it, I answered it in my own head, which is, okay, if you've got this list of nefarious things that have happened out there, does a future with Socket in it prevent them?
And my initial answer is probably no.
Just from what you've described so far,
it's more of an awareness tool for developers before they click the green button and integrate or install a package, etc.
It's not a prevention system.
It's going to live on NPM.
You're not preventing it from existing.
But you may have tooling that NPM can use as prevention.
But so far, you're awareness.
Yeah, I think that's fair.
Yeah, I think we want to get to a place where we're actually sending our insights to them in real time so that they can take stuff down and you don't have to use the Socket GitHub app to be protected.
Because, yeah, you are right that if we do flag something as being a suspicious update and we warn people, the next step is like, well, okay.
Broadcasting that.
Broadcasting that information, exactly.
And maybe we don't want 20 different teams to all get the same comment that says, hey, maybe you should look
at this update. You know, we should, someone should just look at that update and say, oh no,
this is actually bad and then block it for everyone using the GitHub bot and also get it taken down
from NPM in parallel. So that's kind of where we need to go to is to be able to not, you know,
make everyone duplicate all this work. But in the meantime, obviously, we want to still give people the tools
to see these suspicious updates
and do something about them on their own.
But there is an element, eventually,
we're going to want to not have everyone duplicating that work
and we're going to want to just summarize it for them
and say, yes, it's true that this package everyone uses,
like React, is now doing this kind of a new thing,
but we already looked at it and it's fine.
And so for most people,
we could suppress that information
and not bother them with it
if it's something that we think is expected.
Yeah.
I almost imagine a world where there's a future
with you having a team
that maybe does that on an ecosystem's behalf.
Maybe one, three, five, a small team.
I'm not sure how big the team needs to be.
I'm not trying to describe your company or how you should hire or grow.
But I can imagine at some point these threats become so important to broadcast, and potentially prevent, that while now you're in the awareness arena, you could skew into prevention, a more corporate prevention rather than just an individual team awareness piece. You know what I
mean? Where you can have that alert bubble up, maybe to an internal team initially, and you have
them do some of their deeper analysis. So it's their noise to
deal with, and that turns into signal for more people. You can have a higher signal threshold
on a user basis because you've got an internal team dealing with the noise.
Absolutely, yeah. Absolutely.
Well, I'm excited, Feross.
I'm always excited for your mad science activities.
I think what you've got going on here,
it definitely gives me hope.
I think what you're doing here certainly gives us,
I would say, a more sound footing on open source.
You know, if open source has won, which it has,
and if open source enables two people to build,
you know, a potentially billion-dollar company,
or in the future, 10 people can do that,
if that finally comes down to that number,
you know, then open source is totally a part of that.
And securing that commons is a great
endeavor. And I applaud you for doing it. I'm so glad that you've, you know, through all the
iterations of your skillset and your career. I mean, I know you don't see yourself like that,
and maybe you want to self-deprecate, but let me just say, I know Jared shares my feelings too,
that we believe in you. We think you're awesome. We think you're doing something cool. And I'm
so excited for Socket in the future. So don't stop. Keep going.
Yeah. You guys have always been the most supportive. So it means a lot to hear that.
And, you know, I always get your encouragement and support on things. Yeah, I think
we're onto something special with this and definitely want to help make the whole ecosystem
more secure for everybody. And I think it's important for the future of open source too.
If this stuff keeps accelerating as it has been,
I think that the trust will suffer
and we're just going to get to
a situation that's untenable eventually.
So better to be proactive about that
and do what we can now.
Well, we talked about how many downloads
there are per download, right? Like that order of magnitude. But think about how many vulnerabilities there are per vulnerability that
we know about, right? How many exploits there are that we don't know about. And tools like Socket
are going to help us know about those things faster, sooner, better. And I agree, it has to
happen. It's getting more and more dangerous. The stakes are rising.
$20,000 for an exploit that'll get you,
that wasn't even an exploit, for a password
that will get you the keys to the kingdom.
I mean, serious money getting thrown around,
state actors, it's never been more serious of a game.
So I'm thankful we have you on our side, Feross.
You're out there fighting the good fight for all of us.
And I also hope you have lots of success with this.
I agree.
Feross, it's been awesome.
Thank you so much.
Yeah, thanks, guys.
I really appreciate it.
It's been an honor, as always, to be on your shows.
Thank you.
That's it.
This show's done.
Thank you for tuning in.
Yes, we got the mad scientists back.
I always love hearing about Feross's big ideas. And more importantly, I'm excited about the future
and the security of the open source supply chain. I think this is a very ambitious goal for Feross
and the team there at Socket. And I, for one, am going to be paying close attention to this. If you have any thoughts, any feedback, any desires from Socket, Feross, or others involved in this,
please let us know in the comments.
The link is in the show notes to discuss.
And for our Plus Plus subscribers, number one, thank you.
And number two, there is a bonus after this show.
So if you see some extra content there, that is why.
And if you're not on the Plus Plus feed yet,
check it out at changelog.com slash plus plus.
Skip the ads and support us directly.
Again, changelog.com slash plus plus.
And last but not least,
I want to thank our friends and partners at Fastly.
They've got our CDN's back.
You know them, you love them.
We're fast because of them.
Check them out at fastly.com.
And Breakmaster's still on those beats.
They're banging.
Thank you so much.
Loving the new beats.
Always, always, always.
Breakmaster, you're awesome.
Thank you.
And that's it for this show.
Thank you so much for tuning in.
We will see you next week. Game on.