The Changelog: Software Development, Open Source - Let's archive the web (Interview)
Episode Date: November 27, 2024
Nick Sweeting joins Adam and Jerod to talk about the importance of archiving digital content, his work on ArchiveBox to make it easier, the challenges faced by Archive.org and the Wayback Machine, and the need for both centralized and distributed archiving solutions.
Transcript
What's up, friends? Welcome back. This is The Changelog.
Software moves fast, so keep up.
On today's show we're joined by Nick Sweeting, the archive guy, talking about the importance
of archiving digital content, his work on ArchiveBox
to make it easier, the challenges faced by Archive.org and the Wayback Machine, and the need
for both centralized and distributed archiving solutions. Nick also shared some cool stories, his
personal experiences with internet censorship via the Great Firewall while living in China.
Okay, we've got lots to cover today.
A massive thank you to our friends and our partners over at Fly.io.
Yes, that's the home of changelog.com,
and it's also the public cloud built for developers who ship.
That's us, that's you.
Learn more at Fly.io.
Okay, let's archive.
What's up, friends? I'm here with Kurt Mackey, co-founder and CEO of Fly. As you know, we love Fly. That is the home of changelog.com. But Kurt, I want to know how you
explain Fly to developers. Do you tell them a story first? How do you do it? I kind of change how I
explain it based on almost like the generation of developer I'm
talking to. So like for me, I built and shipped apps on Heroku, which if you've never used Heroku
is roughly like building and shipping an app on Vercel today. It's just that it's 2024 instead of 2008
or whatever. And what frustrated me about doing that was that I got stuck. You can build and
ship a Rails app with a Postgres on Heroku the same way you can build and ship a Next.js app on
Vercel. But as soon as you want to do something interesting, like as soon as you want to, at the
time, I think one of the things I ran into is like, I wanted to add what used to be like kind
of the basis for Elasticsearch. I want to do full text search in my applications. You kind of hit
this wall with something like Heroku where you can't really do that. I think lately we've seen
it with like people wanting to add LLMs kind of inference stuff to their applications.
On Vercel or Heroku or Cloudflare or whoever these days,
they've started like releasing abstractions
that sort of let you do this,
but I can't just run the model I'd run locally
on these black box platforms that are very specialized.
For the people my age, it's always like,
oh, Heroku is great, but I outgrew it.
And one of the things that I felt like I should be able to do when I was using Heroku was like,
run my app close to people in Tokyo for users that were in Tokyo. And that was never possible.
For modern generation devs, it's a lot more Vercel based. It's a lot like Vercel is great
right up until you hit one of their hard line boundaries. And then you're kind of stuck.
The other one, we've had someone within the company. I can't remember the name of this game, but the tagline was like five minutes to
start, forever to master. That's sort of how we're pitching Fly: you can get an app going in
five minutes, but there's so much depth to the platform that you're never going to run out of
things you can do with it. So unlike AWS or Heroku or Vercel, which are all great platforms,
the cool thing we love here at Changelog most about Fly
is that no matter what we want to do on the platform,
we have primitives, we have abilities,
and we as developers can chart our own mission on Fly.
It is a no-limits platform built for developers,
and we think you should try it out.
Go to fly.io to learn more.
Launch your app in five minutes.
Too easy.
Once again, fly.io.
We are here with Nick Sweeting, a full-stack software engineer in Oakland and founder of ArchiveBox.io.
Nick, welcome to The Changelog.
Thanks for inviting me. It's a pleasure to be with y'all.
Pleasure to have you. You want to archive stuff. Let's archive stuff.
Let's be pack rats.
Let's start with why archive? I mean, isn't that just a lot of work and no gain? Why archive stuff?
Yeah, it's a totally valid question.
I think for most people, the answer is maybe you don't have to archive stuff and that's
okay.
Archiving is sort of a curation role and some people are drawn to it and some people are
not.
And I think that responsible archiving involves some amount of curation labor.
It doesn't have to be a lot of labor, but it's the labor of
choosing what's important and what is not. And that can be just for yourself. It can be for your
family. It can be for your friends. It can be for your academic institutions. But it is some labor
that you're taking on by deciding to preserve something. And just acknowledge that and pat
yourself on the back. And if you do decide to archive, keep in mind that it's not just a one-time decision. You're going to have to decide,
oh, do I move this data from this hard drive to the next one when it inevitably gets old?
Do I give this data to my kids? Will they care about it? Do I give it to a library? Where does
it go next? What should I do if someone asked me to delete it and they don't want it preserved?
And all of those things are sort of things that you have to think about. But if you're excited about archiving, you know, don't
weigh yourself down with all of that. Just save one or two things and see if you like it.
When it comes to archiving the web or digital artifacts, I'm not sure how broad ArchiveBox's
ambitions are, but I thought we had archiving the web kind of figured out,
like there was a whole group of people who are enthusiastic about it and still are enthusiastic
about it. Of course, I'm referring to archive.org, the Wayback Machine, and that entire operation,
which felt like the web's archive was in good hands. And all you have to do is donate to that,
those good hands, or support those good hands and hope that everything continues as normal.
But recently it seems like they've been going through trials and tribulations.
I'm not sure the exact details of who and why have been attacking the Wayback Machine
and trying to take archive.org either offline or somehow ruin it,
but it seems like maybe that's an assumption that is not well-based.
What do you think about that?
I think archive.org is doing an incredible job.
They're tasked with a really hard problem of doing this labor that I just described,
but at a massive scale for the entire internet.
They effectively become moderators for the entire internet.
Because if someone doesn't like the content that they've decided to preserve,
which is basically everything they can get their hands on,
they get personally attacked and they have to take the flak for it.
So it's a really, really tough position that they're in
as the sort of centralized curators of everything.
And inevitably, they're going to get attacked by people who don't like stuff.
And I think that they've done an incredible job so far,
but there's limits to a central moderation team
that has to be able to manage and defend
every piece of content on the internet from attack.
So they've undergone attack recently.
Do we know the motivations of these attackers?
Is it simply, we don't know yet?
Adam, do you know?
Nick, do you know?
I don't know.
That's why I'm asking the question in earnest.
I don't know the answer to this.
They've actually been going through a lot of stuff.
I mean, they had not just like a DDoS attack
or a situation where you have somebody
trying to take it down or keep the site offline.
They've had a major copyright case loss recently
where they were trying to archive things that,
you know, I think
we as a society want these things to be archived. And like you had said, Nick, this might be part of
that curation aspect, too, like just us as humans wanting to preserve, not so much to break copyright,
but there was some breaking there. So there's a point of breaking, I suppose, or a
breaking point with the Internet Archive where you've got copyright concerns, things like that.
They've had various versions of attacks that aren't just simply an attack or an attack vector trying to take it down.
It's beyond that.
I would say one thing about the copyright case, if you'll allow me a moment.
Yeah, please.
Their stance is pretty admirable. I think I originally was quite worried about it.
And I commented online and was like,
yeah, why are they risking the whole Internet Archive
to take this stance?
It seems like they should spin out a separate company
if they really want to fight the publishers on this.
And I talked to Brewster about it,
and I've sort of come around now.
Who's Brewster?
Brewster's the founder of Archive.org.
Sweet, okay, great.
Incredible character.
It's been his life's mission to make all of human knowledge available for everyone.
And I think he's doing a great job.
But his take on it was that he's personally wealthy from a dot-com era sale.
And he wants to do good things with that money.
And part of that is rebuking publishers when they start really
crossing lines around content ownership. And archive.org is actually properly legally
structured so that these things are isolated. He's not risking archive.org and the Internet
Archive by doing this, by taking this fairly strong stance against publishers forcing content licensing as the only option upon ebook
readers. So basically, publishers were saying, we're not going to sell you an ebook anymore.
And this effectively makes libraries lending ebooks impossible, because you can't reshare
the license to an ebook. They want to charge for every view of the ebook. And so libraries can no
longer lend ebooks. And so he just thought that this was an
egregious line to cross. And he's like, okay, you know, as someone fairly well off, who cares a lot
about this, and who cares a lot about the freedom of information access for future generations,
I can afford to take a stance and lose sometimes on cases like this. And I think that this case
needs to be very publicly fought and won or lost. And it's not jeopardizing the rest of the Internet Archive.
I think that that message doesn't get out enough.
So they did the right thing there.
And they have this software that does, it's CDL.
You may know this, Nick.
It's Controlled Digital Lending is what this program,
it's not just software, it's a program they had to allow this.
I wasn't sure of the details of which books.
I think it was mostly older books,
but it was essentially ruled that it was fought in the Second Circuit Court recently in September.
That's why this is so fresh in my brain, at least the details to some degree, basically concluding
that this practice of this controlled digital lending that the Internet Archive is doing,
it harmed the publishers' markets by
providing free digital copies of books. And, you know, I don't know those specific details, like
which kind of books. Were they new? I mean, obviously they're new, that doesn't make any sense. But if
they're older, or it's sort of like almost public domain, maybe that makes sense. But, you know,
certainly if it's public domain, it makes sense. But yeah, I mean, I think at that point you don't have much of a leg to stand on in terms of the fight. But, you know, I'm for, you know, freedom
of information. I'm not for freedom of information insofar as it takes away a corporation's ability
to control their own work and their own financial destiny with the things they've helped create in the world as information. But there is a line there that at some point we have to adjust.
And I applaud them for trying to adjust it.
Yeah, I think they broadly agree with not depriving publishers of content ownership.
That's not really the issue they're fighting.
They're more fighting that the publishers crossed a line by forcing licensing as the only option for content access, and that that was not where the line was before, that they moved it.
And this is their way of fighting back.
And there's broadly been a sort of Overton window shift of what is acceptable content release policy in the first place.
And the publishers have successfully moved that to licensing only.
And you can no longer own anything.
And that's what they were fighting.
So yes, they did cross some lines
with the controlled digital lending,
where they were not counting how many copies they lent out.
And I think that they expected to get sued for that.
I think that they wanted to take a fairly strong stance there
by saying that the way that the publishers
are releasing the content in the first place is unacceptable. There's a, we can go 17,000 more
layers deeper on this. There is an article on the EFF, or I should just say EFF.org, the EFF
website, Electronic Frontier Foundation, that, you know, gives a few more details. There's
four different publishers, Hachette, HarperCollins, Wiley, and Penguin Random House. And the stance basically was that these libraries have paid publishers billions, I'm quoting, libraries have paid publishers billions of dollars for books in their print collections. And they're investing enormous resources in digitization in order to preserve those texts.
And they say the CDL helps to ensure that the public can keep access to those full books that they've bought and paid for, basically.
That ensures the usage, digital versions of them, they've already paid for.
So it seems like there's some details there for sure.
But they've lost that case publicly recently.
But again, it's back to several different ways this central
point is being attacked, whether it's
in the court of law.
Legally or technically.
Yeah, which brings us back
to ArchiveBox and maybe it's
the need for it to be
distributed. Yeah, I just think
fundamentally that both should exist.
I think having big centralized
resources is awesome
because centralized moderation is effective. You can keep bad actors out if you take a stance and
you don't get dragged down by politics too much. You can do a really good job and you can provide
an amazing free public resource for a lot of people, and that's awesome. But we should also
have distributed archives that cover all of the things that the central archives can't just from a scale perspective.
A lot of different people saving stuff on a lot of different hard drives is always going to be able to save more and know about more content.
Not everyone wants to report what they find to the Internet Archive.
Maybe you want to save something without announcing to the world that you're saving it.
There are lots of reasons, political, personal.
Sure.
So when did you start ArchiveBox? And what was the initial inspiration for that?
What made you actually get the editor out and start coding?
So the initial, I'll start with the initial inspiration. I grew up partly in China. My
family moved when I was nine and I did like middle school, high school there, had an amazing time.
And I obviously ran into the problem of having censored internet.
So, you know, we'd read news articles
and then 20 minutes later you refresh
and it's a 404, it's gone.
Great firewall.
Yeah.
So you get used to, just for practical reasons,
you get used to saving pages out of your browser
or screenshotting them or making PDFs just as a default whenever you find something interesting in order to be able to share
it with people there. And so that led to creating a small tool called Bookmark Archiver that I was
just using to auto-download all of my Pocket sort of saved articles. And that was a side project for
many years. And I've sort of come back to it over time, adding features here and there.
And then I used it.
There was a funny security incident when Equifax got hacked.
I used it to make a spoof site impersonating Equifax's site and got a whole bunch of viral attention for that.
And I was like, OK, this is just a random, interesting side project.
It's not actually what I care about working on.
But a nice thing to come out of that was a bunch of attention towards Bookmark Archiver,
where a bunch of people were like, oh, I would use this.
This seems useful.
And then so I've been slowly chipping away at it, adding features over the years.
And then I quit my consulting job a couple years ago and decided to work on it full time.
And over the last year and a half, I've been building it up full time.
Wow.
Some layers there for sure.
Yeah.
I was thinking about this.
Not sure if it's a direct one-to-one, but have you read the book Fahrenheit 451?
Yeah.
Okay.
You smiled.
Nobody saw that smile.
What made you smile about that?
He's read it.
Well, there's a lot of interesting layers to that book that are becoming increasingly relevant, which is kind of terrible.
But I don't know.
There's a lot of misinformation and disinformation these days.
And it's sort of, you know, at the foothills of where Fahrenheit 451 starts before it's outright deletion of information as a public strategy becoming acceptable.
That's my concern.
Jerod opened up with why it should be archived.
You didn't say a fool's errand,
but you've said that before in other cases.
I'm sure that you probably didn't say that.
No.
What would you say then?
Is it not important?
No, I think it's incredibly important.
It's incredibly important. Okay, good.
And I think that's why I'm like, let's get ArchiveBox on the show.
Right.
I'm a huge fan of Archive.org. I think it's a shame that it's getting so much, you know,
problems. And I think that if we can decentralize those problems across a bunch of people, that's
probably better. So no, I'm not against it by any means. I don't think it's...
Gotcha.
A fool's errand. And I do think it's a hard problem and-
Laborious, for sure.
And expensive and lots of stuff,
which is why the software needs to be there.
But go ahead, Adam.
I was not trying to, you know,
say you were saying something bad or good,
but it just, it seemed like "why do it?" was the question,
like in a negative light.
It was the question.
What's the point?
Yeah, what's the point of this?
I think it's a valid question.
Yeah.
I think so too.
But I think when we look at, you know,
when we cross-examine the challenges
which we opened up with
for the internet archive and
then this book or the premise of
this fictitious book in light
of today's world and then your history
of living in China behind the
Great Firewall and the challenges
that come from internet
disappearing, essentially. Like, truth
is, you know, you can go online, see a price
for something, and tomorrow that price changes. But unless you screenshot it or something, you
can't go back to that retailer and say, hey, look, the deal should still be the deal. They're like,
no, we just changed that price behind the scenes, or something like that. Like, your only truth is the
artifact you can claim or that you have a hold of. And I think that's kind of the premise of the desire to archive the internet
so that we can preserve it for years to come,
but at the same time, just to hold true what's true.
I think there's one more public perspective that's pretty common
that's maybe worth addressing around why is archiving worth it.
A lot of people sort of have the valid idea that, oh, with AI tools or with modern
technology or better tooling over time, we can have our computers just sort of osmosis all of
the content and keep track of what's important for us. And we don't actually need to preserve
the actual website the way I saw it originally. Like, we'll just use a browser extension that
sort of ingests it all into a model or, you know, they're training models on the whole internet all the time anyway,
why do we need to save the original sites? Let's just keep these models over time, and that's good enough.
I think that that's, it's a reasonable thought, and that might work
in the long run, but I think in the short run, we haven't seen those models
be accurate enough to recall all of the original content without hallucinating
at all, and then, unfortunately, the subsequent models get trained on the output of the initial models. So it's
really important to keep those primary sources around for as long as possible, because our future
kids' kids' kids might care for historical purposes also, you know, what did websites look like,
but also for contextual purposes, how is this content delivered in what format, what ads were
on the page, you know, all of these things are things that future people might care about that might not seem
important now. That's part of why archiving this active curation, this active labor that I describe
it as is important is because you're trying to preserve as much of the original historical
context of the world around this piece of content at that moment in time with the content. It's not
always just about the raw content.
Right.
Well said.
And I don't think the technique you describe,
because of the way large language models work,
I mean, they are effectively compression algorithms. And so lossy by definition.
I mean, they're not lossless.
Maybe eventually they become lossless.
And so they can have both your compressed artifact
and your original artifact perfectly pristine. Well, then they're just archivists, aren't they? And so we are still archiving.
We're just letting the machines do it. But you're kind of letting the machine do it, right? That's
what your software does. Sort of. Yeah. So I actually don't take as much issue with the
compression. I think all archiving is lossy to some degree. I take more issue with the lack of
perspective of the tool. I think that the
perspective of the person doing the saving is almost as important as the actual record. Because
if I visit a website in the US and on the Eastern time zone, I'm going to see a totally different
New York Times homepage than if I visit it from Germany. Or if I visit my Facebook timeline, it's going to look totally
different to me than to someone else. So the perspective of the person viewing it is almost
as important. And these models don't have that perspective. They don't record any information
about who's doing the saving. Why are they doing the saving? When did they do the saving? What did
they visit before and after? And so all of that stuff is part of the curatorial work of creating these archives.
Gotcha.
So that's something that's unique to the web then because of the dynamism of the documents.
Because if we were going to archive ancient writings, maybe you want to know what cave this came from and all the context you could possibly gather.
But there's not the perspective of the gatherer.
Maybe they choose to exclude some
stuff or, you know, there's censorship and things like there is a bit of an editorial to like decide
what to archive. But, you know, based on one person in Seattle and the other person in London
gets two different web pages. That's a really good point. But it seems like it's almost unique to
web. If you go back far enough, I think you'll encounter editorial adjustments more often, right? History is written by the victors and the victors
are the ones who retranscribe it over the years. And so you're essentially getting layers of
delayed perspective added. I think that if you look very closely at any sort of historical archives,
the older they get, the more perspective is necessary because those are each layers of
decisions to decide to keep this around. Fair. The documents don't change though.
Yeah. Right. Unless they like literally
change them. Unless there's, that's fraud then. So now we're talking about fraud.
Hopefully. Yeah. But then you get libraries of Alexandria and you have to retranscribe things
from memory or oral history. Once you get to a long enough timescale, it all becomes layers
of recollection. But yeah, you're right. Hopefully the documents don't change in the 100-year timescale.
The interesting thing, though, is that the internet is fairly young in comparison to pretty much any other archived medium.
It's one thing to have an archive or a museum of paintings or of art or of different artifacts.
The web is a uniquely, like Jerod said, it's dynamic, you know, but at the same time,
the perspective of, we don't know what's important right now until later. So it's almost like
archive as much as society might think is important because we're not really sure what
is important right in this moment. We have to have a zoom out, which is time, right? The time is the perspective. 50 years from
now, the world and the web or whatever the web becomes or whatever the web makes the world become
will be drastically different for sure. Bad or good, we're not sure. But today's breadcrumbs,
so to speak, may point us to why or how or what later on because the questions we'll have later
are unknown to us. It's almost like the unknown unknowns. Just archive it all, as much as you can,
and distribute it and protect it so that we have the opportunity for the look back.
Yeah, that's, I think, a good strategy for a central actor like archive.org. Their strategy is just
archive literally everything
they can get their hands on.
You submit a URL to them, they'll archive it.
I do think that breaks down somewhat
in distributed archiving,
where the goal is slightly different
because you're empowering individuals
to save things that they care about.
It's a little counterintuitive,
but actually recommending that people save
as much as they possibly can tends to backfire
because they end up with massive multi-terabyte collections that they just can't handle. They can't deal with,
and they don't know who to send it to. And eventually they stop paying for hosting.
So that's why I really stress this sort of archiving as an active curation line. It may
get old, but for distributed archiving, it's especially important. It's especially important
to recognize that the people running these are really contributing labor. They're contributing
public service to other people, and they should do it to the extent that they can sustainably do it.
And if you dive headfirst into saving everything you possibly think is useful,
I've seen many, many people burn out on archiving from that. It's a fad. They'll get into it a
little bit. They download 10,000 URLs, and then they're like, okay, I don't know what to do with this.
It's too big to search. It's too big to use. It's kind of cool. Maybe I'll send it to someone.
And it actually dies faster. Whereas if you empower people to archive what they care about
and sort of harp on that a lot so that you make it easy to curate and tag and add context. It's the context that indicates why it's
valuable. And it's a different strategy than a big library of Alexandria warehouse where you just
store everything you possibly can. It's more about having nodes of these curations of different
groups. And these nodes can then start sharing what they think is important with each other.
And through this sort of federated network of decision making on what is important, you end up with the same average result at the end of
basically everything that anyone has cared about at some point being saved. But putting that whole
responsibility on one person of, oh, if you're starting archiving, you must archive everything
you possibly can. I think it actually tends to backfire more than it does good.
I can certainly see that. So ArchiveBox is to empower individuals to archive that which
they care about from the web. So this is a tool for downloading web pages, storing them offline
in their own little archives that you can bring them up and look at them again. You know, HTML,
JavaScript, PDFs, images,
like the raw nuts and bolts of what puts a website together.
Is the end goal then, like we all have our own little archives,
is it like you described and like ArchiveBox
is somehow going to provide this Wayback Machine
based on this federation of me agreeing
and other people agreeing,
which feels a little kumbaya,
but would be awesome if we all agreed
to like share
our little view of the world with everybody else. Is that the idea?
No, actually. So I don't
want archives to be necessarily defaulting to being public for everyone, because again, that's
not the role of this distributed archiving tool. Like, it's a great role for a library, but it
doesn't work as well for distributed archiving because of cookies, because of authentication.
Basically, one of the main selling points when you actually get down to it and you're like,
do I really run this tomorrow? Is it worth it or not?
Is, oh, I can save my social media. I can save stuff behind paywalls.
I can save stuff that I have to be logged in to see. Archive.org cannot save any
of that, and they won't take it.
Or they'll upload it for you,
and they'll hold it privately for you,
but you won't be able to share it with anyone
because they don't want your cookies, right?
They've archived your cookies, your login sessions, all of that.
So a lot of that content is kind of unshareable
until you die or stop using those accounts.
And so it gets really tricky.
That's the main selling point of saving stuff locally.
If I start adding features of like, oh, share your
archives with the whole world, most people don't
want that. They're saving their Facebook
photos. They're saving their news
articles and stuff they read, but also a lot of their
own personal browsing history.
They don't necessarily want to share the URLs
only, and they don't necessarily want to share the
snapshotted page content.
But it's important for the longevity of humanity and this information for it to be shareable eventually.
And so I think very carefully about sort of different ways to tackle that issue.
It's a really human issue.
It's not a technical problem, right?
Do you have time unlock?
Do you try to incentivize people to donate their archives to a public collection by providing
free hosting in exchange for them releasing the information? Do you have scrubbing tools that try to go through
and scrub all the sensitive information? If you do that, where do you stop? Because you are
tampering, you know, archivists try very hard, like you were saying, to not tamper with the
original documents. But the original document has someone's personal email and username and
password in the HTML somewhere, there's a trade
off. At some point, you do have to scrub that for it to be useful to other people without being
harmful to the original curator. Curation is an act of labor. We shouldn't punish those people
doing the curation by spreading their social media logins to the world. So it's a very delicate
balance. And I think that the answer is there's no one permission setting that gets pushed on
everyone ever. This tool is never going to force everyone to upload all their
archives to a big federated network. This tool is never going to force everyone to only have
private archives and not be able to upload stuff to a big federated network. Instead, it's going
to give a range of options and it's going to be annoying to some people that they have to decide,
you know, do I share this with other people or not?
But I think that that's the right move for now is giving the full spread.
Do I keep it local?
Do I share it with my neighbors who I know and trust?
Or do I share it with everyone in a big, untrusted, scary world where someone might use this content to hurt me later?
And every social app network platform has to make these decisions when they first start.
For sure.
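To make the scrubbing trade-off concrete, here is a minimal sketch of what a scrubbing pass might look like. It is not how ArchiveBox does it; the function name `scrub_html` and both patterns are assumptions made up for illustration, and a regex pass like this would miss plenty of real-world leaks.

```python
import re
from typing import Tuple

# Hypothetical patterns for illustration only; real scrubbing needs far more than this.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Matches the value="..." of <input> tags whose name looks like a secret field.
SECRET_INPUT_RE = re.compile(
    r'(<input\b[^>]*\bname="(?:password|token|session)[^"]*"[^>]*\bvalue=")[^"]*(")',
    re.IGNORECASE,
)


def scrub_html(html: str) -> Tuple[str, int]:
    """Return (scrubbed_html, number_of_redactions)."""
    html, n_secrets = SECRET_INPUT_RE.subn(r"\1[REDACTED]\2", html)
    html, n_emails = EMAIL_RE.subn("[REDACTED EMAIL]", html)
    return html, n_secrets + n_emails
```

Every redaction here is also a small act of tampering with the original document, which is exactly the tension described above.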
The time unlock is super interesting because we recently spent some time with Jordan Eldredge.
I'm not sure if you heard that episode, Nick, the Winamp era, where he had dug through different Winamp themes.
I love Winamp. And he had found in these themes all kinds of digital artifacts, things that shouldn't have been there.
Because he has this Winamp, not theme, what are they called?
Skins.
Skins, yeah.
He has this Winamp Skin Museum, which is really rad.
And in that, he had found like old pictures of people.
It's like basically a compressed file of a folder of files.
And in there is like the stuff you'd normally have for a skin,
but then like random things that he found in there.
And he shared some of those.
And we were looking at pictures of people from the nineties and like old audio files of like, you
know, kids at their computer recording weird noises. And it was just really enjoyable to kind
of have that snapshot of the past, people we'd never met and never will meet. Sure, if we had
seen it right after they had taken it, now, it's like almost, it's a privacy violation, right?
Like, I didn't want you seeing that.
Well, you shouldn't have dropped it in your Winamp skin.
That's right, purposefully.
Over time, like, you know, they're gone and old or dead
or, you know, it's just like the context is gone.
There's no fear there.
And it's really, for us, it was nostalgic,
but there's lots of reasons why that would be interesting.
So I like that time unlock option of like,
you know, maybe, like you said,
maybe I donate my archive when I die
or every 20 years, like go back 20 years
and those are now publicly available.
Similar to how stuff gets declassified,
you know, in our government.
Yeah.
I think that would be really cool.
Yeah, that's sort of what I'm gravitating towards
as an initial carrot to offer is like, you know,
if you agree to time
unlock, then I'll host your stuff for free as a backup. It gets dicey when I have to rehost
content for other people. So the way archive.org works is they're, they basically operate as a
library, right? They're a nonprofit institution. They don't earn income from their hosting.
They have a separate LLC that does some paid services, but it's a separate LLC. And they're
basically not earning revenue directly off of re-hosting often copyrighted
content.
I would have to, if I ran a public hosted service where I'm mirroring people's content,
I would have to either be a library like them, in which case I can't accept payment for hosting
at all.
So this is the only way that I could offer to host people's stuff.
Or I have to figure out
some other new legal system that hasn't been invented yet to do this. Basically, you're trying
to make a business out of BitTorrent, right? It's a very similar problem, right? It's very hard to
charge for this and not be legally liable for re-hosting copyrighted content. So there's probably
some middle ground where people are buying an app that they are running locally, that they are operating, that's connecting them to other people running this app. But I am mirroring that stuff. If I get copyright complaints,
if someone sends a DMCA notice and says
I have to take it down, I have to comply as a central agency.
But the people running those individual archiving apps
can still share it if they want to.
Something like that.
That's sort of a middle ground option.
Okay, friends, I'm with a good friend of mine,
Avthar Sewrathan from Timescale.
They are positioning Postgres for everything from IoT, sensors, AI, dev tools, crypto, and finance apps.
So Avthar, help me understand why Timescale feels Postgres is most well positioned to be the database for AI applications.
It's the most popular database according to the Stack Overflow Developer Survey.
And Postgres, one of the distinguishing characteristics is that it's extensible.
And so you can extend it for use cases beyond just relational and transactional data for use cases like time series and analytics.
That's kind of where Timescale, the company, started, as well as now more recently Vector Search and Vector Storage, which are super impactful for applications like RAG, recommendation systems, and even AI agents, which we're seeing more and more of those things today.
Yeah, Postgres is super powerful. It's well loved by developers.
I feel like more devs, because they know it, it can enable more developers to become AI developers, AI engineers, and build AI apps.
From our side, we think Postgres is really the no-brainer choice.
You don't have to manage a different database.
You don't have to deal with data synchronization and data isolation
because you have like three different systems
and three different sources of truth.
And one area where we've done work in
is around the performance and scalability.
So we've built an extension called pgvectorscale
that enhances the performance and scalability of Postgres
so that you can use it with confidence for large-scale AI applications like RAG and agents and such.
And then also another area is, coming back to something that you said, enabling more and more
developers to make the jump into building AI applications and become AI engineers using the
expertise that they already have. And so that's where we built the pgai extension that brings
LLMs to Postgres to enable things like LLM reasoning
on your Postgres data,
as well as embedding creation.
And for all those reasons,
I think when you're building an AI application,
you don't have to use something new.
You can just use Postgres.
Well, friends, learn how Timescale
is making Postgres powerful.
Over 3 million Timescale databases
power IoT, sensors, AI, dev tools,
crypto and finance applications,
and they do it all on Postgres.
Timescale uses Postgres for everything, and now you can too.
Learn more at timescale.com.
Again, timescale.com.
And also by our friends over at Wix.
I've got just 30 seconds to tell you about Wix Studio,
the web platform for freelancers, agencies, and enterprises.
So here are a few things you can do in 30 seconds or less on Studio. Number one, integrate, extend,
and write custom scripts in a VS Code-based IDE. Two, leverage zero setup dev, test, and production
environments. Three, ship faster with an AI code assistant.
And four, work with Wix headless APIs on any tech stack.
Wix Studio is for devs who build websites,
sell apps, go headless, or manage clients.
Well, my time is up, but the list keeps going on.
Step into Wix Studio and see for yourself.
Go to wix.com slash studio.
Once again, wix.com slash studio.
I would be motivated to archive for legacy. You know, this, what's the internet today for me is
not the same internet of tomorrow for my kids. And so I think that would be where I would personally
find some motivation. And I'm kind of hanging out in that motivational space
because you're like describing, you know,
archive 10,000 URLs, you get burnt out and you sort of quit.
And so the job of you is to instill the obvious software to do the job
but at the same time bootstrap and educate the people
that you want to sort of clone and say, this is why it's important.
Here's how you can use it for yourself.
Here's ways you can even share it with others that make it so that you stay motivated.
Yes.
Over time, if I can't show off my stuff, the things I think are cool or have a purpose or a reason to do
it, I'll eventually become bored with the practice and just basically move on. I think for me
personally, I would want an archive box for my future generations. And it's not to be narcissistic.
It's because those, it's my people, my closest people is who I really care about in life.
Sure, I care about everybody and I'm a kind person, but at the same time, like
family is family. You know, I want my kids to know where I came from, what was important
about me. And maybe it's part of the podcast. Maybe it's part of the, you know, the web amp
museum, so to speak, you know, these little things that were cool to me that eventually
my kids can spelunk and be like curious and explore and find new things and reach back and all that good stuff.
Or maybe they decide to donate it to a museum and then the museum decides to, you know,
bring a whole new life to it. Like your kids have a bunch of interesting agency and choice that they
can make. But yeah, that's a great point. Legacy is a common attractor for individuals who want
to do archiving. I'd say right now it's an even split, sort of, between journalists,
researchers, lawyers (lawyers are the biggest category, to be honest,
about archiving),
and individuals who want legacy or just sort of personal use,
archival of their bookmarks, that kind of thing.
Imagine this headline in 2070:
seemingly long-time digital pack rat
finally, through family and legacy,
has had their internet archive, or their
ArchiveBox, donated, and it's enabled this new technology to be the foundation of, I
don't know, like, reaching for the stars here. But imagine that kind
of headline, like somebody who was really
archiving the good stuff, and they gave it to
future society and enabled this brand new thing that is just
super cool.
Well, you also have certain creators through time who were prolific,
and they wrote way more than they published, for instance. And then that person died and they became
famous because they wrote such great prose. And over time you're like, wow, what if we had their
unpublished works?
What if we had their journal?
What if we had their thoughts?
We could mine those for such interesting insights like Albert Einstein's and such.
Yeah, there's a delicate balance there, though, because with any content that people create,
they're being vulnerable and sharing a part of themselves that they might not otherwise
share if they knew that everything they shared was instantly public 100% of the time.
Well, that's why I'm speaking of legacy, though.
This is like your foundation that you arranged.
They decided that. We're in the context of you saying that finally this person died and their foundation decided to open up their archive box, for instance.
Yeah, that's totally fair game.
Yeah, yeah, yeah.
And then they probably scrubbed it first just to make sure it's not embarrassing and stuff.
And then, you know, the public benefits. That's where I was going.
Not just like, hey, all your secrets are public now that you're dead, you know.
Well, I was also meaning for people running these distributed nodes, I think it's also important
to sort of discourage the, oh, archive everything you possibly see mentality, because I think
that would also kind of destroy the internet to some degree.
Like part of the beauty of the internet is that there are pseudonymous spaces,
there are anonymous spaces, and there are real-name spaces.
But you're not forced to be one and the same identity across all of them.
And so you get more vulnerability, more connection,
more willingness to share things online that you might not have in person.
And the threat that everyone watching is actually tape recording
everything they see 100% of the time, and that even if they don't decide to share it today, within 20
years, 100% of everything is going to be online, copied by everyone, I think that that is rightfully
a scary concept for some, especially people who feel more threatened, right? If you don't experience
a lot of threats online day to day, it seems like, oh, that's not a big deal. Like, you know, if my stuff is time unlocked in 20 years, like I'll be
fine. If you're experiencing a lot of oppression today, and you don't think your situation is
going to change, having all of your social media public in 20 years might not seem as attractive
an option. And I just want to acknowledge that there's a range of
privacy that's needed, and there's a range of respect that
needs to be given to privacy from archiving tool makers, to acknowledge that
we're not trying to build the tape recorder for the entire internet,
especially the private stuff,
the stuff that requires cookies and logins. Because archive.org doesn't
have this problem, right?
They're not archiving stuff behind logins.
But of course, I am pro archiving in general. I love people to
archive. I just, you know, I feel like these points don't get harped on enough when people
talk about archiving online. And so I feel like this is the right space to give them a little bit
of airtime. Sure. So tell us how Archivebox works then mechanically as a person who might use it.
How do I point it at things and how do I decide? Just like walk us through it. Yeah, so Archivebox right now
is a self-hosted Docker app mostly
and a pip library.
So if you don't know what those things are,
I'd say Archivebox is not for you.
There are other apps out there
that do a way better job
of providing a nice user interface,
a nice iOS app.
And all of that is coming for Archivebox eventually.
But right now we're a server that you run
like NextCloud or Plex or Home Assistant that you set up on a little $5 a month machine.
It's totally fine. You run a couple commands. It takes five minutes to get it running.
You have an admin interface, web UI, and you have a browser extension that you can use to submit
URLs. Or you can just paste in URLs manually or drag them in from a spreadsheet
or your bookmarks out of your browser.
There's ways to ingest most of the common ways
that you would want to send a list of URLs to this.
Then it goes through and it pretty serially,
we can't do too many in parallel
because you'll get blocked pretty fast.
So we just go through one by one.
And for every URL, we save it in a ton of different formats.
So the raw HTML. We'll save SingleFile, which is an excellent way to get everything into
one HTML file, including JavaScript and images and all that.
Wget, YouTube DL, so we rip all the audio, video, subtitles out, video metadata, comments,
photo galleries, like basically every piece of content.
ArchiveBox's stance is to actually rip it out of the original page.
We're not trying to do the, oh, preserve it perfectly in its original format thing.
Because I think that that, even though I harped on before how important the original context is
of a piece of content, honestly, it's a really difficult technical problem. And so I'm going the other direction
where I'm actually trying to get the content out into its usable forms for
LLMs and for humans to
actually use it. And so I don't actually write it to this WARC standard, which is sort of the
internet archiving standard file. I think it's a little bit unapproachable for most people who
don't interact with WARC files on a day-to-day basis. And so instead, ArchiveBox writes everything
as raw files to the file system. You get a normal PNG, a normal PDF, a normal .txt file with article text.
You get JSON.
You get just basically really simple, common file formats that I think will survive for
more than 100 years.
And you get it all flat on the file system right there.
You can just dig in and look at it.
There's no complicated binary formats, nothing like that.
Yeah, so that's generally how it works.
And then you can set up scheduled archives
that pull in stuff on a daily basis.
You could archive your own Twitter feed
or Hacker News or whatever you want.
And then you can tag it.
You can send an archive to someone else.
You can export it statically in a way that you can share.
And the distributed sharing between archiving nodes is coming.
I'm working really hard on that, but that's not out yet.
So that's how it works so far.
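The flow described above (go through URLs one by one, save each in several formats, write plain files to disk) can be sketched roughly as follows. This is an illustrative sketch, not ArchiveBox's actual code: the `Extractor` interface, the folder naming, and the `index.json` layout are all assumptions.

```python
import json
import time
from pathlib import Path
from typing import Callable, Dict, Iterable, List

# Hypothetical interface: an extractor takes a URL and returns the bytes to store.
Extractor = Callable[[str], bytes]


def archive_serially(
    urls: Iterable[str],
    extractors: Dict[str, Extractor],  # output filename -> extractor
    out_dir: Path,
    delay_seconds: float = 0.0,
) -> List[Path]:
    """Archive URLs one at a time, writing each output as a plain file."""
    folders: List[Path] = []
    for index, url in enumerate(urls):
        folder = out_dir / f"{index:06d}"
        folder.mkdir(parents=True, exist_ok=True)
        results: Dict[str, str] = {}
        for filename, extract in extractors.items():
            try:
                (folder / filename).write_bytes(extract(url))
                results[filename] = "ok"
            except Exception as err:  # one failing method should not stop the others
                results[filename] = f"failed: {err}"
        # A small JSON index sits next to the raw files so the folder explains itself.
        (folder / "index.json").write_text(
            json.dumps({"url": url, "results": results}, indent=2)
        )
        folders.append(folder)
        time.sleep(delay_seconds)  # go slowly to avoid getting blocked
    return folders
```

In practice each extractor would wrap a real tool, and `delay_seconds` would be set above zero; both are left out here to keep the sketch self-contained.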
How do you deal with the, if it's a flat file,
how do you deal with file size or archive size over time?
I understand the reason why you're doing that
because you want it to be preserved in a format that is accessible,
whereas WARC, which I believe is W-A-R-C, right?
That's the file format.
Yeah.
Where it's stuck in
this other thing that may not be accessible. You know, I don't know, like zip files will probably be
around forever, but, you know, these randos might not be. Which WARC is not, but at some point
somebody might be like, no, that's not cool anymore, regular PDFs, let's do that.
I think WARCs will last. So WARC is actually a zip file. Modern WARCs, like WACZ, is just a zip file. You can add .zip on the end and uncompress it. I don't think it's too bad. Once you get used to them, they're very easy to work with and they're quite standard. And I think that will survive for a really long time. I just want ArchiveBox to be immediately usable by the next tool that you want to consume the data with. Like, I don't want multiple decompression steps and stuff like that.
So for your concern about file size,
yeah, it does take up a lot of space.
It's not as bad as you would expect, though.
I'd say about 1,000 URLs take up, on average,
about five gigabytes with most of the methods enabled.
So as long as you're not saving only YouTube videos,
you can expect, if most of your content is text,
plenty of images still, but no massive,
massive videos, because that's what really skyrockets it quickly, about five gigs per thousand URLs.
So, you know, 10,000, not too bad, 50 gigs, you could probably stick that on a drive somewhere.
As storage gets cheaper, that's not that big of an issue.
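As a quick back-of-the-envelope check on those numbers, the rule of thumb can be written down directly. The constant is the rough average quoted above (about five gigabytes per thousand URLs, without massive videos); the function names are made up for this sketch.

```python
GB_PER_THOUSAND_URLS = 5.0  # rough average quoted in the conversation


def estimate_storage_gb(url_count: int, gb_per_thousand: float = GB_PER_THOUSAND_URLS) -> float:
    """Estimate archive size in gigabytes for a given number of URLs."""
    return url_count / 1000 * gb_per_thousand


def urls_that_fit(disk_gb: float, gb_per_thousand: float = GB_PER_THOUSAND_URLS) -> int:
    """Estimate how many URLs fit on a disk of the given size."""
    return int(disk_gb / gb_per_thousand * 1000)
```

By this estimate a one-terabyte drive (taken as 1,000 GB) holds on the order of 200,000 mostly-text URLs, though video-heavy collections would blow well past it.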
I, for my big, you know, massive archives that I keep, I use ZFS, which has built-in compression
and lately fast deduplication.
And so I like to solve those issues at the file system.
Where you dedupe, huh?
I'm experimenting with a new fast dedupe feature.
I haven't used it on the big, big archive yet, but it's working well.
I usually disable dedupe, honestly.
I mean, I don't have a need for it.
But I think if I was running an archive, I would probably want it.
Yeah, it's like one of the few cases where it makes sense. But specifically the new, recently released, like in the last few months, fast dedup rewrite by, is it iXsystems or another company, stepped in and contributed a big update to it.
Interesting.
So it's more reasonable now for people to run it.
Yeah.
As I was asking that question, I was thinking, Adam, don't worry about it, the file system will do it. So I was going to ask you what your favorite file system was or what file system is beneath this thing.
I love ZFS.
I assumed you'd say ZFS, and I'm thankful that you did.
So am I. Otherwise, we'd have a fight, you know. It'd be Adam versus Nick.
It's not worth going there.
It's the wrong place to slap somebody, you know.
Well, I know I can appreciate your taste in so many other things that I know that you appreciate ZFS.
So there you go.
There you go.
And that really, I mean, I'm a ZFS guy myself.
That's exactly what I would put, you know, this archive on.
I would spin up a new ZFS file system and I would let that file system do all the work
of compression, dedupe, stuff like that that would matter, and let the archive box do its thing,
which is what it should do. Let me, as the user and the curator of it, interact with the original
file system versus, or the original file types versus what the file system can do for me.
Yeah, one way to make that more accessible is I've added support for rclone recently,
so you can link it up to a Google Drive or a... A lot of people don't have terabytes of storage at home
anymore, and so letting people use
their Google Drive as their storage, I think, is
important. And then Google Drive, they'll still
charge you for every file, but they're doing
de-duping on their side. Same with AWS
or all of them.
I think that'll get cheap enough over time that
it's not a big issue.
I think most people
are going to run into losing motivation
sooner than they're going to run into running out of storage.
File system size, yeah.
Yeah.
How do you get this to go?
Well, I guess maybe the better question would be,
how well used is this?
How much are people using Archivebox,
and what would it take to make it more used, more adopted?
Yeah, so I don't have analytics in the actual product. There's only a few stats that I keep
track of. So there's 6 million Docker Hub pulls so far, 6 or 7 million if you include both repos.
That's a lot.
The PyPI installs are sitting at around 70,000 a month. And the Google Chrome extension only has about 2,000 users.
So a lot of those are automated. You know, people have scripts that auto update their Docker
container or auto update their pip packages. But I think it's in the tens of thousands. Exact numbers, I
don't know. When people open GitHub issues, that's a pretty strong indicator that they care enough
to say something. And there's thousands of GitHub issues
and hundreds of contributors
and a few granted donations,
not enough to make it a sustainable business model,
but enough that I can't ignore it.
Lots of attention whenever stuff goes on Hacker News.
So I know people care about the issues,
and I know that people are using it,
but I refuse to add analytics,
so hard to say.
You're one of us. You are one of us.
So how does it get into your credentialed stuff?
Do you have to be using Google Chrome? Is that the extension?
Is that what that does?
Or is it grabbing cookies out of your cookie jar? How does it do it?
Yeah, that's a, it's constantly evolving.
I'm trying to make this as smooth as possible.
The golden rule is don't let
people use their normal accounts. This is based on talking to a lot of my industry peers. We just
don't think that the scrubbing tech is there yet to sanitize these archives. And unless people
really, really know what they're doing, which some people do, and they can save that stuff,
you don't know who the audience of an archive is going to be in five or ten years.
And so people are going to forget,
oh, this archive was saved with cookies turned on,
which means your whole personal information is probably mirrored in the HTML somewhere.
So I basically force people to create separate accounts for archiving.
If you want to archive Facebook stuff, you make a second fake Facebook account,
invite it to all the
groups that you wanted to have access to. It's an arduous process, it's annoying, and I'm being paid
by companies to automate it. So that's how ArchiveBox is a sustainable business right now: that's the
paid service that I offer to companies, creation of sock puppet accounts. There's no engagement. I
have a hard rule: I don't allow these accounts to do anything other than view.
But you create these accounts, you log them into all the groups that you want to be able to save stuff from. And then these accounts will archive on your behalf. And that way, if the accounts
get burned by, you know, an archive being shared or something, it's fine. They're not, you know,
they're not real info. They're not tied to anyone.
Interesting. So that's some of the labor you were
talking about earlier. You know, like this is hard work. It's not like, uh, just download it and click go. Like you're gonna be, yeah, doing some
stuff here.
Yeah, it's not too bad. So with the recent changes, I've made it smoother. So there's
a VNC container running in the background, so it'll open Chrome automatically. You can just
go to a new tab, you'll see like a desktop Chrome, you log into all your sites, and then it'll save
those cookies automatically,
and then you just close it.
You never have to think about it again.
It'll stay logged in.
If it kicks you out of some site,
you just reopen that VNC window and log back in.
So I'm trying to make it as smooth as possible.
I do allow you to import cookies from your existing Chrome. I just strongly don't recommend it,
unless you're the only person
who's ever going to look at your archives
for the next, you know, how many years,
or if the people that you're sharing the archives with are people that you really trust.
Or if you're willing to manually sanitize.
And I think most people don't understand that risk.
So I don't make it too easy.
Is this only a single player game?
Like is there an archive scenario where it's like a group?
Like let's say Jerod and I were like, man, that was cool.
Nick is awesome.
And we start our own archive, essentially.
And it's like anybody who's in and around the ChangeLog podcast universe, just had to say that, Jerod, they can join in.
Or there's a mission here.
And we can, similar to the way you would have a core team member or commit rights, you can have this membership, so to speak, to an archive.
Is that out there?
Yeah.
Is that part of your plan?
Yeah, that is my plan.
That's the core mission is actually to serve that group.
So Archivebox is primarily aimed at organizations to save what they collectively care about.
And so there are users, there are permissions, there's sharing stuff, there's multiple logins.
And the idea is your org probably
has shared ability to access some resources. So your org only has to set up these credentials once
for the archiving bot. And then when people submit URLs, it doesn't archive with the person's credentials,
it archives with the archiving bot's credentials. And so an org can collectively maintain access to all the
resources that they care about. And then the org's archiving bot will also have access and will just save any URL that anyone in the org submits. And that's how
the paying customers are using Archivebox today. So I work with nonprofits that monitor
disinformation campaigns and look for evidence of war crimes on social media. As I was saying before,
it's lawyers who pay for this. They pay for evidence collection, both to catch the social networks
breaking their own terms of service
and their own rules,
to help governments with regulatory issues
around how social media is behaving,
but also to look for war crimes.
It's interesting.
So they're doing this method
of shared one collection
and they have teams of researchers
that submit URLs to the shared collection.
But you can't reveal who the researchers are
because they're researching really sensitive content.
You can't burn their identities.
Yeah, it's like a journalist and their source.
Exactly.
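The shared-collection setup described here, where members submit URLs but the org's archiving bot does the fetching, might be modeled like this. It is a hypothetical sketch: `OrgArchive`, `ArchiveJob`, and `bot_profile` are invented names, not part of ArchiveBox.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class ArchiveJob:
    url: str
    credentials_profile: str  # which saved login profile the crawler should use


@dataclass
class OrgArchive:
    bot_profile: str  # hypothetical name of the shared bot's login profile
    members: Set[str] = field(default_factory=set)
    queue: List[ArchiveJob] = field(default_factory=list)

    def submit(self, member: str, url: str) -> ArchiveJob:
        """Queue a URL on behalf of a member without recording who asked."""
        if member not in self.members:
            raise PermissionError(f"{member} is not a member of this archive")
        # The job carries only the bot's profile; the submitter is deliberately
        # left off so a shared snapshot cannot reveal the researcher.
        job = ArchiveJob(url=url, credentials_profile=self.bot_profile)
        self.queue.append(job)
        return job
```

Keeping the submitter off the job record is the design choice that matters here: if an archive leaks, it points back to the bot account rather than to a person.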
When you got into, when you even first had the spark of this idea,
did you think that's what you would be doing to sustain it and get paid?
Sock puppets.
No, but now that I'm working on it,
it's a surprisingly fun problem
because I get to red team.
I love security stuff.
And now I'm a red teamer.
I literally, my job is to break like CAPTCHAs
and rate limits and login walls for good cause.
Like I'm supporting, you know, anti-disinformation,
especially after the, you know, recent election.
It's motivating to actually work on what matters right now. I feel
like this really matters, and directly working on anti-disinformation and like mass social media
manipulation is motivating for sure.
What an interesting job you have. Wow. So Jerod, uh, what
are we doing about our ArchiveBox when we spin this thing up? Did you already spin up a new Fly
machine for this?
I've not tried it yet.
I'm excited about Docker being
the, you know, is that one of the primary
ways that folks do spin it up and play with it?
I imagine like a Docker Compose or Dockerfile
just generally is an easier thing
than anything else.
You know, I would think, but
the archiving crowd attracts a lot
of people who still want to do stuff the old school
way, unfortunately.
Which is zip files onto a machine?
Yeah, or apt install every single dependency manually.
And some people really want to do that.
But unfortunately, a surprisingly large amount of the user base will not touch Docker and will only apt install every single dependency manually.
And so I spent the last two months writing my own runtime dependency manager for Archivebox. It's a whole new library called abxdl that uses the Python type system to basically have unique...
I went a little overboard designing this, but it was pretty fun.
Basically, Archivebox is now pluginized, so people can contribute plugins.
It's really hard for me to maintain the auto-login for Facebook and Twitter and Instagram and TikTok and YouTube and Quora and all of these.
So I want a community to come build around little scripts that do things automatically while archiving. And I'm working
with other archiving companies to sort of share a common spec for this. But part of what these
plugins need to be able to do is access dependencies. So YouTube DL or Wget or Curl or things that the
user might not have installed on their system. And so if I'm allowing people to install plugins from an app store
or ecosystem type deal, it needs to also be able to install
random packages at runtime.
And so Archivebox now has this whole built-in package manager.
And I have a rant blog post about the inevitable progress
of building a tool is that everyone eventually bakes a package manager
into their tool.
Once you go far enough in any product evolution, eventually you're going to have
to write your own package manager.
So ABXDL is both a, a runtime as well as a CLI tool.
Is that, am I reading that right?
Based on the, the repo on archive box on GitHub?
Um, I wouldn't just, it's, it's closer to like an ORM for package managers.
Gotcha.
It's just a layer between software and the system,
like Ansible or PyInfra.
In fact, it uses those under the hood.
It just gives you nice, clean Python types
for different packages and package managers
and allows you to define in a sort of flat YAML format
all of the things that a plugin needs,
regardless of whether they come from Brew or Pip
or NPM or cargo.
Or yeah.
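The "ORM for package managers" idea can be illustrated with a tiny sketch: a plugin declares what it needs once, and a resolver turns that into an install command for whichever package manager is present. This is not the real library's API; `Dependency`, `plan_install`, and the spec shape are assumptions, and the conversation mentions a flat YAML format where this sketch uses a dataclass.

```python
import shutil
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass(frozen=True)
class Dependency:
    binary: str                # executable the plugin needs
    packages: Dict[str, str]   # provider name -> package name under that provider


# Install command prefixes for a few common package managers.
INSTALL_COMMANDS: Dict[str, List[str]] = {
    "pip": ["pip", "install"],
    "npm": ["npm", "install", "--global"],
    "brew": ["brew", "install"],
    "cargo": ["cargo", "install"],
    "apt": ["apt-get", "install", "-y"],
}


def available_providers() -> List[str]:
    """List the package managers actually found on this machine's PATH."""
    return [name for name, cmd in INSTALL_COMMANDS.items() if shutil.which(cmd[0])]


def plan_install(dep: Dependency, providers: Iterable[str]) -> Optional[List[str]]:
    """Return the install command for the first usable provider, or None."""
    for provider in providers:
        if provider in dep.packages:
            return INSTALL_COMMANDS[provider] + [dep.packages[provider]]
    return None
```

A plugin would then call something like `plan_install(dep, available_providers())` at runtime and run the resulting command; the sketch stops at building the command so nothing is installed by accident.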
I dig the writing here.
You say, ever wish you could YTDLP, GalleryDL, Wget, Curl, Puppeteer, et cetera, all in one command?
ABXDL is an all-in-one CLI.
Is that not the same thing?
Is that a different thing?
I mixed up my own names.
ABXDL is ArchiveBox, but simpler.
I was referring to ABXPKG,
which is...
Okay.
That's where the confusion is at.
Okay.
ABXDL sounds cool though.
ABXDL is a simplified archive box.
That's a one-liner.
It's a one-liner for all the tools you might need.
So like you give it a URL and it's going to figure it out.
Rip every piece of content that you possibly can out of this page by any
means necessary and put it in a folder.
That's cool.
I like that tool.
Yeah, I like that tool a lot.
So to clarify the confusion here, ABX PKG is the runtime you're talking about.
Yeah, correct. Sorry about that.
But you said ABXDL, and so I went up and found your repo, and then tangented us, in a positive way, but now we're less confused.
But now we're more excited because we know two tools,
not just one.
We're getting two for one here, okay?
That's why I like you, Nick.
ABXDL is pretty cool.
So what you're saying then, if I'm reading this right,
is this ready for prime time?
Is this, you know, okay, so this is coming soon.
Which way, which tool?
ABXDL.
Yeah, ABXPKG is ready.
We've been using that for months now.
ABX DL, I just announced because it's this evolution of plug-inizing ArchiveBox.
Inevitably makes it a little bit too complicated for some people.
And so ABX is stepping in to fill in behind and basically provide a new tool that is way simpler than ArchiveBox to all the people that really don't want to spend time with Docker or setting up services or logins and all that. They just like,
give me the files now. Because that's how Archivebox started. Originally, it was like
ABXDL. It evolved so much that now we need a simpler replacement.
Yeah. To put it more simply, you write it well. ABXDL is a CLI tool to auto-detect and download
everything available from a URL.
So just like you would use,
which I use, yt-dlp,
I obviously use wget, prefer curl,
but either or, pick your flavor.
So if you're using these kinds of tools,
you can potentially at some point in the future replace those things
if you're trying to archive with ABXDL.
Yep, it should be a fairly drop-in replacement.
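The "give it a URL and it figures it out" idea can be sketched in a few lines of Python. This is a hypothetical illustration, not how ABXDL is implemented: the `VIDEO_HOSTS` list and the `plan_downloads` function are made up for the example, and a real tool would inspect the page rather than guess from the hostname. The only real things here are the `yt-dlp` and `wget` commands and the two wget flags.

```python
from urllib.parse import urlparse

# Hypothetical host list (an assumption for this sketch only).
VIDEO_HOSTS = {"youtube.com", "www.youtube.com", "youtu.be", "vimeo.com"}


def plan_downloads(url: str) -> list[list[str]]:
    """Return the external commands a one-liner archiver might run for a URL."""
    host = (urlparse(url).hostname or "").lower()
    commands = []
    if host in VIDEO_HOSTS:
        # yt-dlp fetches the media itself.
        commands.append(["yt-dlp", url])
    # wget saves the page plus the assets needed to view it offline.
    commands.append(["wget", "--page-requisites", "--convert-links", url])
    return commands
```

Calling `plan_downloads` on a video URL yields two commands (media first, then the page); any other URL yields only the wget command. Nothing is executed; the caller decides whether to run the plan.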
It's got a few of its own flags, like, you know, you can provide cookies, you can tell it to ignore SSL warnings. It's got the usual
things that you would be able to configure, but I'm aiming for, like, direct drop-in replacement
for wget or curl.
I want to confess something here on the show, if you don't mind.
We always like confessions.
One thing I do like to do sometimes
is I run my own Plex box
and I don't always want to,
it's almost my version of archiving
now that I'm thinking about it out loud,
is I will take some music
that I like from YouTube
and it's not to take it from me
so I can give it to everybody else
and be a distributor.
It's more so I can have my copy
and I'm not spending web resources.
I'm spending LAN resources, so to speak.
It's allowed.
That's legally allowed.
Yeah, and so I use yt-dlp to pull down different things
into a WAV file,
mostly like coding music and stuff like that,
that I'm like,
I don't want to keep going back to this YouTube URL
and have a tab open.
I would rather just have it play in my truck or play on my phone or wherever it's at.
And so, you know, Plexamp is the iOS app.
And so I can play that from my Plex at my home wherever.
And so I yt-dlp all the time.
I mean, all the time, like several times a month, all the time, you know,
but enough to be like, this is a useful tool and this is how I use it. And occasionally I'll pull down a video if I
want to archive it forever, but my file system has been the archive. So I think I'm like one step
removed from actually becoming an ArchiveBox user.
That's great. That's how a lot of people start. You start with the content you care about and hold onto that, right? Use that as motivation to get more into archiving.
Yeah, don't break yourself into having to save everything. Just save the stuff you want to save.
Yeah, I like the idea and premise. I think the thing I want is, I want it to catch on.
And I think organizationally it's good. Like, that's where you're sort of seeing a lot of the
movement, so to speak. But I still think there's opportunity elsewhere, but I think that
it might just get burnt out.
I don't know, like, what would motivate somebody to do it
continuously forever if it wasn't
legacy things like we said earlier, you know?
Isn't it just a cron job after you got it all set up?
I mean, what do you got to keep doing?
Yeah, so part of it
is on me to make this easier, right?
My tool right now is not so incredibly
user-friendly that you can just
set it up and it runs in the background forever.
I'm trying to get there.
And once it is at that point, then I think it'll be less important to select for people
who are really motivated to archive.
But right now, because there are still hurdles to curating and managing all the storage and
passing hard drives around and deciding who gets to look at it and scrubbing stuff out,
I am selecting on purpose more for people
who are willing to take on this workload.
There are other tools. Like, Webrecorder is amazing.
They have a new cloud offering that lets you do stuff.
They're the team that I'm collaborating with on this
behaviors spec, as we're calling it, to share these plugins
between different tools.
There's SingleFile.
There's lots of browser extensions that make it fairly easy
to save stuff passively as you're browsing.
I think those are great options for people that are looking for sort of easy, passive archiving.
But yeah, a lot of the hard decisions don't come until you're six months into archiving and now you have a few terabytes that you need to move around between places.
How big is your personal archive?
I have, I guess there's a fuzzy line.
So I have many personal archives for different things.
I tend to start a new collection for a new campaign, I guess I'll call it.
A lot of different tools call these campaigns.
So like if I care about my YouTube favorites, for example,
that's going to be a hefty bucket of stuff.
So I'll start a dedicated collection just for that.
That's probably the biggest one. It's a few terabytes. It's not insane. But then I have a bunch of these
collections. And so altogether, I probably have about 20 terabytes saved in a little ZFS thing
over there on the shelf. I'm a big bare metal fan. I tend to not pay for lots of cloud hosting.
It's mirrored. I have a 3-2-1 backup, but I think that all in all,
up around 20 terabytes.
Well, friends, I'm here with a friend of mine,
Michael Grinich, co-founder and CEO of WorkOS.
We're big fans of WorkOS here.
Michael, tell me about AuthKit.
What is this?
How does it work?
Why'd you make it?
WorkOS has been building stuff in authentication for a long time, since the very beginning.
But we really focused initially on just enterprise auth, single sign-on SAML authentication.
But a year or two into that, we heard from more people that they wanted all the auth stuff covered.
Two-factor auth, password auth, with blocking passwords that have been reused.
They wanted auth with, you know,
other third party systems. And they wanted really WorkOS to handle all the business logic around
tying together identities, provisioning users, and even more advanced things like role based
access control and permissions. So we started thinking about that more how we could offer it
as an API. And then we realized we had this amazing experience with Radix, with this API, really the component system for building front-end experiences for developers.
Radix is downloaded tens of millions of times every month for doing exactly this.
So we glued those two things together and we built AuthKit.
So AuthKit is the easiest way to add auth to any app, not just Next.js if you're building a Rails app or a Django app or a just straight up Express app or something.
It comes with a hosted login box.
So you can customize that.
You can style it.
You can build your own login experience, too.
It's extremely modular.
You can just use the backend APIs in a headless fashion.
But out of the box, it gives you everything you need to be able to serve customers.
And it's tied into the WorkOS platform.
So you can really, really quickly add any enterprise features you need. So we have a lot of companies that start using it because they anticipate they're
going to grow up market and want to serve enterprise. And they don't want to have to
re-architect their auth stack when they do that. So it's kind of a way to like future-proof your
auth system for your future growth. And we have people that have done that. People that started
off and they're like, oh, I'm just kicking the tires. I'm just doing this. And then poof, their
app gets a bunch of traction, starts growing.
It's awesome.
And they go close Coinbase or Disney or United Airlines or, you know, it's like a major customer.
And instead of saying, oh, no, sorry, we don't have any of these enterprise things and we're going to have to rebuild everything.
Just go into the WorkOS dashboard and check a box and you're done.
Aside from the fact that AuthKit is just awesome.
The real awesome thing
is that it is free for up to 1 million users. Yes, 1 million monthly active users are included
in this out of the gate. So use it from day one. And when you need to scale to enterprise,
you're already ready. Too easy. You can learn more at authkit.com or, of course, workos.com.
Big fans, check it out.
One million users for free.
Wow.
Workos.com or authkit.com.
As you're describing these YouTube favorites,
I have many playlists on many social media accounts. And I would say
the one I would probably almost covet, like love it to death almost, is my YouTube playlists.
They're all private, obviously, like only I can see them. But now I'm thinking like you said that,
I feel like if I can archive my playlists, then I know because there's times I go back to them and
it says this video is not here anymore
because it was removed.
And I'm like, well, why was it?
It was useful to me at one point.
I'm not trying to like get somebody
politically for any reason.
So like, I know it's not that kind of content.
It's just like, for some reason,
somebody got upset
and it's not available to the public anymore.
And my ability to archive that,
now you're making me,
see, you're getting me.
You're getting me.
Yeah, definitely save that stuff.
YouTube, I think, is a great starting point because-
For sure.
It's also, interestingly enough, text copyright, audio copyright, video copyright, music copyright,
they're all very different fields legally.
There's not that much overlap.
Like the way those cases are handled, the way that what the precedent is in the courts
is very, very different. You have a Supreme Court judge to thank for the ability to
save video locally, who had a TiVo and was like, I don't understand why I can't just TiVo my stuff
at home. Like, who am I hurting by doing that? And so you have a fair use exemption to basically
TiVo your video content at home. Now, of course, platforms will argue you're violating their terms of service by cloning
that.
But like, realistically, the precedent is set.
You can save video that you care about at home and it's probably going to be okay as
long as you're not charging people to access it or depriving the original, like spamming
it in their public channels saying, hey, I have a free version, come over here.
It's an interesting problem in the fact that you have this ArchiveBox idea, and the things that
you do to do the archiving: you, as an individual or an organization, identify something worth
archiving. So that's step one, right? Step two is having the necessary software technology,
whether it's a plugin or a CLI tool or something that goes out there and gets the thing and says, okay,
I've got the thing. And I assume as part of the ABX DL, at some point you'll have
some sort of config that says, this is where you put it. And that's the archive box. That is the
file system. That's ZFS backed, praying everybody follows your rule or at least your desires.
And then you have this ability, this viewer, so to speak,
the hallways and the rooms of the museum, right?
Those are the different, am I missing anything else
that's in the sphere of how you would interact with
or curate or view this museum slash archive?
No, you basically perfectly identified it.
There's different words used for those different
areas. You know, the viewer's often called the replayer, because you're, like, replaying a recording.
But yeah, that's basically it.
So the ArchiveBox as it is now, if I went out there today
and spun up the Docker, because I'm that kind of person, I would spin up the Docker version of it,
what is that? That's not the DL thing, right? I mean, it is, it's baked into it
as it is, but this ABXDL is a secondary CLI tool that enhances or adds to what ArchiveBox
will eventually do or does now currently, right?
Yeah, so to dive into the nitty-gritty for, like,
a couple minutes: ArchiveBox internally is a Django application.
It exposes a command line interface that is the same package as the Django web app.
Like it's all in one pip package.
So you can pip install Archivebox without any of the Docker stuff.
And you immediately get the CLI.
You get a Python API.
It uses SQLite and it just saves to whatever current folder you're in. It'll create a collection, it'll create a SQLite database on disk, it'll create folders for all the archives
and logs and all that. So you don't need a continuously running container at all. If you
just want to basically replace youtube-dl, you can pip install archivebox, archivebox add
https://whatever, and it'll just spin all that up locally and archive that one URL and then exit.
And then if you run another command in the same directory,
it'll add the next URL to the same collection.
You import 1,000 from Google Chrome,
it'll run them all right there and exit.
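The "collection lives in whatever folder you're in" model can be sketched with nothing but the Python standard library. This is a minimal illustration under stated assumptions, not ArchiveBox's actual code or schema: the `add_url` function, the `index.sqlite3` file name, the `snapshot` table, and the `archive/<id>/` folder layout are all hypothetical names chosen for the example.

```python
import sqlite3
from datetime import datetime, timezone
from pathlib import Path


def add_url(collection_dir: Path, url: str) -> Path:
    """Record a URL in the collection's index and create its snapshot folder."""
    collection_dir.mkdir(parents=True, exist_ok=True)
    db = sqlite3.connect(collection_dir / "index.sqlite3")
    try:
        # The index is created on first use, so no setup step or server is needed.
        db.execute(
            "CREATE TABLE IF NOT EXISTS snapshot ("
            "id INTEGER PRIMARY KEY, url TEXT UNIQUE, added TEXT)"
        )
        # Adding the same URL twice is a no-op.
        db.execute(
            "INSERT OR IGNORE INTO snapshot (url, added) VALUES (?, ?)",
            (url, datetime.now(timezone.utc).isoformat()),
        )
        db.commit()
        (snapshot_id,) = db.execute(
            "SELECT id FROM snapshot WHERE url = ?", (url,)
        ).fetchone()
    finally:
        db.close()
    # Each snapshot gets its own folder where downloaded files would go.
    out_dir = collection_dir / "archive" / str(snapshot_id)
    out_dir.mkdir(parents=True, exist_ok=True)
    return out_dir
```

Running `add_url(Path.cwd(), "https://example.com")` twice in the same directory reuses the same database and the same snapshot folder, which is the property described above: each command does its work against the folder and exits.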
So you can use it as a CLI tool.
You can use it as a long-running app.
You can use it as a Docker container.
All of these are actually just one Django package
underneath.
And that's, like, the first principles of this, because then you got the
challenge where you got orgs. You want to view it and you want to enjoy it. Well, you're not in that
setting whatsoever. Like, you're probably on the web, right? You're probably in some sort of web
application. And so your viewer, your, would you call them a playback person? What was the terminology
for?
Replayer.
Replayer, yeah.
Yeah, replayer.
So if you've got a replayer out there, they're probably on the web.
That's a whole different problem set, right?
Yeah, so the CLI tool, because everything just saves raw straight up to the file system as raw files,
you don't actually ever have to see the ArchiveBox UI at all.
You don't have to use the replayer.
You don't have to use the admin interface.
You don't have to use anything. You can just use the file system. Or some people never
see the file system at all, right? They're running it on Fly.io and it's, you know, a hosted file
system and they only see the web UI. And so, yeah, fundamentally I'm serving, like, two different
groups. I personally use both heavily. So I'm running my own web UI, but I also very often go
into the file system because I want to play with a local LLM
and I want to train it on all my YouTube videos
or I want to train it on all the articles
that I read last month or stuff like that.
Right.
The reason why I'm asking you how to experience it
is because I'm literally thinking about,
okay, if I started to do this,
you know, one job is to archive.
Got it.
Okay, cool.
It's on my file system.
And the next job is later on, I want to experience it or replay it and be the eventual consumer
I will be of my YouTube playlist, for example.
And I'll admit it's mostly cooking videos.
All the confessions.
It's mostly cooking videos.
You know, right now I'm trying to perfect my chicken Parmigiana recipe. Like I am like, I am trying to nail it from the, the sauce, the original, you know, tomatoes
to use, you know, the garlic, all the process, you know, which olive oil, like I'm trying
to perfect it.
And so I've got a collection of videos.
And so future Adam, like once I've perfected it or my kids, you know, even a year from
now, like they will want to, they
would want to view this stuff.
But the here and now is the useful.
I think if you can make this archiving like useful today to me so that if it's useful
for me to archive and then also experience my archive means that I'll curate it better
over time because it's today useful, not tomorrow useful or some fictitious future that may
or may not even come to fruition.
That's what I'm thinking about
because I'm already doing that in a way with my music,
but I'm not using it in the way I'm using it.
I'm doing it in a way that is today useful.
And today useful is on Plex
and experiencing it as music
because that's what it is.
Plex doesn't really serve me to serve my YouTube playlist,
but this Django app or this web interface could be more
full-featured at some point so that you invite people to archive and experience today so that it
has future generation payoffs. Yeah, 100%. You're touching on a really key part of why archiving is
hard for it to spread virally is you need to convince people that it's useful today when most
people only realize archiving is important when it's too late,
once they're already missing something. So making it really useful today is super important
to me. And I think another big part of that is search, making sure search is really good,
making sure you can quickly find like, I go to great lengths to get the subtitles for every video
and add them to the full text search. So you can search by content of video. Extracting text by any means necessary is super important.
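Searching videos by what was said in them comes down to indexing the subtitle text alongside each URL. The sketch below shows the idea with a tiny in-memory inverted index. It is an illustration only: the `TranscriptIndex` class is hypothetical and is not how ArchiveBox or its search backend works.

```python
import re
from collections import defaultdict


class TranscriptIndex:
    """Tiny in-memory inverted index: word -> set of URLs whose text contains it."""

    def __init__(self) -> None:
        self._postings: dict[str, set[str]] = defaultdict(set)

    @staticmethod
    def _tokens(text: str) -> list[str]:
        # Lowercase and split on anything that is not a letter or digit.
        return re.findall(r"[a-z0-9]+", text.lower())

    def add(self, url: str, text: str) -> None:
        """Index extracted text (for example, subtitles) under its source URL."""
        for word in self._tokens(text):
            self._postings[word].add(url)

    def search(self, query: str) -> set[str]:
        """Return the URLs whose text contains every word in the query."""
        words = self._tokens(query)
        if not words:
            return set()
        results = set(self._postings.get(words[0], set()))
        for word in words[1:]:
            results &= self._postings.get(word, set())
        return results
```

Once the subtitles for a cooking video are added, a query like "tomatoes" returns that video's URL even though the word never appears in its title.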
Making sure that the search engine is fast and works really well. We use Sonic, which is a Rust-based
Elasticsearch all-in-one binary replacement. It's awesome. There's other ways that we can make it
really useful now, too. We can try and do, like, not everyone wants this, but some people really want it.
AI-based summarization or categorization after the fact. So let's say you have, you know,
a thousand URLs saved. I don't want to have to go in and click through each one to find the article
that I care about. What if they all also had a column that was, you know, a two-sentence summary
of the article and the author and the byline and the date it was published extracted out. So I call these extractors and Archivebox is designed to be able to add many
extractors over time. And I envision it being like a home assistant type ecosystem or NextCloud or
WordPress ecosystem where you have tons of plugins for all the extractors of the things that you care
about. And the extractors come with their own replayers. So if you have an extractor that
specializes in getting YouTube videos,
it will also provide a nice replayer UI
to look at your YouTube videos.
If you have an extractor that gets article text
out of the page,
it should also provide a nice article reading UI.
You have an extractor that gets cooking recipes,
but it just gets the recipe part,
then you also need a replayer
that shows cooking recipes nicely.
And so this is how I imagine
the ecosystem evolving over time.
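The pairing described here, where every extractor ships with a matching replayer, can be sketched as a small plugin registry. Everything in this block is hypothetical (the `Plugin` dataclass, `REGISTRY`, and the toy recipe extractor); it illustrates the shape of the idea and is not ArchiveBox's plugin API.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Plugin:
    name: str
    extract: Callable[[str], dict]  # raw page text -> structured data
    replay: Callable[[dict], str]   # structured data -> human-readable view


REGISTRY: dict[str, Plugin] = {}


def register(plugin: Plugin) -> None:
    REGISTRY[plugin.name] = plugin


def extract_recipe(page_text: str) -> dict:
    """Keep only the lines that look like ingredients (lines starting with '- ')."""
    lines = [line.strip() for line in page_text.splitlines()]
    return {"ingredients": [line[2:] for line in lines if line.startswith("- ")]}


def replay_recipe(data: dict) -> str:
    """Render the extracted ingredients as a plain-text list."""
    items = "\n".join(f"  * {item}" for item in data["ingredients"])
    return f"Ingredients:\n{items}"


register(Plugin("recipe", extract_recipe, replay_recipe))


def archive_and_view(plugin_name: str, page_text: str) -> str:
    """Run a plugin's extractor, then show the result with its own replayer."""
    plugin = REGISTRY[plugin_name]
    return plugin.replay(plugin.extract(page_text))
```

Because the extractor and replayer travel together, adding a new content type (videos, articles, recipes) means registering one plugin rather than changing the viewer.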
Yeah.
It's almost like an internet on top of the internet,
powered by, I would say, probably importance to somebody.
It's almost like its own index, too.
That's why I think there's a lot of the possibility,
the potential here is just tremendous
if you can put it out there in the right way.
I'm not saying the way you're doing it is wrong because you're iterating, right?
You're trying to get to this eventual long-term really useful thing.
Because if I'm an archiver and I do things well and it's useful to me and I can expose that stuff in some way, like the things that I think are important to me because of who I am or what I do or the way I think that adds layers of importance to the thing
itself. It's not about the actual content and the archiving the content is one important aspect,
but it's also what was archived, not what is it in the literal files or the content. It's like,
what was it? Who and why? Those are things that I think are like a sentiment layer that's just not
out there really. And I
think if you can find a way to expose that, you know what I mean? Then you sort of like get this
aspect of like invitation into it, either as a consumer or replayer, as you've said,
or somebody who's actually an archiver and joins in.
Another really interesting idea that other tools have played with is preserving the context in which a page was discovered.
Like, oh, I clicked these three links in a row from this Google search, and that's how I found this thing that I then decided to save.
Like, saving that whole research chain of the URLs that you found is maybe interesting context, and that makes it more valuable.
Possibly. Possibly. It's like session replay, in a way, for a scenario.
I can see how that adds context, but it's also complexity.
Yeah, I don't personally see value in that necessarily, except
for when I would see value in it, of course. It's like, how did I actually find this website? Oh,
that's right, I was watching this, which I watched that, and that led me to this. And that's why I
really don't mind YouTube's algorithm, honestly, because it's interesting how it
knows what I want to check out in the future.
And like my whole time, I was just full of chicken parmesan.
You know, it's like, it's endless.
It's pretty easy for you then, I guess.
It's easy. Yeah. It's easy.
Yeah. YouTube doesn't have me.
It doesn't have me figured out like it does you, Adam.
I can just show you chicken parmesan, but I'm constantly mad at it.
Is that right? That's a shame.
Yeah. I get angry at it all the time.
Like I don't want to watch this.
And I subscribed to somebody six months ago and you haven't shown me one of their
videos in three months. And I forgot they existed.
You know, I'm there too. I'm with you on the
same anger point.
You know, you should check out the Tweaks for YouTube extension. It's totally
changed my relationship with YouTube. It lets you change the homepage algorithm. It lets you
make videos faster than 2x. It lets you...
What's SideQuest right now?
What is this?
Faster than 2x?
That's blasphemy, man.
Come on.
People create those videos for you to watch.
That's right.
1x for life.
Not all videos, only the ones that are very slow.
I'm fine with faster than 1x, but faster than 2x?
Holy cow.
That's not the only reason.
They also hide a lot of clutter in the UI.
It's basically infinite configuration options for YouTube.
I love that idea. I will check it out. My problem
with that is I experience YouTube in so many different contexts that aren't my computer.
Yeah.
My phone, my TV, yeah, my computer, other people's things. So, yeah, for sure. Anyways, off on
a YouTube rant. You were going to say something and I cut you off, Nick.
Well, I think back to the earlier concept
of this index being sort of like
worth sharing as this collection together
or worth sharing of like the what of the archive.
I think that's a really important point.
And the replayers,
like one thing to think about is
if you take this to its logical extreme
and everyone archives enough content
that they care about,
that the internet is broadly copied multiple times over, what's the point of hosting
anymore? What's the point of hosting stuff on your own? Once you publish it and enough people
have archived it, just stop paying for hosting. And people already use archive.org like that today.
And it's kind of an interesting thought experiment to think about if this becomes the content
distribution mechanism for the internet,
what happens.
But I also don't think that will happen.
I think that in any social system, you have two ways to share things.
You can share by reference or you can share by copy.
The internet right now is usually share by reference.
You share a URL to something and it's referring to the original content
hosted by the creator.
SMS is share by copy.
When you text someone, they have a copy of the SMS.
If you delete it off your phone, it's not deleting it off of their phone.
Email is share by copy, BitTorrent is share by copy.
Discord is share by reference.
You delete the Discord server, everything on it is gone.
Even though it looks like messaging, it's not share by copy.
It's kind of interesting to think about.
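The difference between the two sharing models shows up the moment the original is deleted. The sketch below is a toy model, with a hypothetical `Host` class standing in for the creator's server; it only illustrates the distinction drawn above.

```python
from typing import Optional


class Host:
    """The creator's server: the single place the original content lives."""

    def __init__(self) -> None:
        self.pages: dict[str, str] = {}

    def fetch(self, url: str) -> Optional[str]:
        return self.pages.get(url)


host = Host()
host.pages["https://example.com/post"] = "original article text"

# Share by reference: the recipient holds only the URL.
shared_reference = "https://example.com/post"

# Share by copy: the recipient holds the content itself.
shared_copy = host.fetch("https://example.com/post")

# The creator deletes the original.
del host.pages["https://example.com/post"]

reference_now = host.fetch(shared_reference)  # the link no longer resolves
copy_now = shared_copy                        # the recipient's copy is unaffected
```

After the deletion, `reference_now` is `None` while `copy_now` still holds the text. That is the control the creator keeps under share by reference and gives up under share by copy.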
I think most share-by-copy systems broadly will not succeed
in taking over as being the content distribution mechanisms for the world.
I think whether that's IPFS, whether that's BitTorrent,
whether that's anything that's share-by-copy,
I don't think it's going to become the de facto way we share content
simply because it deprives the original creators of the power
to monetize or delete their content. You can't moderate,
you can't get rid of CSAM once it's out there, you can't get rid of misinformation, you can't get rid of libel.
Artists, musicians, creators don't necessarily want to publish on a platform
where they lose control the moment they share something the first time. It's immediately
copied millions of times, they can't ever retract it or ask people to pay for it.
So I think archiving
is fundamentally limited in that, societally, at the human scale, people don't want to shift
to losing control over their content authorship.
Yeah.
And so people striving to make archiving do
that, to, like, really replace all sharing of content by any other means, I think are a little misguided.
And so it helps actually hone the focus a little more
and make it easier to work on this problem,
to not try to replace the entire internet,
because that's where it goes quickly if you don't think it through.
Well, I'm excited.
I think Adam's probably already got his Docker commands queued up.
I think you got him, Nick.
I'm a little bit more reserved in my, I'll wait until Adam sells me.
He's going to sell me something.
I'm already doing it, so it's like a better version of it, I think.
It might help me organize myself.
Yeah, this sounds like something that you're working too hard,
and actually it's going to help you work less hard.
Yeah.
So archivebox.org.
I did see that you went ahead and took the...
Or .io.
I don't have the .org.
My bad.
Archivebox.io.
Oh gosh, you're part of that crew.
Oh yeah, I have some regrets,
but .com is too expensive, and .org,
I wasn't a nonprofit when I first started, so I didn't...
I was going to bring up the nonprofit.
So you actually went ahead and went through the time
and effort to get that done.
So that's a step.
I'm not my own nonprofit.
I'm a fiscally sponsored project
through the excellent Hack Club Bank.
Yeah.
I see.
So you took a shortcut.
Oh, very cool.
So does that provide you some leniency?
Because you mentioned you're trying to decide,
you know, should you go nonprofit?
Should you go profit?
Do you have leniency because you didn't... because it's a proxy that you can change later?
How does that work?
No matter what, I'm going to have to be both.
There has to be a non-profit component.
There has to be a for-profit component.
It's going to be a peer corporate structure relationship
similar to any company that does massive content re-hosting,
like Archive.org, like OpenAI, like Mozilla, like Maps.
Basically, you have a nonprofit and you have LLCs underneath it
that do anything relating to money.
The content is only ever hosted by the nonprofit,
which is not earning revenue for it,
but you can sell software that people use
that contributes to that pool of content. And so basically the financial motivations are kept
separate. You're not incentivized to profit off of the copyrighted material, which I think is
important because as this eventually grows beyond just me, I don't want to have sort of
corporate structuring that is pushing it in the direction of destroying copyright.
Gotcha.
Anything else?
Any stone we have left unturned?
I didn't ask a lot of questions of you guys.
I would love to hear more about your own personal backgrounds.
You know, have you ever inherited a big legacy collection of stuff from your parents or grandparents?
Like, do you have any sort of personal interests?
Just photos.
Nothing digital. We're the first generation, I would say, probably, for Jerod and I, in digital, right? Like, we have our parents
in there, but by and large, for me at least, all my parents are dead.
So do you have kids now?
I do have kids, yeah.
Nice. What would you love to see them enjoy in 30 years, if, you know, they could only save,
let's say, a couple hundred pieces of your digital life?
Hmm.
His chicken parmesan.
Yeah.
Well,
I always have fond memories of,
I would probably say photos is probably the,
the,
the easiest one and videos,
right?
And those kind of go in the same category.
Yeah.
Like personal videos.
Yeah,
definitely videos.
I would say I kind of put them in the
same lump. You know, it's the Photos app, everything in the Photos app.
Yes. That's interesting. I think it's
mostly memories, less like artifacts. I don't know, I haven't really thought about that, honestly. I
do think that eventually my copied versions of my playlists that really feature chicken parmigiana, or the best steak ever,
or the most amazing smash burger of your entire life, those three things in particular are staples
in our household.
You're gonna have to send me that last one. I'm a huge smash burger fan in the
last few months.
Well, you have to come to my house, because that's the best one. Sorry about that. And
you're invited. I'll gladly make you a smash burger. I would say those kinds of things I imagine my kids will want to take on, because we make homemade marshmallows.
We do interesting things for the holidays.
And just generally, you know, we like to make our own food and we really appreciate that process.
And I'm trying to get my kids to think about that kind of stuff more so and what goes into the food.
Like even so far as like making your own sauce.
It's not because we're crazy.
I'm like if I can buy that sauce for whatever and I can buy the actual ingredients for like one quarter of the price and I enjoy it better and I know what went in it, that to me is like an A plus for all the things.
So yeah, I mean I would say those are the things.
Things that point to those principles, not so much the things themselves.
I think this YouTube playlist with my buddy Frank Proto might be... I say my buddy because I actually reached out to this chef literally recently.
I'll tell you.
This is really Plus Plus content, but either way, I'll tell you.
So I call him a friend because he's a future friend.
His name is Frank Proto.
He's a chef.
Okay?
And I reached out to him on Instagram.
I'm like, hey, I'm a big fan.
I've made your pancakes.
So pancakes from scratch.
I've made your spaghetti.
I've made this and that.
Big fan.
How hard is it to book you for a podcast?
He's like, not hard at all.
That's his only response.
But it's not hard at all.
So long story short, a future Changelog podcast will feature a chef.
Amazing.
Yeah.
Chef Frank Proto.
Check him out.
Proto Cooks
I believe is his channel
but he does some cool stuff
anything he makes
I will make
Frank's amazing
so I think those things
are things that I appreciate
and I know my kids appreciate them
because they have the
second order effects
of me making them for them
and so they'll eventually
appreciate
where I've gathered my
knowledge from. So I will eventually create my own recipe that is a culmination of 17 recipes,
you know, a trick from here, a tactic from that, or these particular tomatoes from that person's
recipe or where they got them at. Or if I want to spice it up, this is how I do it. You know,
I get the simple version and the complex version and it's all cooking related, but I think that's probably the easiest answer I can give you right now, which is something related to cooking.
Cooking is actually a shockingly popular answer to that question.
A lot of people like myself included increasingly as I'm starting the beginnings of a family.
I'm winning you over, right?
You're wanting to take on my... We can share a box, so to speak.
Yeah, that's what my wife would love. Basically, photos,
some news, and a lot of cooking recipes preserved.
And also, some personal work portfolio
is important to journalists, especially
I think a lot of people that do writing for a living
see a lot of their content
sort of disappear
when the publishers go bankrupt.
So that's a common answer I get.
Yeah, everyone has a really unique
and interesting answer
usually to that question
of what do they want to save?
And then the alternate version,
if you don't mind me asking
one more follow-up,
is now take away
the 100 URL requirement,
but now pretend you can't save any individual piece of content, but your kids will get a model
trained on everything that you save with no limit. You could feed this model 20 terabytes of training
data. What do you limit it to now? What do you want the model to have and what don't you?
That's TMI.
No worries.
Yeah, I'm also gonna... I'll pass on that one. Not because it's TMI, although that's hilarious; it's because I would have to think really hard about that.
More food for thought for people to think about, because I think it sort of gets the gears turning on perspective. And yeah, it's an interesting question.
I like that question.
Yeah, I like that idea, though. I like the premise of the question, not so much the answer I'll give. I like the idea of, you know, self... it's almost like knowledge for the future, and this LLM is an encapsulation of some version of... The obvious answer is, like, you know, just copy my psyche, copy my entire who I am, you know, full-on Ready Player One, actually Ready Player Two, with the ONI headset kind of thing and a replay of who I am. That's the obvious best case, but that's so weird. Such weird implications.
But also the victors write the history.
You get a chance to rewrite your own history book.
You can cut out all the bad parts.
Let me give a different answer.
I started to think about it more.
And I realized this is a false dichotomy.
There's no reason that it can't be both.
But my answer is spend way more time with your kids and talk to them about life, about
what you think, about what you believe, and why you do what you do.
And just spend a whole bunch of time with them.
And you don't have to give them a model.
They'll already have it. Well, it might not be for your kids. It might be for your kids' kids' kids. Yeah. Well, people come and go, you know? Yeah.
We don't have to like sustain our psyches into the future. Well, that really cuts deep to the
heart of archiving. Like that. I also believe this. I think that like death is an important
part of life. It's sort of the recycling engine that really tests, is this idea worth propagating or not? Because if someone doesn't
propagate it, then maybe it wasn't worth propagating. And that's sort of where I want
people to go when they think about these ideas. It maybe seems weird coming from the archiving
guy to be like, oh, you know, don't archive so much. Mortality. Yeah. But I honestly believe this.
And I think that there's some beauty in ephemerality.
And that's why I want archiving to be really intentional.
Because you are depriving the original creator of that decision to let death recycle their ideas by dragging their ideas, kicking and screaming into the next generation.
But we have to do it.
There's a balance.
There's a balance.
It's the only thing that makes life exciting because what's old is new again to so many people
because there's nobody to propagate forever.
Right.
You know, there is mortality, not immortality.
And so I can have this idea, which I thought was mine,
but it's not.
It's just recycled.
Somebody else had it.
It's recycled.
Yeah.
And it's only new to me because it's new to me.
A deep note to end on.
Yeah.
Perhaps.
It was fun. I think archivebox.io, to clarify. Check it out, man. If you're jiving on this, we do have a Zulip. I'm sure there's an episode topic. Is that what they call it? Not channels, it's a topic. Hop in there, say hello. Nick, I see that you have a Zulip for ArchiveBox, so if you want to dig deeper in the community, go hang out there in Nick's Zulip for ArchiveBox. But also come in ours if you're not there already, changelog.com slash community, and comment on this episode and say what's up and tell us what you're archiving or what you thought about this episode, or say hi to Nick if he's there. All that good stuff. Good times. Nick, thank you.
Yes, thanks, Nick.
Thank you so much for having me. And I'll join the Zulip right away. I didn't realize y'all had a Zulip.
Yeah, man. Heck yeah, man. Zulip for life.
Well, this episode cuts a little deep, you know. It makes you think. What you archive is, in a way, what you're thinking. It's almost like search history, but not really.
It wouldn't be hard to backtrace where you came from based on what you archived.
Obviously, privacy plays a role.
But I think the time unlock that Jerod mentioned is kind of interesting because at some point, it doesn't matter to you.
Contextually, it's gone.
It's not relevant, applicable.
You can't be persecuted or canceled necessarily.
Maybe future generations can be, which is interesting to think about.
But I think we all have a bit of an archivist in our blood, right?
Anyone who's in tech, anyone who's in software, anyone who's in the development of software products has got to be a bit of a pack
rat in some way, shape, or form, or someone recovering from. And this conversation around
ArchiveBox, this conversation with Nick, has got me personally thinking about the things that
matter to me, digitally, of course, and the things that I see, get impressed by, get changed by, and they're important to me.
And whether or not I would be sad to have not archived them or to be able to go back again to experience it or to share it with future generations.
I love this conversation.
I hope you did too.
It's got me thinking.
Okay, so archivebox.io, there is a
bonus by the way. We went a little deep, one, maybe two, maybe three layers deeper, and we gave Nick
some advice. We encouraged him and advised him on some different directions. And if you're not a
plus plus subscriber, hey, that just means that you end the show now. Okay? And that's cool.
But it's not.
Because you can easily go to changelog.com slash plus plus, become a paying subscriber.
Ten bucks a month.
A hundred bucks a year.
You drop the ads.
You get closer to that cool Changelog metal.
You get bonus content like today.
And you directly support us, which is just the coolest thing ever. Honestly, I love it. I appreciate it. I know Jerod does too. But changelog.com slash plus plus. It's better. It's better because of all the reasons I've said, and you get today's bonus content with Nick, and that's a win. Once again, changelog.com slash plus plus. It's better.
Okay, so some awesome brands support us, love us. We love them. You should love them because they love us and we love them and all the things.
Fly.io
Timescale.com
Wix. That's awesome.
Wix Studio. The coolest thing ever. Wix Studio.
And then, of course, our friends over at WorkOS.
WorkOS.com, Michael Grinich and team.
So awesome. WorkOS.
AuthKit. My gosh. They're killing it.
And, of course, the beat freak in residence, Breakmaster Cylinder.
Man, the beats are banging.
Thank you, BMC.
Thank you.
Okay, that's it.
The show's done.
We're off this Friday because, hey, Thanksgiving.
Enjoy your family.
Enjoy your time away.
We are.
And we'll see you next week. Peace.
Not that I'm suggesting a rename, but because the .org is so expensive, a name adjacent, and it might be a terrible play on words, but a good play on words, is instead of ArchiveBox, what if it was Archive Machine?
And then ArchiveMachine.org is available right now for $10.
Just saying.
So you haven't been entrenched enough where a name change might
be impossible. It is available and you are pursuing a nonprofit future kind of thing.
And you also have the Wayback Machine. So it's sort of like adjacent to what people already
might know. And so this is the archive machine that may power the Wayback Machine of your life
kind of idea. And the .org is available literally right now. Cool.
Yeah, ArchiveBox actually was a suggestion from a community member, Filippo Valsorda,
who's been a longtime supporter
and an interesting crypto guy.
We know Filippo.
Yeah, he's great.
He is awesome.
He's the longest-term supporter of ArchiveBox
from the very beginning,
has been reliably donating 20 bucks a month.
And I know him from Recurse Center in New York.
But yeah, I think either he or someone right after him
in the same conversation thread, we were brainstorming name ideas.
It's funny that you offer that, Adam, replacing the box with machine,
because as you were describing some of the, what I would say,
brand hurdles of us understanding the current value of something like this, I thought maybe the word archive was the one.