Close All Tabs - What Happens if the Internet Archive Goes Dark?
Episode Date: March 19, 2025

For decades, the Internet Archive has preserved our digital history. Lately, journalists and ordinary citizens have been turning to it more than ever, as the Trump administration undertakes an ideologically driven purge of government websites. But the Archive itself faces an existential threat. In this episode, Close All Tabs Senior Editor Chris Egusa joins Morgan to discuss his visit to the Internet Archive and its colorful founder Brewster Kahle, the legal battles that could shut it down permanently, and what losing it might mean for accountability and the preservation of history.

Guest: Brewster Kahle, Founder of the Internet Archive

Further reading:
Inside the $621 Million Legal Battle for the ‘Soul of the Internet’ – Jon Blistein, Rolling Stone
Open Internet, web scraping, and AI: the unbreakable link – Julius Cerniauskas, TechRadar
Musicians demand music labels drop their Internet Archive lawsuit – Ian Carlos Campbell, Engadget

Want to give us feedback on the series? Shoot us an email at CloseAllTabs@KQED.org. You can also follow us on Instagram.

Credits: This episode was reported and hosted by Morgan Sung. Our Producer is Maya Cueva. Chris Egusa is our Senior Editor. Additional editing by Jen Chien. Original music and sound design by Chris Egusa, with additional music from APM. Mixing, mastering, and additional sound design by Brendan Willard. Audience engagement support from Maha Sanad and Alana Walker. Katie Sprenger is our Podcast Operations Manager. Holly Kernan is our Chief Content Officer.
Transcript
From KQED. This is the purge of government websites.
Since President Trump's inauguration, federal agency and military websites have been wiped.
Some are gone completely, while others have been overhauled to remove any references to so-called
woke terminology, all in this effort to comply with an executive order to end diversity,
equity, and inclusion programs in the federal government.
And it seems like in the rush to remove all of these woke words, there were maybe some
unintentional cuts, like when the Department of Defense took down a 1940s photo.
It was a picture of a pilot, posing in front of one of the planes that dropped the atomic bombs
over Hiroshima and Nagasaki.
But what does this photo have to do with diversity, equity, or inclusion?
Well, the plane was named the Enola Gay.
It was named after the pilot's mother, Enola Gay Tibbets.
There are still other photos of the Enola Gay available on government websites,
but in the past few months, countless pages with crucial information
have been wiped from the internet.
Fortunately, for journalists, historians,
and anyone who cares about keeping track of facts,
there's a tool that lets us go back
and see exactly how those websites have changed.
Unfortunately, that very tool is under threat.
This is Close All Tabs.
I'm Morgan Sung, tech journalist,
and your chronically online friend,
here to open as many browser tabs as it takes
to help you understand how the digital world
affects our real life.
Let's get into it.
Okay, Close All Tabs senior editor Chris Egusa is going to walk us through this magical tool called the Wayback Machine.
Hey, Chris.
Hey, Morgan.
So I already have a tab open.
It's the current version of the State Department's safety tips for queer people traveling abroad.
And you have your own tab.
Yeah.
So it's the same webpage, same URL, but I have gone back in time, kind of.
That's where the Wayback Machine comes in. It's part of this organization called the Internet Archive, and for the past 30 years,
it's basically scraped the Internet page by page and archived it. So if you have a URL or link to a
website, you can go back and see all the ways that that website has changed. I used the Wayback Machine
to look at the page from January 5th, before this executive order. And at the top of the page,
it's addressed to LGBTQI+ travelers.
Unlike the one I'm seeing, the one that's currently live, which just says LGBT travelers.
What else does your version have?
So it has a lot of resources.
It has instructions for changing your passport's gender marker, warnings about conversion therapy practices in other countries,
and also links to the National Center for Transgender Equality and other organizations.
Yeah, this current one I have on my screen doesn't have any of that.
Just no warnings about conversion therapy and definitely none of those resources for trans people.
There's actually no mention of trans people at all.
They kept a link to the Trevor Project, but they made a point to say that it's an organization for LGBT youth.
Which is wild because the Trevor Project is very involved in advocating for trans youth and gender affirming care.
Yeah, and again, this is just one page out of who knows how many that have been altered to
take the T out of LGBT. How panicked should we be about the scale of erasure of public information?
Well, this has happened before. During Trump's first term, pages about climate change and the
environment were altered to soften the language, or they were just wiped entirely. But that purge
wasn't nearly as expansive or haphazard as the one we're currently living through, right? No, but I think it
has prepared a lot of us for a situation like this. This time around, a lot more people are
relying on the Wayback Machine and the Internet Archive. Okay. And in all of this mess,
the existence of the Internet Archive itself is under threat, which could spell trouble for
the future of all online libraries. So to get a better understanding of it all, Chris, you went to
the archive in person a few weeks ago. Let's start there. Let's make that our first tab: What is the
Internet Archive? You know, when I think of the Internet Archive, I'm thinking of, like, the Matrix or
Cyberchase, when they're, like, running through this kind of cloud and there are just binary
numbers everywhere. But the Internet Archive is a real physical location. Yeah, no, it's a real
place. It is not in the Matrix. It is in San Francisco in the Richmond District. It's in this very
grand building and out front has these huge Greek columns that kind of line the entrance of it.
They actually chose the building in part because it resembles the Archive's logo, which is
the columns of the Library of Alexandria.
The Library of Alexandria.
I mean, that is like the Greek idea of a universal library.
That's a pretty lofty idea to aspire to.
It is.
And the organization isn't shy about their ambitions.
Their stated mission is to provide, quote, universal
access to all knowledge.
According to their website, the archive currently contains, and I'm just going to reel off
a bunch of numbers here, 835 billion web pages.
I think last I checked, that number is actually close to a trillion.
Wow.
44 million books and texts, 15 million audio recordings, 10.6 million videos, and on and on.
Wow.
Yeah, it's a lot.
Like, I can't even quite conceptualize what a trillion webpages even looks like.
Okay, Chris, tell me, what was the archive actually like?
It was actually really cool.
Brewster Kahle is the founder of the archive, and he was really excited to give me a tour
and introduce me to all kinds of old media devices they'd collected over the years.
Edison invented these cylinders in 1880. Mostly, you know, things that you've never heard.
So, yeah, right when we got there, we go up some stairs, and he shows me this very vintage, beautiful old
gramophone. It's a Victor Talking Machine Five from 1927. So it's an old 78 RPM player, no electricity;
it has a crank, it has a horn. So, spinning up. I don't know if you recognize that song, but it immediately
made me think of Twin Peaks. Yes. Yes, it does. I am right in the middle of my rewatch right now.
I need to do a rewatch in David Lynch's honor for sure. Yeah. And then,
when he started playing this, he then started dancing around the room, which is just like...
Okay, Audrey Horne.
Yeah, yeah.
So after visiting this little museum area, Brewster takes me into this huge room and it has this
beautiful domed ceiling.
It used to be a Christian Science church.
The whole building did.
There are all these lines of pews that are facing where the pulpit used to be.
And in place of that pulpit, there's this huge projector screen.
And so they use this place for, like, movie screenings, for local community events and things.
But there is one thing that draws your eye more than anything else in the room.
That is the statues.
Statues?
Yeah. They are these hundreds of terracotta figures. They are about waist high. And each one of them
has distinct features and clothing. And they're all kind of facing forward like they're congregants in
this hall of worship.
Haunting.
So it's a bit of an eerie scene, but according to Brewster,
if you work for the archive for three years, then we make a little statue.
Wow.
Basically tributes to the people that made the organization happen.
So not, like, framed photos, just a three-foot statue.
Yes, exactly.
And they're really detailed.
Another employee was with us on the tour.
His name is Chris Freeland.
It's weird to be standing here in front of a terracotta statue of yourself.
But here we are.
It does look like me.
And that's also uncanny.
Everyone says they got the beard right, which is making me sad since it is covered in gray and no longer brown or red brown like it was when I was, you know, 20 years younger, but here we are.
And all these statues are standing in the pews and around the outside of the room facing the front sort of at attention.
You know what this reminds me of?
It's that terracotta army that's, like, protecting the tomb of the first emperor of China.
And they're meant to protect the emperor in the afterlife.
And I guess it's fitting because these statues look like they're, I don't know,
protecting the internet ephemera long after it's gone.
It's actually very appropriate because behind all of these statues in the very back of the room
is where the servers live.
He tells me that they hold 145 petabytes of data,
which, I don't deal in petabytes.
It's the one after terabyte.
So it's a lot.
And yeah, these are the servers where all of the billions of web pages, videos, and audio, where all of it lives.
There's a cool moment, actually, like, when you stare at these devices, you see all of these twinkling blue lights flashing and flickering across them.
Like hundreds a second.
And here is what Brewster told me about that.
Every time a light blinks is somebody uploading or downloading something from the Internet Archive.
I think that the technology reflects the people that make it.
Yeah, so let's make it beautiful.
That sounds almost magical. I mean, I get how this could be like a religious experience.
Yes, and I almost compare it to the way that old cathedrals were meant to evoke that sense of awe.
It was hard not to feel a little awestruck being in that place.
These servers hold some non-trivial percentage of all of the published works of humankind.
Okay, so tell me about Brewster, the guy who founded this whole thing, who started this
statue army and this, like, cathedral of servers.
So as you can already tell, Brewster is pretty eccentric.
And he kind of comes from that old school vision of the internet where he thinks it should be free and open.
He really feels that now power is way too concentrated in a few big companies.
Companies often don't sell anything anymore.
They just license it.
Now, if you use Netflix or Spotify, you don't even have a video library
like you used to with DVDs, or you don't have MP3s on your device or records in your collections.
So there's been this shift by the large-scale publishers towards ongoing control of materials and surveillance of what is being viewed.
They believe that most things should be open and available to the public, and especially that old things should be preserved, even old web pages.
And so that's where you get the Wayback Machine.
We have the World Wide Web on Archive.org available back to 1996, so you can go and find your old web pages, your old GeoCities sites or whatever it is that you've done in the past.
But it also is relevant to people currently.
Journalists are using it a lot to find, well, what did that person say?
And they're saying something kind of different, and they said they never said that.
Well, no, wait a minute.
We found that in the television news archive.
And you can search and find based on television transcripts back to 2009.
And you can tell that Brewster is old school because he uses the phrase World Wide Web, which I love.
So vintage of him.
Yeah.
It is kind of miraculous.
I do feel that you can go back and look at a website like that and see exactly what it looked like.
There's no other place that that exists.
Right.
Okay.
So clearly the Internet Archive is this incredible resource.
Is it true that it might shut down?
Well, let's talk about it.
Yeah, let's open a new tab on that, but right after this break.
Okay, new tab.
Internet Archive Lawsuits.
So let's talk about these lawsuits that the Internet Archive is facing.
They're not about the Wayback Machine or the web page archiving, right?
They're about a totally different part of the Archive's operations.
Yeah. So the Internet Archive also has these huge operations where they preserve old physical media.
In some cases, the stuff they're preserving is very clearly public material that is for
public access, like they have this program called Democracy's Library, where they go and digitize
all the print records for all kinds of government agencies. But they also digitize things like
books and music, and sometimes that includes copyrighted material. So there are two specific
lawsuits at the center of this. The first was a case called Hachette v. Internet Archive,
and that was brought against the Internet Archive by book publishers. They objected to
the Archive's practice of digitizing books and lending them out digitally, even though many of them
were out of print.
Okay, but that sounds like a normal library thing.
Yeah, it kind of is, though there is a wrinkle to it because of this program they did
in 2020 during the pandemic lockdowns.
So before, the archive operated kind of how libraries normally do.
They have a certain number of licenses, you can check each book out, but during this time,
it became unlimited access for anyone.
Though I will say that Brewster and his team strongly dispute the idea that the case was about this pandemic era program at all.
They say the lawsuit had been planned before that program ever started.
Either way, after a lengthy appeals process, the judge did rule against them,
and the judgment required the internet archive to pay publishers an undisclosed amount.
And even though the lawsuit was about, like, these specific 127 copyrighted
works, the archive ended up removing over 500,000 books from their digital collection,
which free speech and pro-access people were very upset about because a lot of these books
aren't available anywhere else.
And that brings us to this next lawsuit.
This was brought by the music industry, two major record labels, Universal Music
Group and Sony Music.
What can you tell us about this case?
So this suit was brought against them in 2023, and it centers around another one of the
Internet Archive's programs.
This one is called the Great 78 Project.
And 78 stands for 78 RPM Records.
It's a format that was super popular from the 1890s to like the late 1950s.
And this program was this massive communal undertaking to digitize and preserve these
very old 78s.
They digitized and cataloged more than 400,000 of these recordings since starting the
project in 2017.
And they made those recordings available to the public to stream on their website.
And the key here is how they thought about this project.
They felt like they were undertaking the preservation of a defunct technology and the sound of American culture in a bygone era.
Those older materials that were sort of foundational of what did America sound like are so obsolete that we went and we circulated in the industry conferences, say, okay, there's going to be this project.
the Great 78 Project, and libraries and archives,
a hundred different ones came together to go inform this.
The industry knew about it.
They were all supportive that when we talked to them, it was all great.
But the record labels saw things differently.
They definitely did.
In their lawsuit, the labels called the Great 78 Project,
quote, wholesale theft of generations of music.
And they claim that by making the records
available to stream for free, the Internet Archive was displacing streams that generate royalties
on platforms like Spotify or Apple Music, royalties that could have gone to either the platforms
themselves or the copyright holders like the artists.
Well, do they have a point?
So obviously, I'm not a legal expert, but here's what the record labels say.
In the suit, they're focusing on 4,000 specific recordings that do have copyright protection.
They are commercially available, and many of them are still very popular, including Bing Crosby's "White Christmas," which is the best-selling single of all time.
So I think it's going to be tough.
And we know what the actual amount they're suing for is, right?
Yeah.
So it's $621 million.
And just to put it in perspective, the Internet Archive's operating budget is a tiny fraction of that, just around $30 million.
If we're found guilty of being a library, and then that will cost us, yes, it would snuff the Internet Archive.
And that may be the point.
That's pretty bleak.
Yeah, it is.
I will also add that regardless of where you land on, okay, was this copyright infringement or not, the details of the case strike me as kind of strange.
So first thing is there were a notably small number of streams
per audio file in question, like hundreds or, you know, in some cases a few thousand,
but not like hundreds of thousands or millions.
And so if you actually convert the number of streams to a dollar amount based on like
how much Spotify royalties pay, you're generally looking at a couple of dollars per audio
file, maybe a bit more in a few cases.
But, you know, clearly publishers are not experiencing dramatic monetary loss due to
these relatively small number of streams, right?
But the record companies, they still decided to sue for the maximum amount under the law,
which is $150,000 per record, even though they could have sued for less.
They also never even asked the Internet Archive to take the records down.
They never received a request.
They were just slapped with this lawsuit.
And if we had gotten that list, we would have taken it down.
And we did, once they sued.
You just give us the list,
and we would have taken them down.
But the other thing is, and I'm not saying that this argument will hold up in court,
but like I think about a platform like YouTube, right?
YouTube gets copyrighted material uploaded to it constantly.
And the way it works is that when an interested copyright holding party requests that they remove certain content,
that content then gets taken down.
I mean, that's the way the internet basically works.
And those 78s are on YouTube.
So basically, they're after something else.
He thinks the publishing companies are going after the library system itself,
the ability for people to access materials for free.
The bigger picture that's going on and the real contest is not about money.
It's actually about control.
Can libraries own anything in the digital world?
Is there a digital ownership?
That's the central characteristic.
And there's a question: Is the United States going to have libraries have their traditional roles of buying, preserving, lending, and interlibrary loan?
But the case itself likely won't move forward until later this year.
So we'll have to wait and see how that develops.
All right, change of gears a little bit.
we are a tech show, so I feel like we're almost contractually obligated to mention AI somehow
in almost every episode that we make. But I don't know, Chris, that feels like a new tab.
I think so. Okay. New tab. Internet Archive and AI legal battles. AI companies have also been hit
by big lawsuits from publishers, and you may not think of it at first, but AI companies like OpenAI, which makes ChatGPT,
and the Internet Archive have some similarities. They both use
tools to scrape the web for data and text and other content. Of course, what they do is different.
The Internet Archive stores and preserves it, while AI companies use it to train their models.
What's Brewster's take on AI? So he's a big proponent, actually. We're using the AI technologies
for a bunch of what may seem like mundane tasks, but are super helpful, like putting metadata
on all these government documents. He says that one of the big problems with
a site like the Internet Archive is that there's just so much stuff on there.
Organization can be a struggle, and people visiting the site can get overwhelmed.
AI can make all of that easier by tagging and categorizing the billions of pieces of media they have to make them more easily findable.
I mean, if you go to archive.org, anecdotally, people say, you know, you kind of arrive and it's just huge and it's a mess and holy crow, I don't know where to start.
And so if we could make that on-ramp easier, wouldn't that be fantastic?
And as far as the lawsuits against AI companies, he thinks that the laws are too in favor of publishers and copyright holders and that they should be relaxed to allow AI companies to operate more easily.
We don't have regulatory clarity.
So there are now 80 lawsuits around the AI world.
So it's going to be just who has more lawyers.
And that's going to end up with just a few gigantic players.
I mean, I'm actually so surprised that he's pro-AI.
We've talked a lot about how AI has ushered in this era where everything is essentially editable.
So, yeah, for somebody who's so preoccupied with the preservation and accurate recording
of history, I was surprised that he'd be so on board with technology that seems to be
like the antithesis of that in some ways.
Yeah.
Well, one thing that is clear is that the outcomes of these AI lawsuits
could impact the Internet Archive because they're both about the enforcement of copyright law.
Right. It seems like there's this trade-off where if you want an open internet,
where information is free and accessible, you also have to expect it to be scrapable.
100%. And the Internet Archive does similar kinds of scraping techniques that AI companies do, like you said.
Overall, it seems that he thinks that's a trade-off worth making.
Okay. So we have the Internet
Archive, this organization that provides all these public services that the internet has become
dependent on. And we also have this massive lawsuit that threatens to shut the organization down.
I mean, it feels like we're in a moment where that possibility is more concerning than ever.
We have political turbulence, disinformation, these new AI technologies that are making it
harder and harder to get the truth. I mean, yeah? Do you think that's a new tab?
Okay. You know what? You're so right, Chris. Do you want to do the honors? I would love to. Let's open a
tab. What happens if the Internet Archive goes away?
We talked about this at the top of the episode, but the Internet Archive plays such a critical
role in our information ecosystem. And like Brewster says, our ability to go back and check
the record. I mean, that's what we lose when we lose the Internet Archive. Yeah, it's such an
important issue, especially right now. Brewster says that after each presidential term,
they go through and catalog all of the government websites,
including the ones we talked about earlier.
We have since the year 2004 gone and done an end-of-term crawl
to go and record all the federal websites that we possibly can
to go and download and preserve what it looked like before the change
and then right away after the change.
And are there changes?
Yes.
Are there always changes?
Yes.
Are there changes that you agree with?
It depends on how you voted.
But the idea is that libraries were there to preserve the record.
I think the logical question is, what if it does shut down?
What then?
Has Brewster even entertained this idea?
I think it's hard for him to go there.
Like, this is his life's work.
But he has definitely thought about the threat that looms if our ability to preserve our understanding of the past goes away.
So he references George Orwell's dystopian vision of the future from the book 1984.
The image of the memory hole, the idea that next to your desk is this hole where you can go and put the only copy of that newspaper into an incinerator and be able to change history, is upon us.
The average life of a webpage is 100 days before it's changed or deleted.
If we do not actively collect them and preserve them and keep them accessible,
we are living in the memory hole universe of George Orwell.
Well, okay, is there any hope here?
Is our only option to just give up, crawl into a memory hole, and accept it?
I don't think we have to accept the memory hole,
and I certainly hope that we don't.
But I'll wrap up with an observation about Brewster himself.
So what struck me about him the most was he just has this unrelenting optimism.
He seems to truly love what he does, and he believes in it so strongly.
And he's cultivated this team around him that really shares in that vision.
So even with the looming threat of this extinction-level lawsuit coming from the record labels,
it's like he can't quite bring himself to imagine that the Internet Archive could really go away.
I think we're doing fine.
I think that there might be pieces of the Internet Archive that are chiseled away by very powerful interests.
But the idea of a library or even just the Internet Archive as an organization has got lots of support.
So, you know, can the Internet Archive go away?
Yes.
Would it be a bad thing?
I would think so.
But I think the real issues
are going to be whether the legislatures and the judiciary side with people's access to information in some way or another.
We'll see that play out over the next 25 years of the Internet Archive's life.
For Brewster, it's about the Internet Archive, of course, but it's also so much more.
I don't know the words exactly, but they're just in every librarian's
mind: those who control the past control the present, those who control the present control the future.
And the idea of a library is part of an ecosystem of how society remembers. It's how it thinks of
itself. If you were to erase the Internet Archive and the libraries, which is in many ways
happening now, then we will live in danger of having people be able to
recast what happened.
And as a society that believes in universal education and the fulfillment of individual possibility,
we just can't let that happen.
So, are you ready to close these tabs?
Let's close these tabs.
Close All Tabs is a production of KQED Studios and is reported and hosted by me, Morgan Sung.
Our producer is Maya Cueva.
Chris Egusa is our senior editor.
Jen Chien is KQED's director
of podcasts and helps edit the show. Original music and sound design by Chris Egusa. Additional music by
APM. Mixing, mastering, and additional sound design by Brendan Willard. Audience engagement support
from Maha Sanad and Alana Walker. Katie Sprenger is our podcast operations manager and Holly Kernan
is our chief content officer. Support for this program comes from Beirang Hu and supporters of
the KQED Studios Fund. Some members of the KQED podcast team are represented by the Screen Actors Guild,
American Federation of Television and Radio Artists, San Francisco Northern California local.
Keyboard sounds were recorded on my purple and pink Dustsilver K-84 wired mechanical keyboard
with Gateron red switches. If you have feedback or a topic you think we should cover, hit us up
at CloseAllTabs@KQED.org. Follow us on Instagram at closealltabspod. And if you're enjoying the show,
give us a rating on Apple Podcasts or whatever platform you use. Thanks for listening.
