The AI Daily Brief: Artificial Intelligence News and Analysis - AI's Threat to History
Episode Date: March 4, 2024
A reading and discussion inspired by https://www.nytimes.com/2024/01/28/opinion/ai-history-deepfake-watermark.html
ABOUT THE AI BREAKDOWN
The AI Breakdown helps you understand the most important news and discussions in AI.
Subscribe to The AI Breakdown newsletter: https://theaibreakdown.beehiiv.com/subscribe
Subscribe to The AI Breakdown on YouTube: https://www.youtube.com/@TheAIBreakdown
Join the community: bit.ly/aibreakdown
Learn more: http://breakdown.network/
Transcript
Today on The AI Breakdown, we're discussing AI's threat to the past.
The AI Breakdown is a daily podcast and video about the most important news and discussions in AI.
Go to breakdown.network for more information about our YouTube, our Discord, and our newsletter.
Hello, friends. For this Sunday episode, we have another Long Reads episode.
Normally I try to just do one Long Reads episode, but this week we had a little bit of travel,
and I wanted to at least have some episode.
And frankly, this whole Google conversation around Gemini and bias and quote-unquote wokeness,
and whatever else has been imposed upon it, just hasn't gone away.
It has had a dramatic financial impact on Google in the form of a reduced share price.
It is clearly causing consternation at the highest levels of that organization, given the CEO's
letter to staff earlier in the week.
And what I argued last weekend when we discussed this is that part of why this is such a big
deal is not just that it's triggering an American culture war, which it is.
And that's certainly part of the noise around it.
But beyond that, it's also triggering
an awareness, a fundamental understanding in practice of what an AI could do when it comes to our
understanding of our own past, when it comes to the ownership of history. Interestingly,
back in January, Jacob Shapiro and Chris Mattmann, respectively the managing director of the
Empirical Studies of Conflict Project and the director of the University of Southern California's
Information Retrieval and Data Science Group, wrote a piece for the New York Times
called "AI Is Coming for the Past, Too." We're going to read it now in the
context of what we've seen with Gemini and my argument that this is about ownership of history,
and that's the thing that people are really scared of. The authors write,
We don't have to imagine a world where deepfakes can so believably imitate the voice of politicians
that they can be used to gin up scandals that could sway elections. It's already here.
Fortunately, there are numerous reasons for optimism about society's ability to identify
fake media and maintain a shared understanding of current events. While we have reason to believe the
future may be safe, we worry that the past is not.
History can be a powerful tool for manipulation and malfeasance. The same generative AI that can
fake current events can also fake past ones. While new content may be secured through built-in systems,
there is a world of content out there that has not been watermarked, which is done by adding
imperceptible information to a digital file so that its provenance can be traced. Once watermarking
at creation becomes widespread, and people adapt to distrust content that is not watermarked,
then everything produced before that point in time can be much more easily called into question.
And this will create a treasure trove of opportunities for backstopping false claims with generated documents,
from photos placing historical figures in compromising situations,
to altering individual stories in historical newspapers, to changing names on deeds of title.
While all of these techniques have been used before, countering them is much harder
when the cost of creating near-perfect fakes has been radically reduced.
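To make the watermarking idea described above a little more concrete, here is a deliberately simple sketch. It is not anything the authors or the AI companies describe: it is a fragile least-significant-bit (LSB) watermark that hides a short provenance tag in raw 8-bit pixel values, changing each value by at most 1. The function names, the tag, and the stand-in "image" are all hypothetical. Production watermarks are designed to survive compression, cropping and re-encoding; this toy survives none of that, and anyone who knows the scheme can strip or forge it.

```python
def embed_watermark(pixels: bytes, tag: bytes) -> bytes:
    """Hide `tag` in the least significant bit of each pixel value (hypothetical helper)."""
    # Expand the tag into individual bits, most significant bit first.
    bits = [(byte >> shift) & 1 for byte in tag for shift in range(7, -1, -1)]
    if len(bits) > len(pixels):
        raise ValueError("cover data is too small to hold the tag")
    marked = bytearray(pixels)
    for index, bit in enumerate(bits):
        # Clear the lowest bit, then set it to the tag bit: each value moves by at most 1.
        marked[index] = (marked[index] & 0xFE) | bit
    return bytes(marked)


def extract_watermark(pixels: bytes, tag_length: int) -> bytes:
    """Read back a tag of `tag_length` bytes from the lowest bit of each pixel value."""
    bits = [value & 1 for value in pixels[: tag_length * 8]]
    tag = bytearray()
    for start in range(0, len(bits), 8):
        byte = 0
        for bit in bits[start : start + 8]:
            byte = (byte << 1) | bit  # rebuild the byte, most significant bit first
        tag.append(byte)
    return bytes(tag)


if __name__ == "__main__":
    # Stand-in for an 8-bit grayscale image: 1,024 pixel values.
    cover = bytes(range(256)) * 4
    tag = b"provenance-tag"
    marked = embed_watermark(cover, tag)
    print(extract_watermark(marked, len(tag)))
    print(max(abs(a - b) for a, b in zip(cover, marked)))
```

The change is invisible in the sense the authors mean (no pixel moves by more than one intensity level), but the tag only helps if it is there from the start, which is exactly why content created before watermarking became common is the weak point they are worried about.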
This forecast is based on history.
There are many examples of how economic and political powers manipulated the historical record to their own ends.
Stalin purged disloyal comrades from history by executing them, and then altering photographic records to make it appear as if they never existed.
Slovenia, on becoming an independent country in 1992, erased over 18,000 people from the Registry of Residents,
mainly members of the Roma minority and other ethnic non-Slovenes.
In many cases, the government destroyed their physical records, leading to their loss of homes,
pensions, and access to other services, according to a 2003 report by the Council of Europe Commissioner for Human Rights.
False documents are a key part of many efforts to rewrite the historical record.
The infamous Protocols of the Elders of Zion, first published in a Russian newspaper in 1903,
purported to be meeting minutes from a Jewish conspiracy to control the world.
First discredited in August 1921 as a forgery plagiarized from multiple unrelated sources,
the Protocols featured prominently in Nazi propaganda and have long been used to justify
anti-Semitic violence, including a citation in Article 32 of Hamas's 1988 founding covenant.
In 1924, the Zinoviev letter, said to be a secret communiqué from the head of the Communist
International in Moscow to the Communist Party of Great Britain to mobilize support for normalizing relations
with the Soviet Union, was published by the Daily Mail four days before a general election.
The resulting scandal may have cost Labour the election. The letter's origin has never been
proved, but its authenticity was questioned at the time, and an official investigation in the
1990s concluded that it was most likely the work of White Russians, a conservative political
faction led at the time by Russian émigrés opposed to the communist government.
Decades later, Operation Infektion, a Soviet disinformation campaign, used forged documents to spread
the idea that the United States had invented HIV, the virus that causes AIDS, as a biological weapon.
And in 2004, CBS News withdrew a controversial story because it could not authenticate the documents,
which were later discredited as forgeries. They called into question the earlier service of George W. Bush,
then the president, in the Texas Air National Guard. As it becomes easier to generate historical
disinformation and as the sheer volume of digital fakes explodes, the opportunity will be available
to reshape history or at least to call our current understanding of it into question.
The prospects of political actors using generative AI to effectively reshape history,
not to mention fraudsters creating spurious legal documents and transaction records, are frightening.
Fortunately, a path forward has been laid by the same companies that created the risk.
In indexing a large share of the world's digital media to train their models,
the AI companies have effectively created systems and databases that will soon contain all of
humankind's digitally recorded content, or at least a meaningful approximation of it.
They could start work today to record watermarked versions of these primary documents,
which include newspaper archives and a wide range of other sources,
so that subsequent forgeries are instantly detectable.
Such work faces some barriers.
Google's digital library effort to scan millions of the world's library books
and make them readily accessible online
ran into intellectual property limits,
rendering the historical archive unworkable for its intended purpose
of making these texts searchable by anyone with an internet connection.
These same intellectual property concerns are causing creators and companies
to fret about both the training data provided to generative AI
and its implications when used to generate content.
Given this freighted history, including Google's failed investment
in its digital libraries project,
who will step up and pay for a similar massive effort
that would create immutable versions of historical data?
Both government and industry have strong incentives to do so,
and many of the intellectual property concerns
around providing a searchable online archive
do not apply to creating watermarked and timestamped versions of documents,
because those versions need not be made publicly available
to serve their purpose.
One can compare a claimed document to the recorded archive by using a mathematical transformation
of the document known as a hash,
the same technique the Global Internet Forum to Counter Terrorism uses to help companies screen
for known terrorist content.
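The hash comparison the authors describe can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the `HashArchive` class and its methods are hypothetical, SHA-256 is an assumed choice of hash, and the archive stores only digests and timestamps, which fits the authors' point that the recorded versions need not be made public. The timestamp is simply whatever the archivist supplies; a real registry would need a trusted timestamping source. A cryptographic hash like this flags any change, even a single byte, so it only confirms exact copies; systems that screen for re-encoded or lightly edited media generally rely on perceptual hashes instead, which this sketch does not implement.

```python
import hashlib
from datetime import datetime, timezone
from typing import Dict, Optional


class HashArchive:
    """Hypothetical private registry of document fingerprints.

    Only SHA-256 digests and the time each was recorded are stored, so the
    archive never has to publish the underlying documents.
    """

    def __init__(self) -> None:
        self._records: Dict[str, datetime] = {}

    @staticmethod
    def fingerprint(document: bytes) -> str:
        # SHA-256 maps any input to a fixed 64-character hex digest;
        # changing even one byte of the input yields a different digest.
        return hashlib.sha256(document).hexdigest()

    def record(self, document: bytes, when: Optional[datetime] = None) -> str:
        digest = self.fingerprint(document)
        timestamp = when if when is not None else datetime.now(timezone.utc)
        # Keep the first recorded timestamp if the same document is recorded again.
        self._records.setdefault(digest, timestamp)
        return digest

    def verify(self, claimed: bytes) -> Optional[datetime]:
        # Returns when these exact bytes were archived, or None if they never were.
        return self._records.get(self.fingerprint(claimed))


if __name__ == "__main__":
    archive = HashArchive()
    original = b"Minutes of the town council meeting, page 3"
    archive.record(original, when=datetime(2024, 1, 28, tzinfo=timezone.utc))

    print(archive.verify(original))              # exact copy: the recorded timestamp
    print(archive.verify(original + b" (rev)"))  # altered copy: None
```

Because only digests are kept, anyone holding a claimed document can check it against the archive without the archive ever exposing the original text, which is what sidesteps the intellectual property problems the authors mention.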
Aside from creating an important public good and protecting citizens from the dangers posed
by manipulation of historical narratives, creating verified records of historical documents can be
valuable for the large AI companies.
New research suggests that when AI models are trained on AI-generated data, their performance
quickly degrades; thus, separating what is actually part of the historical record from
newly created facts may be critical. Preserving the past will also mean preserving the training data,
the associated tools that operate on it, and even the environment that the tools were run in.
Vint Cerf, an early internet pioneer, has called this type of record digital vellum,
and we need it to secure the information environment. Such a vellum will be a powerful tool.
It can help companies to build better models by enabling them to analyze what data to include
to get the best content and help regulators to audit bias and harmful content in the models.
Tech giants are already conducting similar efforts to record the new content their models are
creating, in part because they need to train their models on human-generated text, and the data produced
after the adoption of large language models may be tainted with generated content. The time has come to
extend this effort back in time as well, before our politics, too, becomes severely distorted by
generated history.
So if you've made it this far, you probably have a sense of why I'm connecting
the dots between this op-ed and what we've seen with Google over the last couple of weeks. The authors here
are concerned with the intentional misrepresentation of history for some political purpose. One of the big
questions surrounding Gemini's image misrepresentations of history, which added
gender and ethnic diversity to historical settings where there was none, is whether they in fact
amount to an intentional manipulation of history. Some see it as simply overcorrecting in pursuit of
a generally laudable goal: making sure that bias embedded in the source material doesn't get propagated
by the LLMs. Others, notably people like Marc Andreessen of a16z, argue that this, while perhaps
extreme, is a very intentional effort on the part of big tech to tell an untrue story about the past.
I think two people could be on totally different sides of that question and still ultimately
look at this situation and see it as an early warning sign around the relationship between
LLMs and history. For those who think Google was being intentional, well, they might say,
imagine if they had gotten away with it. Imagine if, over time, people just started to treat this as an actual
historical record. What else would it get wrong for the sake of some preplanned goal dictated by the
creators of the software? For those, on the other hand, who take Google to be simply making good-faith efforts
to deal with the question of bias in training data, the situation still triggers an awareness of the
power of these tools to influence how we think about history. What if, they might ask, it had been
much more subtle? What if it had been for more nefarious ends? What if Google,
for example, had a gripe with a particular group whose history it was able to alter? And even if one
assumes, again extending a ton of good faith, that a corporation whose main goal is profit
wouldn't ever want to do that, the fact that it has the power to do so in the first place,
by virtue of owning these tools, raises the question of whether these tools are too big
to be owned by a single corporation. We have been flirting for a very long time with the
question of whether digital commons in the form of social media sites have gotten too significant
in society to be treated like private companies were in the past.
That question is getting even more profound, with even bigger implications for the answer,
in the context of generative AI.
And that, again, is why this is a much bigger issue than just this one particular circumstance
of some extra-diverse Founding Fathers.
Thanks again to Jacob Shapiro and Chris Mattmann for a thought-provoking essay,
and until next time, peace.
