The AI Daily Brief: Artificial Intelligence News and Analysis - AI's Threat to History

Episode Date: March 4, 2024

A reading and discussion inspired by https://www.nytimes.com/2024/01/28/opinion/ai-history-deepfake-watermark.html

ABOUT THE AI BREAKDOWN
The AI Breakdown helps you understand the most important news and discussions in AI.
Subscribe to The AI Breakdown newsletter: https://theaibreakdown.beehiiv.com/subscribe
Subscribe to The AI Breakdown on YouTube: https://www.youtube.com/@TheAIBreakdown
Join the community: bit.ly/aibreakdown
Learn more: http://breakdown.network/

Transcript
Starting point is 00:00:00 Today on The AI Breakdown, we're discussing AI's threat to the past. The AI Breakdown is a daily podcast and video about the most important news and discussions in AI. Go to breakdown.network for more information about our YouTube, our Discord, and our newsletter. Hello, friends. For this Sunday episode, we have another Long Reads. Normally I try to do just one Long Reads episode, but this week we had a little bit of travel, and I wanted to have at least some episode. And frankly, this whole Google conversation around Gemini and bias and quote-unquote wokeness, and whatever else has been imposed upon it, just hasn't gone away.
Starting point is 00:00:44 It has had a dramatic financial impact on Google in the form of a reduced share price. It is clearly causing consternation at the highest levels of that organization, given the CEO's letter to staff earlier in the week. And what I argued last weekend when we discussed this is that part of why this is such a big deal is not just that it's triggering an American culture war, which it is. That's certainly part of the noise around it. But beyond that, it's also triggering an awareness, a fundamental understanding in practice, of what an AI could do when it comes to our
Starting point is 00:01:17 understanding of our own past, when it comes to the ownership of history. Interestingly, back in January, Jacob Shapiro and Chris Mattmann, respectively the managing director of the Empirical Studies of Conflict Project and the director of the Information Retrieval and Data Science Group at the University of Southern California, wrote a piece for The New York Times called "AI Is Coming for the Past, Too." We're going to read it now in the context of what we've seen with Gemini and my argument that this is about ownership of history, which is the thing people are really scared of. The authors write: We don't have to imagine a world where deepfakes can so believably imitate the voice of politicians
Starting point is 00:01:54 that they can be used to gin up scandals that could sway elections. It's already here. Fortunately, there are numerous reasons for optimism about society's ability to identify fake media and maintain a shared understanding of current events. While we have reason to believe the future may be safe, we worry that the past is not. History can be a powerful tool for manipulation and malfeasance. The same generative AI that can fake current events can also fake past ones. While new content may be secured through built-in systems, there is a world of content out there that has not been watermarked, which is done by adding imperceptible information to a digital file so that its provenance can be traced. Once watermarking
Starting point is 00:02:30 at creation becomes widespread, and people adapt to distrust content that is not watermarked, then everything produced before that point in time can be much more easily called into question. And this will create a treasure trove of opportunities for backstopping false claims with generated documents, from photos placing historical figures in compromising situations, to altering individual stories in historical newspapers, to changing names on deeds of title. While all of these techniques have been used before, countering them is much harder when the cost of creating near-perfect fakes has been radically reduced. This forecast is based on history.
Starting point is 00:03:02 There are many examples of how economic and political powers manipulated the historical record to their own ends. Stalin purged disloyal comrades from history by executing them and then altering photographic records to make it appear as if they had never existed. Slovenia, after becoming an independent country, erased over 18,000 people from the Registry of Residents in 1992, mainly members of the Roma minority and other ethnic non-Slovenes. In many cases, the government destroyed their physical records, leading to their loss of homes, pensions, and access to other services, according to a 2003 report by the Council of Europe Commissioner for Human Rights. False documents are a key part of many efforts to rewrite the historical record. The infamous Protocols of the Elders of Zion, first published in a Russian newspaper in 1903,
Starting point is 00:03:44 purported to be meeting minutes from a Jewish conspiracy to control the world. First discredited in August 1921 as a forgery plagiarized from multiple unrelated sources, the Protocols featured prominently in Nazi propaganda and have long been used to justify anti-Semitic violence, including a citation in Article 32 of Hamas's 1988 founding covenant. In 1924, the Zinoviev letter, said to be a secret communiqué from the head of the Communist International in Moscow to the Communist Party of Great Britain to mobilize support for normalizing relations with the Soviet Union, was published by the Daily Mail four days before a general election. The resulting scandal may have cost Labour the election. The letter's origin has never been
Starting point is 00:04:19 proved, but its authenticity was questioned at the time, and an official investigation in the 1990s concluded that it was most likely the work of White Russians, a conservative political faction led at the time by Russian émigrés opposed to the communist government. Decades later, Operation INFEKTION, a Soviet disinformation campaign, used forged documents to spread the idea that the United States had invented HIV, the virus that causes AIDS, as a biological weapon. And in 2004, CBS News withdrew a controversial story because it could not authenticate the documents behind it, which called into question the earlier service of George W. Bush, then the president, in the Texas Air National Guard; those documents were later discredited as forgeries. As it becomes easier to generate historical
Starting point is 00:04:56 disinformation, and as the sheer volume of digital fakes explodes, the opportunity will be available to reshape history or at least to call our current understanding of it into question. The prospects of political actors using generative AI to effectively reshape history, not to mention fraudsters creating spurious legal documents and transaction records, are frightening. Fortunately, a path forward has been laid by the same companies that created the risk. In indexing a large share of the world's digital media to train their models, the AI companies have effectively created systems and databases that will soon contain all of humankind's digitally recorded content, or at least a meaningful approximation of it.
Starting point is 00:05:31 They could start work today to record watermarked versions of these primary documents, which include newspaper archives and a wide range of other sources, so that subsequent forgeries are instantly detectable. Such work faces some barriers. Google's digital library effort to scan millions of the world's library books and make them readily accessible online ran into intellectual property limits, rendering the historical archive unworkable for its intended purpose
Starting point is 00:05:52 of making these texts searchable by anyone with an internet connection. These same intellectual property concerns are causing creators and companies to fret about both the training data provided to generative AI and its implications when used to generate content. Given this freighted history, including Google's failed investment in its digital libraries project, who will step up and pay for a similar massive effort that would create immutable versions of historical data?
Starting point is 00:06:14 Both government and industry have strong incentives to do so, and many of the intellectual property concerns around providing a searchable online archive do not apply to creating watermarked and timestamped versions of documents, because those versions need not be made publicly available to serve their purpose. One can compare a claimed document to the recorded archive by using a mathematical transformation of the document known as a hash.
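The op-ed does not specify an implementation, but the hash comparison it describes can be illustrated with a minimal sketch, assuming a plain SHA-256 fingerprint and an in-memory registry. The class `ArchiveRegistry` and its methods are hypothetical names chosen for illustration; a real archive would need durable, tamper-evident storage. The registry keeps only digests and timestamps, never the documents themselves, which mirrors the authors' point that such an archive need not be publicly available to serve its purpose.

```python
import hashlib
from datetime import datetime, timezone
from typing import Dict, Optional


class ArchiveRegistry:
    """Hypothetical in-memory registry of document fingerprints.

    Only SHA-256 digests and the time each was first recorded are kept;
    the documents themselves are never stored.
    """

    def __init__(self) -> None:
        # Maps hex digest -> UTC time the digest was first recorded.
        self._records: Dict[str, datetime] = {}

    @staticmethod
    def fingerprint(document: bytes) -> str:
        """Return the SHA-256 hex digest of the exact bytes of a document."""
        return hashlib.sha256(document).hexdigest()

    def record(self, document: bytes) -> str:
        """Fingerprint and timestamp a document; repeats keep the earliest time."""
        digest = self.fingerprint(document)
        self._records.setdefault(digest, datetime.now(timezone.utc))
        return digest

    def verify(self, claimed: bytes) -> Optional[datetime]:
        """Return when these exact bytes were recorded, or None if never seen."""
        return self._records.get(self.fingerprint(claimed))


if __name__ == "__main__":
    registry = ArchiveRegistry()
    # Hypothetical example documents, not real records.
    original = b"Deed of title: parcel 17, owner A. Example"
    altered = b"Deed of title: parcel 17, owner B. Example"

    registry.record(original)
    print(registry.verify(original) is not None)  # True: matches the archive
    print(registry.verify(altered) is None)       # True: one changed name breaks the match
```

Because a cryptographic hash changes completely when even one byte changes, this exact-match approach only recognizes byte-identical copies; a re-scanned photo or a reformatted file would not match, which is why systems that screen images at scale typically use perceptual hashing instead. The timestamp is also only as trustworthy as whoever operates the registry.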
Starting point is 00:06:33 Hashing is the same technique the Global Internet Forum to Counter Terrorism uses to help companies screen for known terrorist content. Aside from creating an important public good and protecting citizens from the dangers posed by manipulation of historical narratives, creating verified records of historical documents can be valuable for the large AI companies. New research suggests that when AI models are trained on AI-generated data, their performance quickly degrades. Thus, separating what is actually part of the historical record from newly created facts may be critical. Preserving the past will also mean preserving the training data,
Starting point is 00:07:00 the associated tools that operate on it, and even the environment that the tools were run in. Vint Cerf, an early internet pioneer, has called this type of record "digital vellum," and we need it to secure the information environment. Such a vellum will be a powerful tool. It can help companies build better models by enabling them to analyze what data to include to get the best content, and help regulators audit bias and harmful content in the models. Tech giants are already conducting similar efforts to record the new content their models are creating, in part because they need to train their models on human-generated text, and the data produced after the adoption of large language models may be tainted with generated content. The time has come to
Starting point is 00:07:33 extend this effort back in time as well, before our politics, too, becomes severely distorted by generated history. So if you've made it this far, you probably have a sense of why I'm connecting the dots between this op-ed and what we've seen with Google over the last couple of weeks. The authors here are concerned with the intentional misrepresentation of history for some political purpose. One of the big questions surrounding Google Gemini's misrepresentations of history in images, in the form of adding gender and ethnic diversity to historical eras where there was none, is whether it in fact amounts to an intentional manipulation of history. Some see it as simply overcorrecting in pursuit of a generally laudable goal: making sure that bias embedded in the source material doesn't get propagated
Starting point is 00:08:14 by the LLMs. Others, notably people like Marc Andreessen of a16z, argue that this, while perhaps extreme, is very intentional on the part of big tech to tell this untrue story about the past. I think two people could be on totally different sides of that question and still ultimately look at this situation and see it as an early warning sign about the relationship between LLMs and history. For those who think Google was being intentional, well, they might say, imagine if they got away with it. Imagine if over time people just started to treat this as an actual historical record. What else would it get wrong for the sake of some preplanned goal dictated by the creators of the software? For those, on the other hand, who take Google to be simply making good-faith efforts
Starting point is 00:08:55 to deal with the question of bias in training data, the situation still triggers an awareness of the power of these tools to influence how we think about history. What if, they might ask, it had been much more subtle? What if, they might ask, it had been to more nefarious ends? What if Google, for example, had a gripe with a particular group whose history it was able to alter? And even if one assumes, again extending a ton of good faith, that a corporation whose main goal is profit would never want to do that, the fact that it has the power to do so in the first place, by virtue of owning these tools, raises the question of whether these tools are too big to be owned by a single corporation. We have been flirting for a very long time with the
Starting point is 00:09:39 question of whether digital commons in the form of social media sites have become too significant in society to be treated the way private companies were in the past. That question is getting even more profound, with even bigger implications for the answer, in the context of generative AI. And that, again, is why this is a much bigger issue than this one particular circumstance of some extra-diverse Founding Fathers. Thanks again to Jacob Shapiro and Chris Mattmann for a thought-provoking essay, and until next time, peace.
