The AI Daily Brief: Artificial Intelligence News and Analysis - Can AI Detect AI? And Other Frequently Asked Questions

Starting point is 00:00:00 Today on the AI Breakdown, we're reading what is actually a very useful set of frequently asked questions around artificial intelligence. The AI Breakdown is a daily podcast and video about the most important news and discussions in AI. Go to Breakdown.network for more information about our Discord, our YouTube channel, and our newsletter. Hey, hello, friends. Welcome back to the AI Breakdown. It is Saturday, which means it's time for another long read. Today we are once again in Professor Ethan Mollick's world. His blog is called One Useful Thing. You can find it at oneusefulthing.org. And this week he published a piece called What People Ask Me Most, also some answers, an FAQ of sorts. Here's the setup.

Starting point is 00:00:47 Ethan writes, I've been talking to a lot of people about generative AI, from teachers to business executives to artists to people actually building LLMs. In these conversations, a few key questions and themes keep coming up over and over again. Many of those questions are more informed by viral news articles about AI than by the real thing. I can't blame people for asking because, for whatever reason, the companies actually building and releasing large language models often seem allergic to providing any sort of documentation or tutorial besides technical notes. I was given much better documentation for the generic garden hose I bought on Amazon than for the immensely powerful AI tools being released by the world's largest companies. So it's no surprise that rumor has been

Starting point is 00:01:25 the way that people learn about AI capabilities. Now, Ethan then caveats that of course he doesn't have perfect information, that he makes mistakes, that things are changing fast, but that he will do his best to answer these most common questions. So what we are going to do is go through the essay section by section, question by question. In some places I'll give Ethan's exact answer word for word, in other places I'll sum it up, and where I think it's relevant, I'll add my own little notes. His first section is about detecting AI. And interestingly, in his disclaimer, although he said to take all of his other answers with a grain of salt, he said that on this point about AI detectors, he is absolutely sure that he is correct. So, with that

Starting point is 00:02:02 in mind, first question, can you detect AI writing? Ethan gives simply a one-word answer. No. Question. But what about AI writing detectors that claim to do that? Ethan writes, AI detectors don't work. To the extent that they work at all, they can be defeated by making slight changes to text. And what might be worse, they have high false positive rates, and they tend to accuse people of using AI when they don't use AI, especially students for whom English is a second language. The falsely accused have no recourse because they can't prove they didn't use AI. You can't detect AI writing automatically. Even OpenAI says you can't.

Starting point is 00:02:36 The question continues, but I am sure I am really good at detecting AI writing myself. Look, I am going to cut you off here. You might think you are good at detecting AI writing, but you are just okay at detecting bad AI writing, and you combine that with your own biases and heuristics about who might be using AI. I'm sure the teachers who know their students well can guess at who might be cheating as they always could, but you are going to miss a lot of cheaters who are doing it more subtly, which is a problem of fairness in and of itself. I hate to say it,

Starting point is 00:03:01 but homework as we know it is over. We educators are going to have to adjust. There are plenty of paths forward, but it is not going to include cheat-proof homework. Now, I want to hang on this point for a moment because I share Ethan's conviction here, on this idea that homework as we know it is over. Now, yes, I know there are certain domains of knowledge where, of course, AI can't solve everything, but I think that the broader idea is that there is an absolute, inescapable paradigm shift here, in which these tools now exist and the genie is not going back in the bottle. The entire impulse to have AI detection software, specifically for writing and especially in the educational use case, is frankly a clinging to an old world which simply doesn't

Starting point is 00:03:43 exist anymore. The good news is that homework the way that traditional schools have done it isn't some perfect invention that's inextricable from human learning. It's just a convention that we happen to have adopted. But I think it's an interesting example of how there are all these things that have come to be totally accepted as normal that are just basically null and void in the era of artificial intelligence. But there's still a little bit more in Ethan's section on detecting AI, so let's move back to his essay. Question, what about AI-generated images? Ethan writes, while there are more techniques to detect AI images, they are already very hard to identify just by looking, and in the long term, likely impossible. All the hints you think you know,

Starting point is 00:04:20 like bad hands on fingers are no longer true. So once again, Ethan is making the point that in a world where we can't easily identify what has been created by AI, we have to change our assumptions to begin with the fact that anything that we're consuming might have been created with AI. Ethan's next section is called using AI. His first question is, who knows how to best use AI to help me with my work? Ethan writes, I have good news and bad news. The answer is probably nobody. That is bad news because there is no instruction manual out there that will tell you how to best apply AI to your job or school, so there is really no one to help you get the most out of this tool, or to teach you to avoid its specific pitfalls in your area of expertise. This can be challenging

Starting point is 00:04:58 because AI has a jagged frontier. It is good at some tasks and bad at others in ways that are difficult to predict if you haven't used AI a lot. The good news is that by using it a lot, you can figure out the best way to use AI. Now, once again, this is an area where I agree wholeheartedly with Ethan, but I also personally think that this is likely to change fairly quickly. I know, for example, that I am churning on ways to better help people figure out how to use AI to help them with their school and their work. I think it's a fascinating frontier, an incredibly important thing to spend time on, but I think that Ethan's answer that right now there's not really anyone that you can easily point to is correct. So what then does Ethan recommend in terms of ways to get good

Starting point is 00:05:37 at using AI? Simply put, he recommends using it. Get your hands on GPT-4 or Google Bard or Anthropic's Claude. Then he says, use it to do everything you are legally and ethically allowed to use it for. Generating ideas? Ask the AI for suggestions. In a meeting? Record the transcript and ask the AI to summarize action items. Writing an email? Work on drafting it with AI help. Now, one really important thing that Ethan notes here, on top of what is already good advice, is that this is a never-ending process. Just when you think you understand how a particular modality of artificial intelligence works and you've figured out what you can use it for, something new comes out that changes that dramatically. The example that he gives is the difference

Starting point is 00:06:19 between something like Midjourney, where you have to learn over time with lots of trial and error the perfect prompts to get what you want, versus the new DALL-E 3 tool that's embedded in ChatGPT. As Ethan describes, DALL-E 3 works very differently than other previous AI image tools because you tell ChatGPT4 what you want and the AI decides what to create. For example, he fed it the entire article that I'm excerpting now and asked it to come up with its own ideas for what illustration would be good cover art. It came up with an image of a stage that had on one side AI myths and on the other AI reality. Last in the using AI section is the question, I found something AI can't do.

Starting point is 00:06:55 Does that mean that it is outside the jagged frontier? Ethan writes, maybe, but I wouldn't feel too certain that a capability is outside the realm of AI until you have spent some time with different approaches. And if it truly is impossible for AI to do, wait a few months and try it again when a new model comes out. And now a word from today's sponsor. Are you interested in how two top-of-mind trends AI and crypto can work together? If so, I have the perfect podcast recommendation for you.

Starting point is 00:07:23 Web3 with A16Z Crypto, the chart-topping show brought to you by venture firm Andreessen Horowitz. Web3 with A16Z Crypto is your definitive resource for the future of the internet, whether you're already building in these spaces or simply curious about what's next. If you need a place to start, they recently released an excellent episode with Stanford cryptography professor Dan Boneh and former Google Xer Ali Yahya in conversation with host Sonal Chokshi about the intersection of AI and crypto. From fighting deepfakes and proving humanity to large language models like ChatGPT, they cover it all. I highly recommend checking it out, especially if you'd like to learn more about how AI and crypto will impact our everyday lives. Beyond crypto and AI, this show is for creators seeking

Starting point is 00:08:03 more ways to truly own their work, for business leaders trying to prepare for the future today, and for innovators exploring trending tech topics. So go ahead, listen to Web3 with A16Z Crypto, wherever you get your podcasts. All right, so far we've been through detecting AI and using AI, and I think that you can probably see why I decided to share some of this piece. It's got a lot of utility packed in word for word. And so now we move on to the policy stuff, to use Ethan's phrase. Here we have an additional disclaimer and caveat that Ethan is not a lawyer, and that the perspective is coming from someone who just watches these issues quite a bit. Question one, our company won't let us use

Starting point is 00:08:39 AI because we don't want our data stolen. Is that right? Now, in response to this, Ethan agrees that it is a challenge right now that all of the big LLMs haven't been particularly forthcoming about what materials their models were trained on, in part because that might have been copyrighted material. And yet he writes, the privacy issue that many people talk to me about is likely less of a barrier than you think. As a default, AI companies say they may use your interactions with their chatbots to refine

Starting point is 00:09:03 their model, though it is extremely hard to extract any one piece of data from the AI, making direct data leaks unlikely. But it is relatively easy to get more privacy. Individual users of ChatGPT can turn on a privacy mode where the company says they will not retrain or train AI on your data. But large organizations have even more options, including HIPAA-compliant versions of the major AIs. All of the big AI companies want organizations to work with them, so it is not surprising that all of them are eager to offer data guarantees. Now, Ethan's conclusion is that the short answer is that data privacy is probably not as big a concern as it might seem at first glance, but here I'm going to depart slightly.

Starting point is 00:09:36 First of all, when it comes to individual consumers, we unfortunately, I think, have had far too many times where terms of service and expectations about data privacy were broken or stretched to the limit to really be comfortable just believing that a little switch on ChatGPT means that what we're saying isn't going to get back to humans on the other side of the system. A few years ago that may have sounded conspiratorial, and ultimately there's not much more that a company can do than pledge that they're not going to use data and say yes, this button turns it off, but that isn't necessarily going to increase trust a priori. I think that there's going to be a big trust barrier that remains, in other words.

Starting point is 00:10:12 When it comes to companies, I think Ethan is absolutely right that all of the big AI model companies are trying to give enterprises assurances that their data is safe, but we are certainly seeing a situation in which many of those enterprises are choosing to either, A, customize their own solutions because it's the most controllable environment, or B, work with vendors that they already trust with their data rather than looking to younger, perhaps more risky startups. It's a really interesting dynamic, and one that I think is going to continue to shape how the field develops. The second question in this section is, what's the deal with copyright and AI? Ethan writes, as I understand it, current U.S. copyright rules around AI material are sort of unclear and in flux.

Starting point is 00:10:50 However, large AI companies seem eager to assure their customers that using their AI output commercially is safe. For example, Adobe and Microsoft offer legal guarantees that if you are sued over the output of their AIs, they will protect you at least under some circumstances. But also remember that legal use isn't always going to be ethical use, especially as we consider cases where AI work displaces human labor or produces art in the style of a living artist. The thing that I like best about this answer is that it really does break apart two separate things. One is the legality of a situation, which is going to be fought out in cases that I'm sure will go all the way up to the Supreme Court, and the other is societal expectations and norms, which are going to be a much different

Starting point is 00:11:26 and frankly messier fight that plays out over years in the court of public opinion. The final section in Professor Mollick's FAQ is The Future. Question. Aren't AIs like GPT-4 getting worse with time? Anyone who was on Twitter over the summer will have seen this idea quite a bit, and what's more, it seemed to be confirmed by an academic paper. But Ethan says not so fast. He writes, no, this turned out to be an incorrect conclusion from a working paper examining the performance of AI on certain math problems. Professor Arvind Narayanan and Professor Sayash Kapoor found that AI models are not getting worse at these sorts of problems, but they are changing, which alters the way you need to prompt the AI. You see,

Starting point is 00:12:04 what you call GPT-4 or Bard or Bing today is not the same thing as what Bard or Bing or GPT-4 was a few months ago. Models are continually getting additional training and tuning that improves performance in some ways while also changing behavior in others. It is part of why it is so hard to treat AIs like normal software, and sometimes easier to treat it like a person even though it isn't. Next question. Won't AI development grind to a halt as the internet fills with AI data or as it runs out of data to train on? Ethan writes, I hear this a lot. It may be true. Some papers argue that we will be out of training data in the next decade or two, or even by 2026 if we restrict ourselves to high-quality data, and another paper suggests that AI models will indeed start to struggle

Starting point is 00:12:42 as the web fills up with AI content. But many computer scientists argue that neither of these are actually long-term problems, and offer various solutions, including ways of training AIs on data that the AI makes up. Ultimately, these issues are unlikely to stop LLMs from improving over the next couple years, which I think is what people are really asking when they ask me this question. If you listened to my show on the State of AI report a couple days ago, you might have heard me talk about this particular issue and how I think it is going to be a bigger source of discussion in 2024. I think Ethan is probably right that what people who are asking him that are asking about is the immediate term. But I think that the longer term is a really fascinating question with a lot of disagreement among very smart people. In that State of AI report episode, I referenced documents that came out when Llama 2 was launched that seemed to suggest that an unreleased version that had been trained

Starting point is 00:13:30 exclusively on AI synthetic data, had actually outperformed all the other models. If that's true, it obviously has huge implications for this discussion. And so, at least for me, I am eagerly waiting to hear more about that, if and when Meta decides to share it. Finally, let's get to the last question in this essay. How good does AI get? Ethan writes the only reasonable answer, honestly, I have no idea. He then continues, and I suspect no one else does either, given the debates among prominent

Starting point is 00:13:57 AI experts. Right now, models get better as they get larger, which requires more data and more computers and more money. At some point, technical, economic, or regulatory limits are likely to kick in and slow the advance of AI. But at the same time, there is a lot of experimentation on how to make smaller models perform like bigger ones, and similar experiments on how to make larger models perform even better. I suspect there is a lot of room left for rapid improvement. What all of this means is absolutely unclear. Do we reach the feared slash longed-for level of artificial general intelligence, where AIs are smarter than humans, thus, depending on who you ask, creating a machine that will start saving, killing, or ignoring humanity? Do we, quote-unquote, just get order of magnitude improvements

Starting point is 00:14:35 in AI that are already performing at high human levels on many tasks? Do AIs stop improving quickly? There is no clear consensus, which, uncomfortably, means that we should be thinking about all three scenarios. The only thing I know for sure is that the AI you are using today is the worst AI you are ever going to use, since we are in for at least one major round of AI advances and likely many more. I will leave on this note as well. One of the reasons that you hear such diverse perspectives on the AI safety debate on this show is that I, like Ethan, apparently, think that the only reasonable answer to a complete unknown

Starting point is 00:15:09 and frankly unknowable is to create space for all different possibilities, to treat them as serious and as worthy of consideration and exploration. Unfortunately, the debate around future AI outcomes, particularly AGI, has calcified almost immediately into a brittle, caustic battle, in which both sides view each other with extraordinary skepticism. And of course, in this, I'm even leaving out the people in the middle who want us to be focusing on AI risks, but different ones than the extinction risk question. I think it is possible to keep all of these different perspectives in mind and proceed accordingly. But it takes a lot of will and consideration. And so hopefully having a lot of different perspectives on it, at least

Starting point is 00:15:45 helps give you guys the tools to make up your own minds. In any case, one more big thanks to Professor Ethan Mollick for writing this essay. Again, check out his blog at oneusefulthing.org. And until next time, peace.
