Big Technology Podcast - Can The Web Survive Generative AI? — With Matthew Prince
Episode Date: August 15, 2025
Matthew Prince is the CEO of Cloudflare. He joins Big Technology Podcast to discuss whether the web can withstand a wave of generative AI companies scraping content, summarizing it, and not cutting the original creators into the revenue. Tune in to hear Prince masterfully dissect the problem and put forth a solution.
Transcript
How will the web fight back against the wave of generative AI that is ingesting all the
content on the internet, but not paying for it? We're joined today by Matthew Prince. He's the CEO
and co-founder of Cloudflare and has been on the warpath attempting to right the ship.
Matthew, great to see you. Welcome to the show. Thanks for having me. Can you start by giving us a sense
as to what is happening to the web with the rise of generative AI? We've talked about it a bit
on the show already, but I want to hear it from you.
Yeah, absolutely. So the business model fundamentally of the Internet over the last 30 years has really been driven by search. You search for something, that generates traffic, and it takes you to content that someone has created. And then that content owner, that content creator, can drive value really in one of three ways. They can sell the content itself, sell a subscription to it. We see plenty of that these days. They can put ads up against it. Or they can just get the ego hit of knowing that somebody cares about and is reading their stuff. And that's really how the web has been built today.
It's been built chasing that traffic. What we're seeing, though, is that for the first time in history,
searches across the major search engines, Google in particular, are actually on the decline.
And what's replacing it is more and more people turning to AI. And the difference with AI is, rather than giving you 10 blue links that you click through to find the answer, now AI tries to give you the answer itself.
And that means that people aren't going to those original sources.
And if they don't go to the original sources, then that means that you can't sell a subscription
anymore.
You can't put ads up against it.
You don't even know that people are actually getting value from your stuff.
And so what we're really worried about at Cloudflare is if the incentives for creating
content go away, why is anyone going to create content in a new AI-driven future?
So talk a little bit about how many pages these AI bots or search engines have crawled in the past, how much traffic they've delivered for each crawl, and where it's gone today.
Yeah, you know, I think that the deal that Google made with the web starting 30 years ago, when Larry and Sergey started working on the project, was basically: let us copy your content, and in exchange we'll send you traffic that, again, you can drive value from in one of those three ways.
And we have very reliable data at Cloudflare going back 10 years, looking just at Google. And the metric that has stayed very consistent
over time is how much Google crawls the web. They've actually crawled at a very consistent
rate over the last 10 years. Over that same 10 years, we've actually added 2 billion internet
users. So we were at 4 billion internet users about 10 years ago. Today we're about 6 billion
internet users. So you'd imagine it's actually gotten easier to get traffic over that period of
time. But that's not what's happened. What instead has happened is that, if you sort of take 10 years ago as the litmus test, today it's almost 10 times as hard to get a click, to get a visitor from Google to your site. What's changed? The answer
is that Google has started providing more answers directly on the page. So if you search for
something like, when was Cloudflare founded? There will be an answer box at the top that will say
September 27, 2010 is the day that we launched. And you don't have to click to any link. In fact,
about 75 percent of queries to Google now get answered on Google itself. And what's changed
in even just the last six months that's accelerated this is they've rolled out AI overviews. And we've
tracked this from region to region. What we see is, as AI gives you the answer without you having to read the original content, the amount of traffic that Google is sending to these sites has gone down and down and down. And that's the good news for publishers: Google has gotten 10 times harder to get traffic from over the last 10 years. OpenAI is a whole different beast. In OpenAI's case, it's 750 times harder to get traffic than it was from Google just 10 years ago. In the case of something like Anthropic, it's 30,000 times more difficult to get that
traffic. So why is that? The answer is I think people are trusting the AI. They're reading this derivative
content and they're not going back to the original source. But the problem is if you're not reading
that original source, then the original sources have no way of generating value. They can't sell
subscriptions. They can't sell ads. They can't get the ego hit. And that over time is strangling
the very incentives on why content is being created. And that's the problem that we started to
really focus on about 18 months ago. And then just today on July 1st, we announced that we are
hard blocking the AI crawlers unless they will actually compensate content creators for the content
that they're creating. Okay, and we're definitely going to get into your technological solution,
so that's coming. But let's talk a little bit more about this problem. So I think the number that
you shared recently was Anthropic will crawl something like 60,000 pages for one click. That's right. For one click that's sent. And OpenAI was somewhere in, like, the 10,000 range? Do you remember? Yeah, it's 1,500 pages now for every one click that they send you.
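The multipliers quoted here follow from simple crawl-to-click ratios. A quick sketch, assuming the roughly two-crawls-per-click historical Google baseline Prince cites elsewhere in the conversation (all figures are the approximate numbers from the interview, not audited data):

```python
# Crawls-per-click ratios discussed in the conversation (approximate figures).
HISTORIC_GOOGLE = 2        # ~10 years ago: 2 Google crawls per click sent
RATIOS = {
    "OpenAI": 1_500,       # ~1,500 pages crawled per click sent
    "Anthropic": 60_000,   # ~60,000 pages crawled per click sent
}

def times_harder(crawls_per_click: int, baseline: int = HISTORIC_GOOGLE) -> float:
    """How many times harder it is to earn a click than the historic Google baseline."""
    return crawls_per_click / baseline

for name, ratio in RATIOS.items():
    print(f"{name}: {times_harder(ratio):,.0f}x harder than Google a decade ago")
```

Dividing by the two-crawls-per-click baseline reproduces the 750x and 30,000x figures quoted earlier.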
And, you know, I have to say, I'm surprised that publishers are seeing a problem now, only because
these AI products are really in their infancy.
I mean, Anthropic's Claude isn't used by very many people at all.
When you think about the scope of the web,
OpenAI has 500 million weekly active users.
It's pretty good, but really nothing compared to the amount of traffic
that you see on the web every day.
And I guess Google must be the problem.
So just explain why this is already showing up for publishers
because this is the infancy of generative AI.
Yeah, I think it is.
But it's one of these sea changes
that we can just see happening.
So again, for the first time in history, searches on Google actually dropped, period over period.
This is your data?
Google has actually reported this,
and it actually came out in the Apple trial as well,
where they're seeing more of this traffic actually going to other sources.
And so I agree that it is a drop,
but what we're seeing is that is the trend,
that is the direction that things are heading.
And even Google itself is looking more like an AI chatbot
and less like a traditional search engine.
And so if that's the case, I think that the time for publishers to panic is now.
If we wait while more and more traffic gets strangled and less and less is going to publishers, again, I think that's just going to mean that over time we'll have more consolidation in the media industry.
We'll have less and less content.
We'll have actually more salacious headlines as people are chasing the content that is left
that's out there.
And we need to actually make a change to make sure that we can continue to support publishers.
Because I do believe the future of the web is going to be an AI-driven future, not a search-driven future. And that AI-driven future just doesn't have the same incentives and doesn't support the same business model that the old search-driven web did. Okay, I'm going to poke at this a little more.
Yeah. You mentioned that you can now search Google, when was Cloudflare founded, and you'll get the answer. That's something that Google's been doing for a long time. You could ask, like, when was Martin Luther King Jr.'s birthday? Even before generative AI, they were giving you these answers. So is it that the magnitude has changed? And if so, from the standpoint of a consumer, could this be good? I mean, it's pretty annoying to type this question into Google, when was Cloudflare founded, and then have to click to Cloudflare's website to get the answer that Google could just surface for you. And so much of the web has sort of become effectively in service of Google queries, where websites don't really need to exist.
Well, so absolutely, this has been happening for a while. And if you look at the data: 10 years ago, the ratio of crawls from Google to clicks was two crawls to one click. Six months ago, it was up to six crawls to one click. And that's all because of the answer box. What the AI overviews, which they've rolled out over that time, have done is they've taken it now to 18 crawls to one click. So yes, it is a situation of, you know, the frog boiling in water.
But it has gotten progressively worse. And I think across the media industry, it's gotten harder and harder to actually survive as a publisher. And so what I worry about is, yeah, you know, they were struggling at six to one. They're struggling at 18 to
one. I think they're dead at the 250-to-1 or 1,500-to-1 that we're seeing with OpenAI and completely dead at the 60,000-to-1 we're seeing with something like Anthropic. And so that is the direction
that things are going. And that's a challenge. I think you're exactly right on the other point as
well, that the challenge here is that this is actually a better user experience. That's why more
of the web is going to turn to AI. It is great that you can type something in and you can get back
an actual response as opposed to having to hunt for it yourself. That's a better user interface.
And so that absolutely is going that direction. I'm not arguing, and I don't think anyone is arguing,
that we should just get rid of AI or that we should go back to sort of 10 blue links on Google.
What I am saying, though, is that the fuel that runs all of these AI systems, the reason that Google can tell you when Cloudflare was started or what Martin Luther King's birthday was or something like that is because somebody is doing the work of that original content creation.
That original content is the fuel that fuels Google.
It fuels all of the AI companies.
And if we strangle off the business model of those places, if we strangle off the incentives for content creators to create content, then
we're actually going to end up strangling the AI systems as well because if there's no content
to train on, then the AI systems are going to be pretty stupid for that. And so I think
everybody agrees that there has to be incentives that allow content creators to continue to be
compensated. The question is, what does that incentive structure look like? And that's, again,
what we've been really spending a lot of time trying to figure out. Can I... I just want to ask you one more question about crawls. I think that sometimes, you know, in search, you would crawl to put a website, or a page from a website, into your search engine. Are these generative AI bots crawling to do something similar, just to surface the information from these pages, or is some of the crawling being done in service of training their models? Because if that's the case, it's actually not as big of a deal, because it's just being fed into training. I think the problem is taking that, like, direct
query-to-answer behavior and sort of bringing it into the search engine. So do you know, is it training or is it just surfacing answers?
So I think there are two different parts of this. There's definitely training, and then there's what is closer to a search-like experience. If you're familiar with this, it would be something like RAG, retrieval-augmented generation, where you're actually getting that real-time data in order to augment the foundational model. I think in both cases, though,
you're actually costing the content
creator something. They are literally paying for that traffic, paying for the load that the crawlers are pulling off of their sites. It's also the intellectual property, the data, the content of these providers that's being used to train the models. And so there's value that the AI companies are getting. If there weren't, they wouldn't be crawling. Right. But there's no return of any compensation or any reward. Again, in the old days of Google, the tradeoff was, let us copy your content and in exchange we'll give you traffic.
What has happened is the frog is boiled in water.
And now everyone is saying, let us copy your content and we will give you nothing in return.
And so what we're saying is simply we need a better deal, a deal for a new AI driven future.
And that should say if you are getting value from the thing that I created, then you should compensate me in some way for it.
And it may be tiny amounts of money, but at some scale, that actually turns into something that can allow a content creator to continue to have an incentive to create content over time.
If we don't do that, if we don't give content creators the incentives to create content, they'll stop
creating content.
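The RAG-style access Prince mentions, where a crawler fetches live pages to ground a model's answer, can be sketched in miniature. Everything here is illustrative: the page store stands in for a live crawl, and the prompt assembly stands in for a real LLM call.

```python
# Toy retrieval-augmented generation (RAG) flow: fetch source content,
# then stuff it into the prompt so the model answers from it.
# The PAGES dict is a hypothetical stand-in for a live crawl of publisher sites.
PAGES = {
    "example.com/about": "Exampleco was founded in 2010 and makes widgets.",
}

def retrieve(url: str) -> str:
    """Stand-in for an HTTP fetch by an AI crawler."""
    return PAGES.get(url, "")

def build_prompt(question: str, source_url: str) -> str:
    """Augment the question with freshly retrieved source text."""
    context = retrieve(source_url)
    return (
        "Answer using only the context below.\n"
        f"Context: {context}\n"
        f"Question: {question}"
    )

prompt = build_prompt("When was Exampleco founded?", "example.com/about")
```

In a real pipeline the assembled prompt would be sent to a model; the point is that every such answer is grounded in content someone else paid to create and serve.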
So I think you're bringing up a key point here, which is, if people are like, well, you know, I'm not necessarily seeing publisher content show up every time I'm on an LLM, what you are seeing sometimes is the product of publisher content that's been used for training. And even if it's, like, under fair use, totally fine because it's being transformed, something crawled from Big Technology or the New York Times is now being used to, you know, give you an answer about summer camp, basically because these models are just trying to figure out what word comes next in the English language. The publishers are actually enabling that.
And every time an AI crawler hits a publisher website, they have to pay. And do you work with
Wikipedia? Because they've been loud about this, that like the server costs that they have to pay
have increased exponentially. But those aren't human visitors. They're AI bots crawling Wikipedia.
So there's a real cost to just supporting this crawl. And before we even talk about intellectual property, before we talk about anything else, the content creators, the publishers, are having to bear that cost. And so, at just a simple fairness level, why should they be bearing the cost in order to train, you know, these multi-billion-dollar AI companies that are out there? There should be some value which is given back. But I think it's even beyond that. I don't even think that we have to get to... I mean, you use legal terms like fair use, and I think that's very much
up in the air right now. We literally had two different California cases that came out on both
sides of that issue. Is training on content fair use or not? And I think it's going to be a
coin flip where different courts are going to say different things. And I don't think it's a clear
answer there. But I think it's a more fundamental thing, which is if you're doing something to
create value, you should be getting some sort of compensation for that. If somebody else is, is
posing a cost on you, you should be able to charge them to offset some of that cost.
And if something's not, if someone's not willing to pay that, then they shouldn't be taking
your content in the first place.
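Mechanically, the pay-or-don't-take-it stance Prince describes amounts to gating crawler requests at the edge. A minimal sketch, assuming user-agent identification and a hypothetical allowlist of paying crawlers; Cloudflare's real enforcement is network-level and far more robust than string matching, and the bot names here are illustrative:

```python
# Hypothetical edge gate: known AI crawlers are refused unless they have
# an active payment arrangement; ordinary visitors pass through.
AI_CRAWLERS = {"GPTBot", "ClaudeBot", "CCBot"}   # illustrative user-agent tokens
PAYING_CRAWLERS = {"GPTBot"}                     # hypothetical: has a license deal

def gate(user_agent: str) -> int:
    """Return an HTTP status for the request: 200 allow, 402 payment required."""
    if user_agent in AI_CRAWLERS and user_agent not in PAYING_CRAWLERS:
        return 402  # Payment Required: crawl refused without compensation
    return 200
```

HTTP 402 Payment Required is a real, long-reserved status code, which makes it a natural fit for this kind of compensation gate.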
Up until now, everyone's focused on the legal issue. You know, the New York Times is suing, and, I mean, a bunch of people are doing it. I actually think that before we even get to the legal issue, the first step is actually to take the technical steps to give content creators back control over the content that they're creating and let them have the choice: do I want to give access or not? Do I want to charge for this or not?
And then done correctly, there should be a marketplace
where content creators and AI companies come together
and say, hey, I created this piece of content.
I think it's super valuable.
And the AI company says, yeah, maybe it is
or maybe it's not, but here's what we're willing to pay.
And maybe they meet the clearing price.
Maybe they don't meet the clearing price.
But that marketplace needs to exist
because otherwise, there's no way to convey value,
there's no way to derive value from content creation.
And again, I just need to hammer this point home. If we don't give content creators an incentive to create content,
they'll stop creating content. And it sounds like, by the way, you're not a skeptic of the AI technology. You believe that this generative AI thing is going to work. Not only that, I mean, it is already clear that it's going to be the interface of the future of the web. So we're going to move from what has been the dominant interface of the past of the web, which was search, to what the interface of the future of the web is going to be, which is very much going to be AI.
So I believe AI is going to get better and better and better.
I actually think that done correctly, content can be created in such a way that will make AI better
and that you can create incentives for doing that.
What I worry about is in order for AI to get better, you have to have original content.
People have to be going out and creating that.
And right now, we're strangling off all of the incentives for that content creation,
which not only hurts content creators, it will ultimately hurt the AI companies as well. So I was speaking with someone who works in data labeling, or data creation, for large language models last night. In anticipation of this conversation, I was like, you know, one day, what you're doing might look almost exactly like what web publishers are doing, where, like, you might be hiring PhDs and having them write their information, and you feed that right into an LLM's training set. And there might be, let's say, historians.
So if you take like a world history website,
the historians that are writing the web pages
for that world history website,
they must be just like maybe one day
they're going to be writing those world history articles
instead of publishing them to the web,
selling them or feeding them right into ChatGPT.
Do we lose anything if the web goes away
and it's just content creators selling stuff
to large language models?
Yeah, you know, I think the Black Mirror kind of dystopian future is not that, you know, content will stop being created, journalists will stop existing, and researchers will stop existing. I think the Black Mirror future is that we actually go back to something like the time of the Medicis, where we have maybe five big AI companies, and they each employ a set of journalists and a set of researchers and a set of folks, such that they become effectively the institutions of knowledge, and they have salaries for all their academics that are on staff. They probably each have different, you know, maybe one of them is the conservative AI company and one of them's the liberal AI company. You can, again, very much see that has actually been the natural state of media and the natural state of controlling information for quite some time. And you could imagine that all of that research actually consolidates behind each individual AI company, and every different academic out there is basically an employee of OpenAI or Anthropic or Google or Microsoft.
I think that's a pretty bad outcome because, again, I think that we, the web has been so
amazing at distributing and democratizing access to information that I think we want to create
that incentive.
And so I think what we're trying to do is say, what's the step, you know, a few steps before that sort of all-the-academics-are-employed-by-one-of-the-AI-companies outcome? And I think the answer is you allow the AI companies to pay for the content that is actually valuable to them, that fills in their
models and makes their models better. And then you create incentives for independent journalists,
independent researchers to actually be able to create that content, to augment those AIs,
while still, you know, being valuable. This won't happen, but my sort of, you know, optimistic
version of the future is humans should get content for free again, because we kind of paywalled way
too much, frankly. And robots should pay a ton for it. Because again, every time a robot ingests
something, it's in service of hundreds of thousands, if not millions of different humans. So robots
should pay for that content. We should get back to a place where then humans get that for free.
Again, that's, I think it's going to be hard for us to get there. But that's, again, the future that
I think is actually the kind of optimal future. So someone hearing you and looking at this through
critical lens might say, look, Matthew, publishers depending on web traffic are barking up
the wrong tree. That selling eyeballs for CPM fractions has not been a good business for a long time.
In fact, we had a guest on the show recently who said, listen, he's a journalist, but he's like, if I thought that traffic was the way to go, I'd have been out of business a long time ago. And what you
really need is an audience that will, let's say, subscribe to your newsletter or listen to your
podcast, maybe come to your events, and we've already moved past this business model of trading
traffic for dollars, in which case this isn't an existential threat. What would you say to that?
I think, I mean, even then you're still trading traffic for dollars. You're just trading it
for subscription dollars, not ad dollars. That will go away as well, because what will happen
is the AI company will ingest the podcast and then summarize it on their page. And why would
they ever buy a subscription to your podcast? Why would they ever sign up for your newsletter?
if their AI agent can just simply say, tell me everything that was relevant in this particular
podcast or newsletter. Because there's an experience of listening that's enjoyable. People do that in part for entertainment and the leisure value. And that's how they learn. I think the AI companies
will do a very good job at creating that experience as well. So you think they'll just create, like, competing... Oh, absolutely. You know, I used to think that this was, like, such a pie-in-the-sky, lunatic idea until I listened to NotebookLM. Yeah. And, like, we've had multiple people on my YouTube page be like, did you license your voice to NotebookLM? And I'm like, no, but the fact that you're saying that
is pretty concerning. Totally. And again, I think that's the inevitable future. We're going to want to have
hyper-customized podcasts that are in exactly the voice that we find the most reassuring. And AIs
are going to create that for us. And again, they're going to be fed by original content
creators that are out there that give them the ideas, give them what to talk about, give them the news
of the day. What I think is we have to move even past the business model of subscriptions. We've got to
get to something else where you as a content creator are being compensated for the content.
The way I think about it is every one of these LLMs is a little bit like a block of Swiss
cheese.
They've got, you know, a lot of stuff there, but there are big holes that are in it.
And content that is valuable for them are the ones where they actually fill in those holes
in the Swiss cheese.
And so what I would imagine in the future is that you're able to actually surface what are the places
where there are holes in the Swiss cheese as an AI and then allow content creators to create
content that fills that in.
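The holes-in-the-Swiss-cheese idea can be read as a coverage-scoring problem: score how well a model covers each topic, then surface the weakest topics as content opportunities. A toy sketch with made-up topics and hypothetical coverage scores:

```python
# Toy gap surfacing: topics with model coverage below a threshold are the
# "holes in the Swiss cheese" worth commissioning new content for.
COVERAGE = {                     # hypothetical coverage scores, 0.0 to 1.0
    "world history": 0.9,
    "local snow forecasts": 0.2,
    "niche legal rulings": 0.4,
}

def content_gaps(coverage: dict[str, float], threshold: float = 0.5) -> list[str]:
    """Return topics below the threshold, weakest first."""
    gaps = [t for t, score in coverage.items() if score < threshold]
    return sorted(gaps, key=lambda t: coverage[t])

print(content_gaps(COVERAGE))  # weakest-covered topics surface first
```

How a real system would estimate coverage is an open question; the sketch only illustrates the surfacing step that Spotify- and YouTube-style "people are searching for this, go make it" features already perform.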
My favorite example of this is I was in Stockholm a couple of weeks ago, meeting with Daniel Ek, because there really is nobody who has done more to compensate creators at scale than Daniel. Daniel is the founder of Spotify, and they've done just an amazing job at doing this.
And he told me a story in our long conversation.
He said, you know, one of the things that we do at Spotify is we actually take the searches that people run at Spotify, you know, that are things like,
I want a song with a reggae beat about how much it, you know, sucks when your, you know,
your sister runs away with your car.
And that has happened.
Yeah, whatever.
And it turns out that they don't have good things to fill that in.
And there are content creators out there that are making tens of millions of dollars a year just creating content for those searches that don't have good results right now, because Spotify surfaces that list of things where they don't have those results.
I think that that's actually beautiful.
I think that's actually really amazing where they are showing where is there something
that there is human need for and then how can we actually then create content to fill that
human need and then monetize it, you know, through what they're doing.
I think the same opportunity exists in the AI space, where the AIs are actually able to say, I can tell you how valuable this new piece of content is for me, and you can rank it. And then that allows you to create a marketplace where they can say, listen, that new piece of information is so valuable that I'm willing to pay you for that. And I think that, done correctly,
that then gets us to more original content creation. It gets us to less sort of Me Too copycat style
journalism, same thing in research. It gets us to maybe a place where we're, we're doing a
original research and getting rewarded for being more original as opposed to being more salacious.
Yeah, it's interesting. YouTube has a similar thing where there's an insights or an
inspiration tab. And they give you like the title, a description, and the thumbnail.
And they're like, people are searching for this. You go out and make it. Yeah, that's exactly
right. And I think that that's actually an incredibly valuable thing. That's making humanity better, as opposed to, you know, yet another story that's just chasing the most salacious headline that you can get. So,
you're talking about this idea where publishers might sell the ability to crawl to AIs.
That is also assuming that content is scarce.
And so I want to run this other idea by you, which is that if we had the same amount
of content that we have today, that's a great idea.
But what we're seeing now is this explosion of content creation that's made through generative
AI.
Like, it's kind of funny. Every time you see these suggestions that we're talking about, YouTube's making these suggestions because clearly there's traffic to be had. I'm sure there are already YouTubers today that are feeding that into ChatGPT, spitting out a script, running that through Veo 3 at Google, and then posting the videos and cashing in on traffic. So there's just going to be,
and we're in the middle, I believe, of this explosion of content. Actually, you probably have
better data on that than my suppositions. It almost feels like a DDoS of the web where, like,
you know, if the ability to create content is constrained by a human's
ability to create content, then you have something to bring to these AI companies. But if
human plus bot content starts to become the norm, there's going to be so much, then even if
you're creating high quality stuff, it's not going to matter very much to these generative
AI companies. What do you think about that? I think it's still, so first of all, I think that
there's the pure AI-generated content. There's lots of research that shows that training AI on AI-generated data is sort of like that old Michael Keaton film Multiplicity, where basically every copy of something gets worse and worse and worse. And again, that feels like that's going to still be the case for quite some time. Might robots, in the future, be able to go out and do, you know, interesting reporting from the field? Might they be able to do, you know, interesting research? For sure. But today, I think that that interesting research, that interesting original content,
that interesting insight that comes from the work that right now only journalists and researchers
and others can do is still the most important thing for filling in those gaps in the Swiss cheese of AIs. What is just, again, high-volume, low-value content? My hunch is that, if we score it correctly, it's going to be exactly what it is, which is low-value content, and so it should be rewarded very minimally. I like to ski, so I live part of the year in Park City, Utah. You're in the right place. I care enormously about the snow forecast. There is a
forecaster in Utah named Evan Thayer. He writes these incredibly precise weather forecasts where he will literally tell you it's going to snow this much on this run and this much on that run. And again, I actually pay for his content because that's super valuable for me. I am going to be more
willing in the future to pay for an AI that has actually licensed Evan's content back from him
than I would to pay for an AI that doesn't have that content because, again, that content is going to be, you know, super useful and unique and valuable to me.
And so I think actually what it will do, as we have more AI systems that are out there, is it will cause you to look for more original, creative content.
And that's going to be the thing that the AI is going to be the most willing to pay for.
And that, again, I think is actually a beautiful thing, where instead of creating incentives to create more and more salacious headlines and chase traffic, we're creating
incentives to create knowledge that fills in those places in sort of the Swiss cheese where
there might be holes. Taken in aggregate, all of the AIs are probably a pretty good representation
of what human knowledge looks like. And so if we can score them and say, okay, here are the gaps
in human knowledge and here are the places we need to fill in, that actually gives a really
rich place for creators to look to create content which advances human knowledge. So, you know, DeepMind is working on weather forecasting right now. This example that you gave of Evan Thayer, the forecaster in Utah: are we that far away from just telling an AI, hey, you're tapping into the DeepMind model on weather forecasts,
I want to ski this route today.
What's happening?
I think we're probably pretty far away from that. But again, I think Evan is always going to be better using the tools of AI plus his local knowledge to make this better. AI just becomes a tool that creative people use in order to tell stories better, get better information, do more research.
And again, I am skeptical that, in the short term at least, we're going to have real value that is created by training on purely generated content.
Okay.
So we've talked about your solution.
Let's dive into the technological side of it a little bit.
We are a tech podcast, so we should do that.
So Cloudflare, security company, helps websites stay up on the web, despite all the threats.
Yep.
And let's just at the very beginning kind of talk about like the threats that you see to websites,
who's trying to take them down.
Yeah.
What's happening on that?
Yeah.
So protecting websites is part of our business.
So is protecting employees as they go out across the internet.
So Cloudflare is fundamentally kind of a network that is built with all the performance,
reliability, security, availability, and privacy guarantees that frankly the internet should have been built with,
had we all known what it was going to become.
But obviously back in the 60s, 70s, and 80s when we were laying down all these protocols,
we didn't think about those things.
And so Cloudflare is basically reverse engineering the Internet in order to give it
those performance, availability, security, reliability, and privacy guarantees on top of what is there.
And so today, one of the main uses for Cloudflare would be: you're putting a website or a web application or anything online, and you want to make sure that it's safe from
different sorts of threats. And so what are the threats that we see? I mean, every day we go to war
with the Chinese government, the Russian government, the North Koreans. I mean, everyone is trying
to hack into our customers because who are our customers? Some of the largest banks in the
world, some of the largest governments in the world. And they are all constantly under threat
and constantly under attack from these organizations. The media companies actually were a pretty
small part of our business. We had some media companies that used us, but it wasn't a big
piece of it. What happened starting really 18 months ago is that those companies said,
hey, I know we hired you in order to stop the Chinese hackers, but we have this new threat
that's there. And frankly, my initial reaction was, publishers, they're always whining about the next new technology, like, what's going on. And over and over, they said, just pull the data, pull the data, pull the data. And it was only when we actually saw the data and
saw how AI companies were taking content without giving anything of value in return, that they
were actually adding enormous amounts of load. And in some cases, taking whole websites down
because of the amount of traffic that they were sending to it. Right. They basically DDoSed the websites. DDoSed the websites. You know, again, not intentionally. But that was the point at which we said,
listen, maybe there is something that we can do here. And, you know, at first, I think a lot of the publishers were saying, oh, this is so hard. There's no way we can stop it. You
know, there are these nerds and they live in Palo Alto and they're so smart, what are we ever
going to possibly do about it? And I just kept saying, guys, we go to war with the Chinese
hackers. Like, we can stop some nerds with a C corporation. And I think it took a while for that
message to really get through. But now that it has, you know, it's been really rewarding to see that
the vast majority of the world's publishers, major publishers, have said, this is, we need to change
the model. We need to be compensated for our content. And Cloudflare has the right idea in terms of
the technical solution to do that. By the way, folks, $60 billion company listed publicly.
So it's one of the bigger cybersecurity companies on the New York Stock Exchange. But I want to
ask you, okay, so we're going to get into this technological solution. But what you said is interesting, because do you ever think there's a world with these AI bots, and not just the publishers but the banking websites as well, where you're like a natural enemy to having everything go through them? Because if everything goes through ChatGPT, then these other sites that you secure might not need your services.
I think, I mean, there's going to be some gatekeeper for how agents and other things
access various services online.
And I think that the challenges in each of those cases are different.
In the case of a bank, you might want to say, I want to have guardrails that are in place.
I want to make sure that this is actually a customer that's accessing an account.
I want to make sure that they can only conduct transactions that have been authorized by an actual human being or something like that.
Cloudflare actually provides those guardrails and makes it so that a bank can say,
I want to expose my infrastructure to AI, but do it in a way which is safe and secure.
I think publishers have a different challenge.
And so, you know, in our case, a way of thinking about it is like, we have a whole bunch of developer documents, which are on our website. We want those to be in AI.
We want coding platforms when someone says, oh, I want to use Cloudflare to build X, Y, or Z for it to be able to spit that out.
What we've done is we've actually tried to identify with real narrow precision, what are those pages that are on the web that have some indication that they are going to be monetized?
And generally, that is: look at, is it behind a paywall, or does it have some sort of an ad unit on it, like a banner ad or some other sort of ad that's there?
If we detect that, then we're blocking it by default.
But we're not doing this for everything.
Again, there's value for AI, and we want to make sure that AI is actually getting the data
that people want to have in it.
So the About Us page on the New York Times probably should go into the AI system. But a brand-new article, you know, with breaking news, probably should be restricted, unless the AI company is actually paying for that content.
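Prince names two signals of an intent to monetize: a paywall or an ad unit. A minimal sketch of that kind of heuristic might look like the following. This is illustrative only, not Cloudflare's actual detection logic, and the marker strings are assumptions:

```python
# Sketch: heuristic check for "intent to monetize" on a page, based on the
# two signals described above: a paywall or an ad unit. The marker strings
# are hypothetical examples, not Cloudflare's real signal set.

PAYWALL_MARKERS = ("subscribe to continue", "metered-paywall", "piano.io")
AD_MARKERS = ("doubleclick.net", "adsbygoogle", "ad-slot")

def intends_to_monetize(html: str) -> bool:
    """Return True if the page shows signs of a paywall or an ad unit."""
    lowered = html.lower()
    behind_paywall = any(m in lowered for m in PAYWALL_MARKERS)
    carries_ads = any(m in lowered for m in AD_MARKERS)
    return behind_paywall or carries_ads

def default_crawl_policy(html: str) -> str:
    """Block AI crawlers by default on monetized pages, allow the rest."""
    return "block" if intends_to_monetize(html) else "allow"
```

So a page carrying an ad slot would be blocked by default, while a plain About Us page would stay open to crawlers.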
I guess the way I want to ask it is, if everything goes into ChatGPT, what's left for you to protect, thinking outside of the media world?
Well, again, I think that 80% of the AI companies are customers of ours.
And so we protect them as well.
Okay, sounds good.
I just wanted to ask that.
I was curious about it.
But let's talk.
Okay, now, so you're going to build a technological solution that will block crawling.
Yes.
And so robots.txt, which is this file that you put at the root of your site if you don't want to be crawled, that wasn't working.
Yeah, I mean, I think robots.txt has two problems. The first is some people just ignore it. And so if you ignore it, then you can still crawl all you want. And there are even some big legitimate companies that completely ignore robots.txt. And we're really good at basically being able to say, okay, here's what robots.txt says. Are you actually following what those rules of the road are? And if the answer is yes, then robots.txt is a great solution.
But in the cases where somebody is ignoring it, then we need to actually put in place additional technical barriers to restrict their access.
And so that's exactly what we're doing.
The second problem with robots.txt is it's not granular enough.
So take the Google bot, for example.
Google's crawler does five different things, at least.
One is it checks if you have an ad on a page, makes sure that if you're putting an ad for a Procter & Gamble product up, it's not against a pornographic site or something like that.
So it does brand safety checks.
The second thing that it does is crawls to index for traditional search,
the 10 blue links that are out there.
The third is that it crawls to create answers that are in the answer box.
The fourth is that it crawls to create answers that are in the AI overview,
the newer thing that they've rolled out.
And the fifth is that it crawls in order to ingest content in order to put it into Gemini.
It's a lot of crawling.
A lot of crawling, all through one crawler.
And for lots of different reasons, they don't want to split that out into various
crawlers. But right now, they basically make you have a choice. They say you can either block
Google entirely, in which case you can't run ads, you don't appear in search, and you don't appear
in the AI overviews or Gemini or other things. Or they've recently added a tiny flag, which basically
just says, don't use this data for the Gemini piece, but you still appear in AI
overviews, you still appear in answer box. We think there needs to be more granularity where there is
a difference between taking content and transforming it. And a license should say you can't do that
without my permission versus just taking that content in order to do brand safety checks,
taking that content in order to do traditional search. And so what we've proposed, and we're working with the IETF as well as regulators, is extensions to robots.txt to give it that granularity.
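For a sense of what that granularity could look like, here is a hypothetical robots.txt fragment with purpose-level rules. The directive names (`Allow-Purpose`, `Disallow-Purpose`) and purpose labels are illustrative assumptions, not the actual IETF proposal or any published standard:

```
# Hypothetical per-purpose rules for a multi-purpose crawler
User-Agent: Googlebot
Allow-Purpose: brand-safety, search-indexing
Disallow-Purpose: answer-box, ai-overviews, ai-training

# Default for everyone else: block training, allow the rest
User-Agent: *
Disallow-Purpose: ai-training
```

The point is that one crawler, like the Googlebot described above, could be granted some of its five uses and denied others, instead of the current all-or-nothing choice.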
And that actually then allows us to further test, to watch, you know, does this robot behave in an appropriate way. And if the answer is yes, then maybe it gets more permissions to do things online. If the answer is no, then we will put more restrictions and blockades in place to stop
what are, again, badly behaving robots. So what you're going to do now is, in addition to that, put a wall up, a technological wall. That's right. No crawling. Sorry, enough of you haven't respected robots.txt, no entrance. That's right. And so, we're all familiar with 404 errors when something is not found; success on the internet is a 200 response that comes back to you. The original protocol actually set out a 402 response. And that response says payment required. And so we're actually tapping into that
exact original specification to say, when a robot tries to access a page where there's an intent to monetize it, so it's either behind a subscription or it's got ads on it, there is an ability for us to say 402 Payment Required. And then there's a negotiation. In some cases, and at first, that's going to be largely large publishers with large AI companies doing deals,
like what Reddit has done or what the New York Times has done or what others have done
where they have licensed the content and then certain robots get access to that.
But in other cases, and I think over time, that will be a dynamic process where maybe a smaller
AI company or a smaller publisher will say, hey, here's what I would charge for this content.
Cloudflare will surface like how valuable that content would be for that particular AI.
And then the AI companies can decide, is that worth it or not?
And it might be a very small transaction, maybe a fraction of a penny or maybe a few cents.
Or in some cases, content that is really valuable might be worth hundreds or thousands or millions of dollars.
You could imagine Taylor Swift, you know, is about to release a brand new song and the lyrics get published.
How valuable is that for an app for teen girls who are lonely and want to talk about things?
Probably pretty valuable and especially valuable if you could have exclusive access to it for some window of time.
And so that's the sort of thing where I think a marketplace over time can develop, where
original valuable content will get compensated and there will be a clearing price in the market
once we have that scarcity that's created by that wall.
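The 402 handshake described above might look roughly like this sketch. The `crawl-price` header name, the token scheme, and the price are assumptions for illustration; the actual negotiation protocol is still being worked out:

```python
# Sketch of an HTTP 402 "Payment Required" exchange between an edge server
# and an AI crawler. Header name, token scheme, and price are hypothetical.

PRICE_PER_CRAWL_USD = 0.002          # hypothetical clearing price per page
PAID_TOKENS = {"tok_paid"}           # tokens a crawler holds after paying

def handle_crawler_request(monetized, payment_token=None):
    """Return (status, headers) the way an edge server might answer a bot."""
    if not monetized:
        return 200, {}  # free content: serve as usual
    if payment_token in PAID_TOKENS:
        return 200, {}  # this crawler has paid; serve the page
    # Intent to monetize and no payment: answer 402 and quote a price,
    # opening the negotiation the marketplace is built around.
    return 402, {"crawl-price": f"{PRICE_PER_CRAWL_USD} USD"}
```

A bot hitting a paywalled article with no token would get a 402 and a quoted price; it could then pay, obtain a token, and retry the same request successfully.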
Okay, so it's not just a blocker.
It's also this marketplace where you're going to have publishers that will sell their content.
So that's a way where you could have useful, effective chatbots and potentially a flourishing web.
Exactly.
And that, I think, is what we're trying to play for.
Again, my utopian vision of the future is robots should pay a lot for content and humans should get it for free.
Right.
And so to kick this off, on June 30th, as the day turned to July 1st, you had a party at the top of One World Trade Center where a bunch of publishers pressed a red button to get this thing going.
And that includes some very big names: Condé Nast, Time, the Associated Press, The Atlantic, Adweek, and Fortune are all going to be part of this.
And a lot more.
Frankly, there hasn't been a publisher that we've talked to who hasn't said that this is a change that needs to happen.
You're on the right path for it.
And so across the board, not only the kind of 20% plus of the web that sits behind Cloudflare already,
but I think another 20 to 30% that are these major publishers that are out there are all on board in doing that.
And what I think has been encouraging is at the same time, we've been having conversations with the large AI companies.
And all of them agree that content creators need to be compensated for their content.
They all agree on that.
The devil's in the details and some of them are pushing back in various ways.
But I've been really encouraged that as we have talked to the largest leading AI companies,
the largest technology companies in the world, they're actually leaning into this.
They all recognize the content creators need to be compensated.
And I think over the months to come, that's when the hard work will go down around
how do we actually create this marketplace in a way which is fair for all of the different
providers in the ecosystem, treats everybody in a way that has a level playing field,
still allows new entrants, doesn't just reward the largest companies with the biggest budgets
that are out there, makes sure that, you know, legacy providers like Google are treated the same as, you know, newer providers that are there. That's all going to be really tough.
But I am incredibly encouraged by the conversations I'm having, not just with the publishers
who are all on board, but actually with the AI companies who recognize that something needs to
change. That's interesting that they're recognizing this, because the sense that you get is
you hear these announcements of deals like Open AI paying X million to the Wall Street Journal
to be able to include their articles or Dow Jones. And the sense you get is that they're just
kind of payoffs to not get sued. Like, Sam Altman very clearly is not happy with the New York Times pursuing OpenAI, and especially the actions that the Times is taking in its lawsuit, like forcing OpenAI to preserve their chat logs, which I think is wrong. But it is interesting. So what do you think, are we going to see an
evolution from these one-off deals to this marketplace-style world? Well, I think that, I mean, we've seen this story many times before. I mean, Napster came along. It was a Wild West. There was a bunch of lawsuits from, you know, the music industry targeting Napster and the like. And then along comes iTunes, which starts out at $0.99 a song, but eventually evolves into what is much closer to a Spotify model of a subscription and a pool of funds that then gets distributed out to all the creators. So I think we've seen this story before. And I think that one of the things
that's really important is that OpenAI and others are willing to pay for content. They do the deals that are there. And I don't think it's right to just say they'll do a deal to avoid lawsuits. Again, I think that when you talk to leading AI
compensated for that content. And if it's not going to be through subscriptions or ads or ego,
it's got to be through something else. And so exactly how that happens, we'll figure out. But
what I know won't work is if OpenAI is paying for your content, but you're giving it away for free to everyone else. It's not going to work. OpenAI eventually is like, listen, we want to support you. We want to help you out. But we can't be the suckers. We can't be the only ones paying while you're giving stuff away for free. And so scarcity is needed in order to actually
have value in any kind of market. And so I think that the people who have actually leaned into this
the most heavily are the ones that have the existing deals with some but not all of the AI
companies because they realize that for those deals to be valuable, for them to renew, for them to
renew for more, there has to actually be scarcity where they're getting something of value. You can't charge OpenAI but give it away for free to Anthropic.
Something needs to actually restrict it and say, everyone needs to pay, everyone needs to be on a level playing field and figure out what that looks like going forward.
Could there be some collateral damage with the solution like you're implementing?
For instance, I'm looking at the names of these publications: Condé Nast, Time, the AP, The Atlantic.
I imagine they get a lot of traffic from search as it is today.
So if you put this blocker up, does that impact their SEO, for instance?
Yeah, so we've been very, very careful to say that the traditional search today is not blocked.
And even AI-driven search today isn't blocked.
But you're going to see us give publishers the tools to differentiate between search indexing and derivative content.
So the way I would think about this is the Google experience today. It may be that a publisher says, I still want to appear in the 10 blue links, but I don't want to be in the AI Overviews or the answer box. And the granularity of being able to say, okay, Google, I understand you use one bot, but we need those uses to be treated separately. And again, I am hopeful, and in my conversations
with Google, I am increasingly hopeful that they understand the importance of this and giving that
granularity. But if for some reason they don't, I am also 100% certain that regulators are paying
a ton of attention to this, and that around the world you will see them force Google to split their crawler out and announce exactly what it is doing. Again, I think that that's kind of the,
hopefully we get to an agreement with Google way before that has to happen. But that's inevitably,
I think Google is going to have to say, you know, if you don't want us to use your content for
derivatives, you have a way of controlling that while still appearing in search. Okay, a couple
big picture questions before we leave. How much bigger is the web getting? And is the web sort of accelerating the size increases that we see?
Yeah, I mean, by all the measures that we can see, it's actually kind of plateaued and flattened out in terms of content. You see fewer domains getting registered, you see fewer new websites going online.
I think a lot of that has moved to individual platforms.
So more of that on a YouTube, more of that on a Facebook, more of that on a TikTok that is there.
And I think part of that is because those tools have provided content creators easy monetization tools to allow them to not have to think about some of those problems. I think that in an ideal future, you would want content creators to be able to be free from those platforms, to earn more themselves, but still have the ability to
monetize that content in interesting ways. And so again, I think there are lots of people who
are working on that problem. I actually think Google has been one of the organizations that has, again, created what was the business model of the last 30 years of the web.
But the business model of the next 30 years of the web is going to be different.
And we've got to think about it in a different way.
It's not going to be banner ads.
It's not really probably going to be subscriptions.
It's going to be something different.
And so this is our attempt at one solution, but I doubt it will be the only one that emerges.
Now I'm curious what I'm going to do, because I'm just a one-person content operation.
Well, you should certainly be charging AI to license your voice.
Can I sign up to your product?
For sure, absolutely.
Okay, I'm going to email you after this.
And then when it comes to cybersecurity,
obviously you talked about how you're dealing with all these governments
that would like to hack into sites across the web.
Have they been able to use generative AI tools or automated coding
to become more effective at what they do?
Yeah, I mean, I think that anytime a new technology comes out,
bad guys are going to use it as well as good guys.
And so we have seen, and we will continue to see some horror stories around, you know,
the family that was tricked by some gang into wiring their life savings because someone
that sounded like their daughter called and said, I've been arrested in Mexico, you know,
I need to pay to get out or other things.
I think we were seeing a real rise in, especially out of North Korea, North Koreans posing
as if they were applicants to various jobs.
And then that is, you know, allowing them access, which they can then, you know, use to do any number of nefarious things. All of that, again, assisted by AI. So I think that's been
sort of on the bad guy's side. The good news, though, is that the good guys, you know, folks like
Cloudflare, we have been using AI as well in order to not only detect these things, but get
smarter at detecting attacks earlier in the process. That's working for you. At the end of the day, who wins in the AI race? Whoever has access to the most data. And I just think that the good guys are always going to have access to a lot more data than the bad guys. And so far, I feel like
we have made the web more secure with AI over the course of the last two and a half years and stayed
way ahead of the attackers. Although, again, there are going to be horrible stories. There are
going to be problems that are there. I think that it is going to be harder and harder to trust
that something that you're seeing online is actually, you know, real. And we'll have to turn to
other ways that are more secure about verifying things like identity.
and authentication.
Okay, last question for you.
We have 60 seconds.
You mentioned you're a believer in this technology.
What does the next couple of years in AI look like to you?
Are we going to hit AGI anytime soon?
Like, what's the timeline you're thinking about?
I mean, I believe today that 99 cents out of every dollar spent on AI is just being lit on fire.
But that one cent that's out there is going to generate real return.
it's very hard to figure out what's kind of just a total waste of time versus what's not.
You know, we see a lot of data about how, you know, AI systems are really being used, not so much for businesses today.
A lot of the business applications have been very tough to take on, but a lot of times just for like loneliness and social interactions and things like that.
So I would imagine that a lot more of those things are going to develop and those will be sort of the first uses.
I think the business application is actually going to take longer.
And in places where it's easier to verify the output as being legitimate, it's going to be easier.
So coding, like, we see that our engineers are significantly more productive using AI tools than they were before.
That's not causing us to hire any less engineers.
It's just meaning that every engineer we hire is that much more productive.
We have a huge backlog of things to do.
And AI is helping us do that.
On the other hand, you know, I am still quite skeptical about the AI customer support agent. That is a much harder problem. Or the AI lawyer. That is a much harder problem, because it's just harder to tell whether something actually worked or didn't. There's no debugger in those spaces in order to figure out if what the AI is creating was actually true.
And so I think you're going to see just huge leapfrogs in things like coding. But I think
it's going to take longer for us to do things that are a little bit more difficult to verify.
Very interesting. You're deeply optimistic about the technology, but still think 99% is lit on fire, wasted. Yeah. It's going to be very interesting to check out.
Matthew Prince, great to see you. Thank you for coming on the show. Thanks for having me on.
All right, everybody, thank you for watching and listening. We'll see you next time on Big Technology Podcast.