Big Technology Podcast - Can The Web Survive Generative AI? — With Matthew Prince
Episode Date: August 15, 2025
Matthew Prince is the CEO of Cloudflare. He joins Big Technology Podcast to discuss whether the web can withstand a wave of generative AI companies scraping content, summarizing it, and not cutting the original creators into the revenue. Tune in to hear Prince masterfully dissect the problem and put forth a solution.
Transcript
How will the web fight back against the wave of generative AI that is ingesting all the
content on the internet, but not paying for it? We're joined today by Matthew Prince. He's the CEO
and co-founder of Cloudflare and has been on the warpath attempting to right the ship.
Matthew, great to see you. Welcome to the show. Thanks for having me. Can you start by giving us a sense
as to what is happening to the web with the rise of generative AI? We've talked about it a bit
on the show already, but I want to hear it from you.
Yeah, absolutely. So the business model fundamentally of the Internet over the last 30 years has really been driven by search. You search for something, that generates traffic, and it takes you to content that someone has created. And then that content owner, that content creator, can drive value really in one of three ways. They can sell the content itself, sell a subscription to it. We see plenty of that these days. They can put ads up against it. Or they can just get the ego hit of knowing that somebody cares about and is reading their stuff. And that's really how the web has been built today.
It's been built chasing that traffic. What we're seeing, though, is that for the first time in history,
searches across the major search engines, Google in particular, are actually on the decline.
And what's replacing it is more and more people turning to AI. And the difference with AI is, rather than giving you 10 blue links that you click through to find the answer, now AI tries to give you the answer itself.
And that means that people aren't going to those original sources.
And if they don't go to the original sources, then that means that you can't sell a subscription
anymore.
You can't put ads up against it.
You don't even know that people are actually getting value from your stuff.
And so what we're really worried about at Cloudflare is if the incentives for creating
content go away, why is anyone going to create content in a new AI-driven future?
So talk a little bit about how many pages these AI bots or search engines have crawled in the past, how much traffic they've delivered for each crawl, and where it's gone today.
Yeah, you know, I think that the deal that Google made with the web starting 30 years ago, when Larry and Sergey started working on the project, was basically: let us copy your content, and in exchange we'll send you traffic that, again, you can drive value from in one of those three ways.
And we have very reliable data at Cloudflare going back 10 years, looking just at Google. And the metric that has stayed very consistent
over time is how much Google crawls the web. They've actually crawled at a very consistent
rate over the last 10 years. Over that same 10 years, we've actually added 2 billion internet
users. So we were at 4 billion internet users about 10 years ago. Today we're about 6 billion
internet users. So you'd imagine it's actually gotten easier to get traffic over that period of
time. But that's not what's happened. What instead has happened is that, if you sort of take 10 years ago as the litmus test, today it's almost 10 times as hard to get a click, to get a visitor from Google to your site. What's changed? The answer
is that Google has started providing more answers directly on the page. So if you search for
something like, when was Cloudflare founded? There will be an answer box at the top that will say
September 27, 2010 is the day that we launched. And you don't have to click to any link. In fact,
about 75 percent of queries to Google now get answered on Google itself. And what's changed
in even just the last six months that's accelerated this is they've rolled out AI overviews. And we've
tracked this from region to region. What we see is, as AI gives you the answer without you having to read the original content, the amount of traffic that Google is sending to these sites has gone down and down and down. And that's the good news for publishers: Google has gotten 10 times harder to get traffic from over the last 10 years. OpenAI is a whole different beast. In OpenAI's case, it's 750 times harder to get traffic than it was from Google just 10 years ago. In the case of something like Anthropic, it's 30,000 times more difficult to get that
traffic. So why is that? The answer is I think people are trusting the AI. They're reading this derivative
content and they're not going back to the original source. But the problem is if you're not reading
that original source, then the original sources have no way of generating value. They can't sell
subscriptions. They can't sell ads. They can't get the ego hit. And that over time is strangling
the very incentives on why content is being created. And that's the problem that we started to
really focus on about 18 months ago. And then just today on July 1st, we announced that we are
hard blocking the AI crawlers unless they will actually compensate content creators for the content
that they're creating. Okay, and we're definitely going to get into your technological solution,
so that's coming. But let's talk a little bit more about this problem. So I think the number that
you shared recently was Anthropic will crawl something like 60,000 pages for one click. That's right. For one click that's sent. And OpenAI was somewhere in, like, the 10,000 range? Do you remember? Yeah, it's 1,500 pages now for every one click that they send you.
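The multipliers quoted here follow from simple crawl-to-click ratios. A quick sketch, assuming the roughly two-crawls-per-click historical Google baseline Prince cites elsewhere in the conversation (all figures are the approximate numbers from the interview, not audited data):

```python
# Crawls-per-click ratios discussed in the conversation (approximate figures).
HISTORIC_GOOGLE = 2        # ~10 years ago: 2 Google crawls per click sent
RATIOS = {
    "OpenAI": 1_500,       # ~1,500 pages crawled per click sent
    "Anthropic": 60_000,   # ~60,000 pages crawled per click sent
}

def times_harder(crawls_per_click: int, baseline: int = HISTORIC_GOOGLE) -> float:
    """How many times harder it is to earn a click than the historic Google baseline."""
    return crawls_per_click / baseline

for name, ratio in RATIOS.items():
    print(f"{name}: {times_harder(ratio):,.0f}x harder than Google a decade ago")
```

Dividing by the two-crawls-per-click baseline reproduces the 750x and 30,000x figures quoted earlier.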
And, you know, I have to say, I'm surprised that publishers are seeing a problem now, only because
these AI products are really in their infancy.
I mean, Anthropic's Claude isn't used by very many people at all.
When you think about the scope of the web,
OpenAI has 500 million weekly active users.
It's pretty good, but really nothing compared to the amount of traffic
that you see on the web every day.
And I guess Google must be the problem.
So just explain why this is already showing up for publishers
because this is the infancy of generative AI.
Yeah, I think it is.
But it's one of these sea changes
that we can just see happening.
So again, for the first time in history, searches on Google actually dropped, period over period.
This is your data?
Google has actually reported this,
and it actually came out in the Apple trial as well,
where they're seeing more of this traffic actually going to other sources.
And so I agree that it is a drop,
but what we're seeing is that is the trend,
that is the direction that things are heading.
And even Google itself is looking more like an AI chatbot
and less like a traditional search engine.
And so if that's the case, I think that the time for publishers to panic is now.
If we wait while more and more traffic gets strangled and less and less is going to publishers, again, I think that's just going to mean that over time we'll have more consolidation in the media industry.
We'll have less and less content.
We'll have actually more salacious headlines as people are chasing the content that is left
that's out there.
And we need to actually make a change to make sure that we can continue to support publishers.
Because I do believe the future of the web is going to be an AI-driven future, not a search-driven future. And that AI-driven future just doesn't have the same incentives and doesn't support the same business model that the old search-driven web did. Okay, I'm going to poke at this a little more.
Yeah. You mentioned that you can now search Google, when was Cloudflare founded, and you'll get the answer. That's something that Google's been doing for a long time. You could ask, like, when was Martin Luther King Jr.'s birthday? Even before generative AI, they were giving you these answers. So is it that the magnitude has changed? And if so, from the standpoint of a consumer, could this be good? I mean, it's pretty annoying to type this question into Google, when was Cloudflare founded, and then have to click to Cloudflare's website to get the answer that Google could just surface for you. And so much of the web has sort of become effectively in service of Google queries, where websites don't really need to exist.
Well, so absolutely, this has been happening for a while. And if you look at the data: 10 years ago, the ratio of crawls from Google to clicks was two crawls to one click. Six months ago, it was up to six crawls to one click. And that's all because of the answer box. What the AI overviews, which they've rolled out over that time, have done is they've taken it now to 18 crawls to one click. So yes, it is a situation of, you know, the frog boiling in water.
But it has gotten progressively worse. And I think across the media industry, it's gotten harder and harder to actually survive as a publisher. And so what I worry about is, yeah, you know, they were struggling at six to one. They're struggling at 18 to
one. I think they're dead at the 250-to-1 or 1,500-to-1 that we're seeing with OpenAI and completely dead at the 60,000-to-1 we're seeing with something like Anthropic. And so that is the direction
that things are going. And that's a challenge. I think you're exactly right on the other point as
well, that the challenge here is that this is actually a better user experience. That's why more
of the web is going to turn to AI. It is great that you can type something in and you can get back
an actual response as opposed to having to hunt for it yourself. That's a better user interface.
And so that absolutely is going that direction. I'm not arguing, and I don't think anyone is arguing,
that we should just get rid of AI or that we should go back to sort of 10 blue links on Google.
What I am saying, though, is that the fuel that runs all of these AI systems, the reason that Google can tell you when Cloudflare was started or what Martin Luther King's birthday was or something like that is because somebody is doing the work of that original content creation.
That original content is the fuel that fuels Google.
It fuels all of the AI companies.
And if we strangle off the business model of those places, if we strangle off the incentives for content creators to create content, then
we're actually going to end up strangling the AI systems as well because if there's no content
to train on, then the AI systems are going to be pretty stupid for that. And so I think
everybody agrees that there has to be incentives that allow content creators to continue to be
compensated. The question is, what does that incentive structure look like? And that's, again,
what we've been really spending a lot of time trying to figure out. Can I... I just want to ask you one more question about crawls. I think that sometimes, you know, in search, you would crawl to put a website, or a page from a website, into your search engine. Are these generative AI bots crawling to do something similar, just to surface the information from these pages, or is some of the crawling being done in service of training their models? Because if that's the case, it's actually not as big of a deal, because it's just being fed into training. I think the problem is taking that, like, direct
query-to-answer behavior and sort of bringing it into the search engine. So do you know, is it training or is it just surfacing answers?
So I think there are two different parts of this. There's definitely training, and then there's what is closer to a search-like experience. If you're familiar with this, it would be something like RAG, retrieval-augmented generation, where you're actually getting that real-time data in order to augment the foundational model. I think in both cases, though,
you're actually costing the content
creator something. They are literally paying for that traffic, paying for the load that the crawlers are pulling off of their sites. It's also the intellectual property, the data, the content of these providers that's being used to train the models. And so there's value that the AI companies are getting. If there weren't, they wouldn't be crawling. Right. But there's no return of any compensation or any reward. Again, in the old days of Google, the tradeoff was, let us copy your content and in exchange we'll give you traffic.
What has happened is the frog is boiled in water.
And now everyone is saying, let us copy your content and we will give you nothing in return.
And so what we're saying is simply we need a better deal, a deal for a new AI driven future.
And that should say if you are getting value from the thing that I created, then you should compensate me in some way for it.
And it may be tiny amounts of money, but at some scale, that actually turns into something that can allow a content creator to continue to have an incentive to create content over time.
If we don't do that, if we don't give content creators the incentives to create content, they'll stop
creating content.
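The RAG-style access Prince mentions, where a crawler fetches live pages to ground a model's answer, can be sketched in miniature. Everything here is illustrative: the page store stands in for a live crawl, and the prompt assembly stands in for a real LLM call.

```python
# Toy retrieval-augmented generation (RAG) flow: fetch source content,
# then stuff it into the prompt so the model answers from it.
# The PAGES dict is a hypothetical stand-in for a live crawl of publisher sites.
PAGES = {
    "example.com/about": "Exampleco was founded in 2010 and makes widgets.",
}

def retrieve(url: str) -> str:
    """Stand-in for an HTTP fetch by an AI crawler."""
    return PAGES.get(url, "")

def build_prompt(question: str, source_url: str) -> str:
    """Augment the question with freshly retrieved source text."""
    context = retrieve(source_url)
    return (
        "Answer using only the context below.\n"
        f"Context: {context}\n"
        f"Question: {question}"
    )

prompt = build_prompt("When was Exampleco founded?", "example.com/about")
```

In a real pipeline the assembled prompt would be sent to a model; the point is that every such answer is grounded in content someone else paid to create and serve.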
So I think you're bringing up a key point here, which is, if people are like, well, you know, I'm not necessarily seeing publisher content show up every time I'm on an LLM, what you are seeing sometimes is the product of publisher content that's been used for training. And even if it's, like, under fair use, totally fine because it's being transformed, something crawled from Big Technology or the New York Times is now being used to, you know, give you an answer about summer camp, basically because these models are just trying to figure out what word comes next in the English language. The publishers are actually enabling that.
And every time an AI crawler hits a publisher website, they have to pay. And do you work with
Wikipedia? Because they've been loud about this, that like the server costs that they have to pay
have increased exponentially. But those aren't human visitors. They're AI bots crawling Wikipedia.
So there's a real cost to just supporting this crawl. And before we even talk about intellectual property, before we talk about anything else, the content creators, the publishers, are having to bear that cost. And so, at just a simple fairness level, why should they be bearing the cost in order to train, you know, these multi-billion-dollar AI companies that are out there? There should be some value which is given back. But I think it's even beyond that. I don't even think that we have to get to... I mean, you use legal terms like fair use, and I think that's very much
up in the air right now. We literally had two different California cases that came out on both
sides of that issue. Is training on content fair use or not? And I think it's going to be a
coin flip where different courts are going to say different things. And I don't think it's a clear
answer there. But I think it's a more fundamental thing, which is if you're doing something to
create value, you should be getting some sort of compensation for that. If somebody else is, is
posing a cost on you, you should be able to charge them to offset some of that cost.
And if something's not, if someone's not willing to pay that, then they shouldn't be taking
your content in the first place.
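Mechanically, the pay-or-don't-take-it stance Prince describes amounts to gating crawler requests at the edge. A minimal sketch, assuming user-agent identification and a hypothetical allowlist of paying crawlers; Cloudflare's real enforcement is network-level and far more robust than string matching, and the bot names here are illustrative:

```python
# Hypothetical edge gate: known AI crawlers are refused unless they have
# an active payment arrangement; ordinary visitors pass through.
AI_CRAWLERS = {"GPTBot", "ClaudeBot", "CCBot"}   # illustrative user-agent tokens
PAYING_CRAWLERS = {"GPTBot"}                     # hypothetical: has a license deal

def gate(user_agent: str) -> int:
    """Return an HTTP status for the request: 200 allow, 402 payment required."""
    if user_agent in AI_CRAWLERS and user_agent not in PAYING_CRAWLERS:
        return 402  # Payment Required: crawl refused without compensation
    return 200
```

HTTP 402 Payment Required is a real, long-reserved status code, which makes it a natural fit for this kind of compensation gate.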
Up until now, everyone's focused on the legal issue. You know, the New York Times is suing, and, I mean, a bunch of people are doing it. I actually think that before we even get to the legal issue, the first step is actually to take the technical steps to give content creators back control over the content that they're creating and let them have the choice: do I want to give access or not? Do I want to charge for this or not?
And then done correctly, there should be a marketplace
where content creators and AI companies come together
and say, hey, I created this piece of content.
I think it's super valuable.
And the AI company says, yeah, maybe it is
or maybe it's not, but here's what we're willing to pay.
And maybe they meet the clearing price.
Maybe they don't meet the clearing price.
But that marketplace needs to exist
because otherwise, there's no way to convey value,
there's no way to derive value from content creation.
And again, I just need to hammer this point home. If we don't give content creators an incentive to create content,
they'll stop creating content. And it sounds like, by the way, you're not a skeptic of the AI technology. You believe that this generative AI thing is going to work. Not only that, I mean, it is already clear that it's going to be the interface of the future of the web. So we're going to move from what has been the dominant interface of the past of the web, which was search, to what the interface of the future of the web is going to be, which is very much going to be AI.
So I believe AI is going to get better and better and better.
I actually think that done correctly, content can be created in such a way that will make AI better
and that you can create incentives for doing that.
What I worry about is in order for AI to get better, you have to have original content.
People have to be going out and creating that.
And right now, we're strangling off all of the incentives for that content creation,
which not only hurts content creators, it will ultimately hurt the AI companies as well. So I was speaking with someone who works in data labeling, or data creation, for large language models last night. In anticipation of this conversation, I was like, you know, one day, what you're doing might look almost exactly like what web publishers are doing, where, like, you might be hiring PhDs and having them write their information, and you feed that right into an LLM's training set. And there might be, let's say, historians.
So if you take like a world history website,
the historians that are writing the web pages
for that world history website,
they must be just like maybe one day
they're going to be writing those world history articles
instead of publishing them to the web,
selling them or feeding them right into ChatGPT.
Do we lose anything if the web goes away
and it's just content creators selling stuff
to large language models?
Yeah, you know, I think the Black Mirror kind of dystopian future is not that, you know, content will stop being created, journalists will stop existing, and researchers will stop existing. I think the Black Mirror future is that we actually go back to something like the time of the Medicis, where we have maybe five big AI companies, and they each employ a set of journalists and a set of researchers and a set of folks, such that they become effectively the institutions of knowledge, and they have salaries for all their academics that are on staff. They probably each have different, you know, maybe one of them is the conservative AI company and one of them's the liberal AI company. You can, again, very much see that has actually been the natural state of media and the natural state of controlling information for quite some time. And you could imagine that all of that research actually consolidates behind each individual AI company, and every different academic out there is basically an employee of OpenAI or Anthropic or Google or Microsoft.
I think that's a pretty bad outcome because, again, I think that we, the web has been so
amazing at distributing and democratizing access to information that I think we want to create
that incentive.
And so I think what we're trying to do is say, what's the step, you know, a few steps before that sort of all-the-academics-are-employed-by-one-of-the-AI-companies outcome? And I think the answer is you allow the AI companies to pay for the content that is actually valuable to them, that fills in their
models and makes their models better. And then you create incentives for independent journalists,
independent researchers to actually be able to create that content, to augment those AIs,
while still, you know, being valuable. This won't happen, but my sort of, you know, optimistic
version of the future is humans should get content for free again, because we kind of paywalled way
too much, frankly. And robots should pay a ton for it. Because again, every time a robot ingests
something, it's in service of hundreds of thousands, if not millions of different humans. So robots
should pay for that content. We should get back to a place where then humans get that for free.
Again, that's, I think it's going to be hard for us to get there. But that's, again, the future that
I think is actually the kind of optimal future. So someone hearing you and looking at this through
critical lens might say, look, Matthew, publishers depending on web traffic are barking up
the wrong tree. That selling eyeballs for CPM fractions has not been a good business for a long time.
In fact, we had a guest on the show recently who said, listen, he's a journalist, but he's like, if I thought that traffic was the way to go, I'd have been out of business a long time ago. And what you
really need is an audience that will, let's say, subscribe to your newsletter or listen to your
podcast, maybe come to your events, and we've already moved past this business model of trading
traffic for dollars, in which case this isn't an existential threat. What would you say to that?
I think, I mean, even then you're still trading traffic for dollars. You're just trading it
for subscription dollars, not ad dollars. That will go away as well, because what will happen
is the AI company will ingest the podcast and then summarize it on their page. And why would
they ever buy a subscription to your podcast? Why would they ever sign up for your newsletter?
if their AI agent can just simply say, tell me everything that was relevant in this particular
podcast or newsletter. Because there's an experience of listening that's enjoyable. People do that in part for entertainment and the leisure value. And that's how they learn. I think the AI companies
will do a very good job at creating that experience as well. So you think they'll just create, like, competing... Oh, absolutely. You know, I used to think that this was, like, such a pie-in-the-sky, lunatic idea until I listened to NotebookLM. Yeah. And, like, we've had multiple people on my YouTube page be like, did you license your voice to NotebookLM? And I'm like, no, but the fact that you're saying that
is pretty concerning. Totally. And again, I think that's the inevitable future. We're going to want to have
hyper-customized podcasts that are in exactly the voice that we find the most reassuring. And AIs
are going to create that for us. And again, they're going to be fed by original content
creators that are out there that give them the ideas, give them what to talk about, give them the news
of the day. What I think is we have to move even past the business model of subscriptions. We've got to
get to something else where you as a content creator are being compensated for the content.
The way I think about it is every one of these LLMs is a little bit like a block of Swiss
cheese.
They've got, you know, a lot of stuff there, but there are big holes that are in it.
And content that is valuable for them are the ones where they actually fill in those holes
in the Swiss cheese.
And so what I would imagine in the future is that you're able to actually surface what are the places
where there are holes in the Swiss cheese as an AI and then allow content creators to create
content that fills that in.
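The holes-in-the-Swiss-cheese idea can be read as a coverage-scoring problem: score how well a model covers each topic, then surface the weakest topics as content opportunities. A toy sketch with made-up topics and hypothetical coverage scores:

```python
# Toy gap surfacing: topics with model coverage below a threshold are the
# "holes in the Swiss cheese" worth commissioning new content for.
COVERAGE = {                     # hypothetical coverage scores, 0.0 to 1.0
    "world history": 0.9,
    "local snow forecasts": 0.2,
    "niche legal rulings": 0.4,
}

def content_gaps(coverage: dict[str, float], threshold: float = 0.5) -> list[str]:
    """Return topics below the threshold, weakest first."""
    gaps = [t for t, score in coverage.items() if score < threshold]
    return sorted(gaps, key=lambda t: coverage[t])

print(content_gaps(COVERAGE))  # weakest-covered topics surface first
```

How a real system would estimate coverage is an open question; the sketch only illustrates the surfacing step that Spotify- and YouTube-style "people are searching for this, go make it" features already perform.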
My favorite example of this is I was in Stockholm a couple of weeks ago, meeting with Daniel Ek, because there really is nobody who has done more to compensate creators at scale than Daniel. Daniel is the founder of Spotify, and they've done just an amazing job at doing this.
And he told me a story in our long conversation.
He said, you know, one of the things that we do at Spotify is we actually take the searches that people run at Spotify, you know, that are things like,
I want a song with a reggae beat about how much it, you know, sucks when your, you know,
your sister runs away with your car.
And that has happened.
Yeah, whatever.
And it turns out that they don't have good things to fill that in.
And there are content creators out there that are making tens of millions of dollars a year just creating content for those searches that don't have good results right now, because Spotify surfaces that list of things where they don't have those results.
I think that that's actually beautiful.
I think that's actually really amazing where they are showing where is there something
that there is human need for and then how can we actually then create content to fill that
human need and then monetize it, you know, through what they're doing.
I think the same opportunity exists in the AI space, where the AIs are actually able to say, I can tell you how valuable this new piece of content is for me, and you can rank it. And then that allows you to create a marketplace where they can say, listen, that new piece of information is so valuable that I'm willing to pay you for that. And I think that, done correctly,
that then gets us to more original content creation. It gets us to less sort of Me Too copycat style
journalism, same thing in research. It gets us to maybe a place where we're, we're doing a
original research and getting rewarded for being more original as opposed to being more salacious.
Yeah, it's interesting. YouTube has a similar thing where there's an insights or an
inspiration tab. And they give you like the title, a description, and the thumbnail.
And they're like, people are searching for this. You go out and make it. Yeah, that's exactly
right. And I think that that's actually an incredibly valuable thing. That's making humanity better, as opposed to, you know, yet another story that's just chasing the most salacious headline that you can get. So,
you're talking about this idea where publishers might sell the ability to crawl to AIs.
That is also assuming that content is scarce.
And so I want to run this other idea by you, which is that if we had the same amount
of content that we have today, that's a great idea.
But what we're seeing now is this explosion of content creation that's made through generative
AI.
Like, it's kind of funny. Every time you see these suggestions that we're talking about, YouTube's making these suggestions because clearly there's traffic to be had. I'm sure there are already YouTubers today that are feeding that into ChatGPT, spitting out a script, running that through Veo 3 at Google, and then posting the videos and cashing in on traffic. So there's just going to be,
and we're in the middle, I believe, of this explosion of content. Actually, you probably have
better data on that than my suppositions. It almost feels like a DDoS of the web where, like,
you know, if the ability to create content is constrained by a human's
ability to create content, then you have something to bring to these AI companies. But if
human plus bot content starts to become the norm, there's going to be so much, then even if
you're creating high quality stuff, it's not going to matter very much to these generative
AI companies. What do you think about that? I think it's still, so first of all, I think that
there's the pure AI-generated content. There's lots of research that shows that training AI on AI-generated data is sort of like that old Michael Keaton film Multiplicity, where basically every copy of something gets worse and worse and worse. And again, that feels like that's going to still be the case for quite some time. Might robots, in the future, be able to go out and do, you know, interesting reporting from the field? Might they be able to do, you know, interesting research? For sure. But today, I think that that interesting research, that interesting original content,
that interesting insight that comes from the work that right now only journalists and researchers
and others can do is still the most important thing for filling in those gaps in the Swiss cheese of AIs. What is just, again, high-volume, low-value content? My hunch is that, if we score it correctly, it's going to be exactly what it is, which is low-value content, and so it should be rewarded very minimally. I like to ski, so I live part of the year in Park City, Utah. You're in the right place. I care enormously about the snow forecast. There is a
forecaster in Utah named Evan Thayer. He writes these incredibly precise weather forecasts where he will literally tell you it's going to snow this much on this run and this much on that run. And again, I actually pay for his content because that's super valuable for me. I am going to be more
willing in the future to pay for an AI that has actually licensed Evan's content back from him
than I would to pay for an AI that doesn't have that content because, again, that content is going to be, you know, super useful and unique and valuable to me.
And so I think actually what it will do, as we have more AI systems that are out there, is it will cause you to look for more original, creative content.
And that's going to be the thing that the AI is going to be the most willing to pay for.
And that, again, I think is actually a beautiful thing, where instead of creating incentives to create more and more salacious headlines and chase traffic, we're creating
incentives to create knowledge that fills in those places in sort of the Swiss cheese where
there might be holes. Taken in aggregate, all of the AIs are probably a pretty good representation
of what human knowledge looks like. And so if we can score them and say, okay, here are the gaps
in human knowledge and here are the places we need to fill in, that actually gives a really
rich place for creators to look to create content which advances human knowledge. So, you know, DeepMind is working on weather forecasting right now. This example that you gave of Evan Thayer, the forecaster in Utah: are we that far away from just telling an AI, hey, you're tapping into the DeepMind model on weather forecasts,
I want to ski this route today.
What's happening?
I think we're probably pretty far away from that. But again, I think Evan is always going to be better using the tools of AI plus his local knowledge to make this better. AI just becomes a tool that creative people use in order to tell stories better, get better information, do more research.
And again, I am skeptical that, in the short term at least, we're going to have real value that is created by training on purely generated content.
Okay.
So we've talked about your solution.
Let's dive into the technological side of it a little bit.
We are a tech podcast, so we should do that.
So Cloudflare, security company, helps websites stay up on the web, despite all the threats.
Yep.
And let's just at the very beginning kind of talk about like the threats that you see to websites,
who's trying to take them down.
Yeah.
What's happening on that?
Yeah.
So protecting websites is part of our business.
So is protecting employees as they go out across the internet.
So Cloudflare is fundamentally kind of a network that is built with all the performance,
reliability, security, availability, and privacy guarantees that frankly the internet should have been built with,
had we all known what it was going to become.
But obviously back in the 60s, 70s, and 80s when we were laying down all these protocols,
we didn't think about those things.
And so Cloudflare is basically reverse engineering the Internet in order to give it
those performance, availability, security, reliability, and privacy guarantees on top of what is there.
And so today, one of the main uses for Cloudflare would be: you're putting a website or a web application or anything online, and you want to make sure that it's safe from
different sorts of threats. And so what are the threats that we see? I mean, every day we go to war
with the Chinese government, the Russian government, the North Koreans. I mean, everyone is trying
to hack into our customers because who are our customers? Some of the largest banks in the
world, some of the largest governments in the world. And they are all constantly under threat
and constantly under attack from these organizations. The media companies actually were a pretty
small part of our business. We had some media companies that used us, but it wasn't a big
piece of it. What happened starting really 18 months ago is that those companies said,
hey, I know we hired you in order to stop the Chinese hackers, but we have this new threat
that's there. And frankly, my initial reaction was, publishers, they're always whining about the next new technology, like, what's going on. And over and over, they said, just pull the data, pull the data, pull the data. And it was only when we actually saw the data and
saw how AI companies were taking content without giving anything of value in return, that they
were actually adding enormous amounts of load. And in some cases, taking whole websites down
because of the amount of traffic that they were sending to it. Right. They basically DDoSed the websites. DDoSed the websites. You know, again, not intentionally. But that was the point at which we said,
listen, maybe there is something that we can do here. And, you know, at first, I think a lot of the publishers were saying, oh, this is so hard. There's no way we can stop it. You
know, there are these nerds and they live in Palo Alto and they're so smart, what are we ever
going to possibly do about it? And I just kept saying, guys, we go to war with the Chinese
hackers. Like, we can stop some nerds with a C corporation. And I think it took a while for that
message to really get through. But now that it has, you know, it's been really rewarding to see that
the vast majority of the world's publishers, major publishers, have said, this is, we need to change
the model. We need to be compensated for our content. And Cloudflare has the right idea in terms of
the technical solution to do that. By the way, folks, $60 billion company listed publicly.
So it's one of the bigger cybersecurity companies on the New York Stock Exchange. But I want to
ask you, okay, so we're going to get into this technological solution. But what you said is interesting, because do you ever think there's a world with these AI bots, and not just the publishers but the banking websites as well, where you're like a natural enemy to having everything go through them? Because if everything goes through ChatGPT, then these other sites that you secure might not need your services.
I think, I mean, there's going to be some gatekeeper for how agents and other things
access various services online.
And I think that the challenges in each of those cases are different.
In the case of a bank, you might want to say, I want to have guardrails that are in place.
I want to make sure that this is actually a customer that's accessing an account.
I want to make sure that they can only conduct transactions that have been authorized by an actual human being or something like that.
Cloudflare actually provides those guardrails and makes it so that a bank can say,
I want to expose my infrastructure to AI, but do it in a way which is safe and secure.
I think publishers have a different challenge.
And so, you know, in our case, a way of thinking about it is like, we have a whole bunch of developer documents, which are on our website. We want those to be in AI.
We want coding platforms when someone says, oh, I want to use Cloudflare to build X, Y, or Z for it to be able to spit that out.
What we've done is we've actually tried to identify with real narrow precision, what are those pages that are on the web that have some indication that they are going to be monetized?
And generally, that is: look at, is it behind a paywall, or does it have some sort of an ad unit on it, like a banner ad or some other sort of ad that's there?
If we detect that, then we're blocking it by default.
But we're not doing this for everything.
Again, there's value for AI, and we want to make sure that AI is actually getting the data
that people want to have in it.
So the About Us page on the New York Times probably should go into the AI system. But a brand-new article, you know, with breaking news, probably should be restricted, unless the AI company is actually paying for that content.
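Prince names two signals of an intent to monetize: a paywall or an ad unit. A minimal sketch of that kind of heuristic might look like the following. This is illustrative only, not Cloudflare's actual detection logic, and the marker strings are assumptions:

```python
# Sketch: heuristic check for "intent to monetize" on a page, based on the
# two signals described above: a paywall or an ad unit. The marker strings
# are hypothetical examples, not Cloudflare's real signal set.

PAYWALL_MARKERS = ("subscribe to continue", "metered-paywall", "piano.io")
AD_MARKERS = ("doubleclick.net", "adsbygoogle", "ad-slot")

def intends_to_monetize(html: str) -> bool:
    """Return True if the page shows signs of a paywall or an ad unit."""
    lowered = html.lower()
    behind_paywall = any(m in lowered for m in PAYWALL_MARKERS)
    carries_ads = any(m in lowered for m in AD_MARKERS)
    return behind_paywall or carries_ads

def default_crawl_policy(html: str) -> str:
    """Block AI crawlers by default on monetized pages, allow the rest."""
    return "block" if intends_to_monetize(html) else "allow"
```

So a page carrying an ad slot would be blocked by default, while a plain About Us page would stay open to crawlers.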
I guess the way I want to ask it is, if everything goes into ChatGPT, what's left for you to protect, thinking outside of the media world?
Well, again, I think that 80% of the AI companies are customers of ours.
And so we protect them as well.
Okay, sounds good.
I just wanted to ask that.
I was curious about it.
But let's talk.
Okay, now, so you're going to build a technological solution that will block crawling.
Yes.
And so robots.txt, which is this file that you put at the root of your site if you don't want to be crawled, that wasn't working.
Yeah, I mean, I think robots.txt has two problems. The first is some people just ignore it. And so if you ignore it, then you can still crawl all you want. And there are even some big legitimate companies that completely ignore robots.txt. And we're really good at basically being able to say, okay, here's what robots.txt says. Are you actually following what those rules of the road are? And if the answer is yes, then robots.txt is a great solution.
But in the cases where somebody is ignoring it, then we need to actually put in place additional technical barriers to restrict their access.
And so that's exactly what we're doing.
The second problem with robots.txt is it's not granular enough.
So take the Google bot, for example.
Google's crawler does five different things, at least.
One is it checks if you have an ad on a page, makes sure that if you're putting an ad for a Procter & Gamble product up, it's not against a pornographic site or something like that.
So it does brand safety checks.
The second thing that it does is crawls to index for traditional search,
the 10 blue links that are out there.
The third is that it crawls to create answers that are in the answer box.
The fourth is that it crawls to create answers that are in the AI overview,
the newer thing that they've rolled out.
And the fifth is that it crawls in order to ingest content in order to put it into Gemini.
It's a lot of crawling.
A lot of crawling, all through one crawler.
And for lots of different reasons, they don't want to split that out into various
crawlers. But right now, they basically make you have a choice. They say you can either block
Google entirely, in which case you can't run ads, you don't appear in search, and you don't appear
in the AI overviews or Gemini or other things. Or they've recently added a tiny flag, which basically
just says, don't use this data for the Gemini piece, but you still appear in AI
overviews, you still appear in answer box. We think there needs to be more granularity where there is
a difference between taking content and transforming it. And a license should say you can't do that
without my permission versus just taking that content in order to do brand safety checks,
taking that content in order to do traditional search. And so what we've proposed, and we're working with the IETF as well as regulators, is extensions to robots.txt to give it that granularity.
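For a sense of what that granularity could look like, here is a hypothetical robots.txt fragment with purpose-level rules. The directive names (`Allow-Purpose`, `Disallow-Purpose`) and purpose labels are illustrative assumptions, not the actual IETF proposal or any published standard:

```
# Hypothetical per-purpose rules for a multi-purpose crawler
User-Agent: Googlebot
Allow-Purpose: brand-safety, search-indexing
Disallow-Purpose: answer-box, ai-overviews, ai-training

# Default for everyone else: block training, allow the rest
User-Agent: *
Disallow-Purpose: ai-training
```

The point is that one crawler, like the Googlebot described above, could be granted some of its five uses and denied others, instead of the current all-or-nothing choice.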
And that actually then allows us to further test, to watch, you know, does this robot behave in an appropriate way. And if the answer is yes, then maybe it gets more permissions to do things online. If the answer is no, then we will put more restrictions and blockades in place to stop
what are, again, badly behaving robots. So what you're going to do now is, in addition to that, put a wall up, a technological wall. That's right. No crawling. Sorry, enough of you haven't respected robots.txt, no entrance. That's right. And so, we're all familiar with 404 errors when something is not found; success on the internet is a 200 response that comes back to you. The original protocol actually set out a 402 response. And that response says payment required. And so we're actually tapping into that
exact original specification to say, when a robot tries to access a page where there's an intent to monetize it, so it's either behind a subscription or it's got ads on it, there is an ability for us to say 402 Payment Required. And then there's a negotiation. In some cases, and at first, that's going to be largely large publishers with large AI companies doing deals,
like what Reddit has done or what the New York Times has done or what others have done
where they have licensed the content and then certain robots get access to that.
But in other cases, and I think over time, that will be a dynamic process where maybe a smaller
AI company or a smaller publisher will say, hey, here's what I would charge for this content.
Cloudflare will surface like how valuable that content would be for that particular AI.
And then the AI companies can decide, is that worth it or not?
And it might be a very small transaction, maybe a fraction of a penny or maybe a few cents.
Or in some cases, content that is really valuable might be worth hundreds or thousands or millions of dollars.
You could imagine Taylor Swift, you know, is about to release a brand new song and the lyrics get published.
How valuable is that for an app for teen girls who are lonely and want to talk about things?
Probably pretty valuable and especially valuable if you could have exclusive access to it for some window of time.
And so that's the sort of thing where I think a marketplace over time can develop, where
original valuable content will get compensated and there will be a clearing price in the market
once we have that scarcity that's created by that wall.
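The 402 handshake described above might look roughly like this sketch. The `crawl-price` header name, the token scheme, and the price are assumptions for illustration; the actual negotiation protocol is still being worked out:

```python
# Sketch of an HTTP 402 "Payment Required" exchange between an edge server
# and an AI crawler. Header name, token scheme, and price are hypothetical.

PRICE_PER_CRAWL_USD = 0.002          # hypothetical clearing price per page
PAID_TOKENS = {"tok_paid"}           # tokens a crawler holds after paying

def handle_crawler_request(monetized, payment_token=None):
    """Return (status, headers) the way an edge server might answer a bot."""
    if not monetized:
        return 200, {}  # free content: serve as usual
    if payment_token in PAID_TOKENS:
        return 200, {}  # this crawler has paid; serve the page
    # Intent to monetize and no payment: answer 402 and quote a price,
    # opening the negotiation the marketplace is built around.
    return 402, {"crawl-price": f"{PRICE_PER_CRAWL_USD} USD"}
```

A bot hitting a paywalled article with no token would get a 402 and a quoted price; it could then pay, obtain a token, and retry the same request successfully.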
Okay, so it's not just a blocker.
It's also this marketplace where you're going to have publishers that will sell their content.
So that's a way where you could have useful, effective chatbots and potentially a flourishing web.
Exactly.
And that, I think, is what we're trying to play for.
Again, my utopian vision of the future is robots should pay a lot for content and humans should get it for free.
Right.
And so to kick this off, on June 30th, as the day turned to July 1st, you had a party at the top of One World Trade Center where a bunch of publishers pressed a red button to get this thing going.
And that includes some very big names: Condé Nast, Time, the Associated Press, The Atlantic, Adweek, and Fortune are all going to be part of this.
And a lot more.
Frankly, there hasn't been a publisher that we've talked to who hasn't said that this is a change that needs to happen.
You're on the right path for it.
And so across the board, not only the kind of 20% plus of the web that sits behind Cloudflare already,
but I think another 20 to 30% that are these major publishers that are out there are all on board in doing that.
And what I think has been encouraging is at the same time, we've been having conversations with the large AI companies.
And all of them agree that content creators need to be compensated for their content.
They all agree on that.
The devil's in the details and some of them are pushing back in various ways.
But I've been really encouraged that as we have talked to the largest leading AI companies,
the largest technology companies in the world, they're actually leaning into this.
They all recognize the content creators need to be compensated.
And I think over the months to come, that's when the hard work will go down around
how do we actually create this marketplace in a way which is fair for all of the different
providers in the ecosystem, treats everybody in a way that has a level playing field,
still allows new entrants, doesn't just reward the largest companies with the biggest budgets
that are out there, makes sure that, you know, legacy providers like Google are treated the same as, you know, newer providers that are there. That's all going to be really tough.
But I am incredibly encouraged by the conversations I'm having, not just with the publishers
who are all on board, but actually with the AI companies who recognize that something needs to
change. That's interesting that they're recognizing this, because the sense that you get is
you hear these announcements of deals like Open AI paying X million to the Wall Street Journal
to be able to include their articles or Dow Jones. And the sense you get is that they're just
kind of payoffs to not get sued. Like, Sam Altman very clearly is not happy with the New York Times pursuing OpenAI, and especially the actions that the Times is taking in its lawsuit, like forcing OpenAI to preserve their chat logs, which I think is wrong. But it is interesting. So what do you think, are we going to see an
evolution from these one-off deals to this marketplace-style world? Well, I think that, I mean, we've seen this story many times before. I mean, Napster came along. It was a Wild West. There was a bunch of lawsuits from, you know, the music industry targeting Napster and the like. And then along comes iTunes, which starts out at $0.99 a song, but eventually evolves into what is much closer to a Spotify model of a subscription and a pool of funds that then gets distributed out to all the creators. So I think we've seen this story before. And I think that one of the things
that's really important is that OpenAI and others are willing to pay for content. They do the deals that are there. And I don't think it's right to just say they'll do a deal to avoid lawsuits. Again, I think that when you talk to leading AI
compensated for that content. And if it's not going to be through subscriptions or ads or ego,
it's got to be through something else. And so exactly how that happens, we'll figure out. But
what I know won't work is if OpenAI is paying for your content, but you're giving it away for free to everyone else. It's not going to work. OpenAI eventually is like, listen, we want to support you. We want to help you out. But we can't be the suckers. We can't be the only ones paying while you're giving stuff away for free. And so scarcity is needed in order to actually
have value in any kind of market. And so I think that the people who have actually leaned into this
the most heavily are the ones that have the existing deals with some but not all of the AI
companies because they realize that for those deals to be valuable, for them to renew, for them to
renew for more, there has to actually be scarcity where they're getting something of value. You can't charge OpenAI but give it away for free to Anthropic.
Something needs to actually restrict it and say, everyone needs to pay, everyone needs to be on a level playing field and figure out what that looks like going forward.
Could there be some collateral damage with the solution like you're implementing?
For instance, I'm looking at the names of these publications: Condé Nast, Time, the AP, The Atlantic.
I imagine they get a lot of traffic from search as it is today.
So if you put this blocker up, does that impact their SEO, for instance?
Yeah, so we've been very, very careful to say that the traditional search today is not blocked.
And even AI-driven search today isn't blocked.
But you're going to see us give publishers the tools to differentiate between search indexing and derivative content.
So the way I would think about this is the Google experience today. It may be that a publisher says, I still want to appear in the 10 blue links, but I don't want to be in the AI Overviews or the answer box. And the granularity of being able to say, okay, Google, I understand you use one bot, but we need those uses to be treated separately. And again, I am hopeful, and in my conversations
with Google, I am increasingly hopeful that they understand the importance of this and giving that
granularity. But if for some reason they don't, I am also 100% certain that regulators are paying
a ton of attention to this, and that around the world you will see them force Google to split their crawler out and announce exactly what it is doing. Again, I think that that's kind of the,
hopefully we get to an agreement with Google way before that has to happen. But that's inevitably,
I think Google is going to have to say, you know, if you don't want us to use your content for
derivatives, you have a way of controlling that while still appearing in search. Okay, a couple
big picture questions before we leave. How much bigger is the web getting? And is the web sort of accelerating the size increases that we see?
Yeah, I mean, by all the measures that we can see, it's actually kind of plateaued and flattened out in terms of content. You see fewer domains getting registered, you see fewer new websites going online.
I think a lot of that has moved to individual platforms.
So more of that on a YouTube, more of that on a Facebook, more of that on a TikTok that is there.
And I think part of that is because those tools have provided content creators easy monetization tools to allow them to not have to think about some of those problems. I think that in an ideal future, you would want content creators to be able to be free from those platforms, to earn more themselves, but still have the ability to
monetize that content in interesting ways. And so again, I think there are lots of people who
are working on that problem. I actually think Google has been one of the organizations that has, again, created what was the business model of the last 30 years of the web.
But the business model of the next 30 years of the web is going to be different.
And we've got to think about it in a different way.
It's not going to be banner ads.
It's not really probably going to be subscriptions.
It's going to be something different.
And so this is our attempt at one solution, but I doubt it will be the only one that emerges.
Now I'm curious what I'm going to do, because I'm just a one-person content operation.
Well, you should certainly be charging AI to license your voice.
Can I sign up to your product?
For sure, absolutely.
Okay, I'm going to email you after this.
And then when it comes to cybersecurity,
obviously you talked about how you're dealing with all these governments
that would like to hack into sites across the web.
Have they been able to use generative AI tools or automated coding
to become more effective at what they do?
Yeah, I mean, I think that anytime a new technology comes out,
bad guys are going to use it as well as good guys.
And so we have seen, and we will continue to see some horror stories around, you know,
the family that was tricked by some gang into wiring their life savings because someone
that sounded like their daughter called and said, I've been arrested in Mexico, you know,
I need to pay to get out or other things.
I think we were seeing a real rise in, especially out of North Korea, North Koreans posing
as if they were applicants to various jobs.
And then that is, you know, allowing them access, which they can then, you know, use to do any number of nefarious things. All of that, again, assisted by AI. So I think that's been
sort of on the bad guy's side. The good news, though, is that the good guys, you know, folks like
Cloudflare, we have been using AI as well in order to not only detect these things, but get
smarter at detecting attacks earlier in the process. That's working for you. At the end of the day, who wins in the AI race? Whoever has access to the most data. And I just think that the good guys are always going to have access to a lot more data than the bad guys. And so far, I feel like
we have made the web more secure with AI over the course of the last two and a half years and stayed
way ahead of the attackers. Although, again, there are going to be horrible stories. There are
going to be problems that are there. I think that it is going to be harder and harder to trust
that something that you're seeing online is actually, you know, real. And we'll have to turn to
other ways that are more secure about verifying things like identity.
and authentication.
Okay, last question for you.
We have 60 seconds.
You mentioned you're a believer in this technology.
What does the next couple of years in AI look like to you?
Are we going to hit AGI anytime soon?
Like, what's the timeline you're thinking about?
I mean, I believe today that 99 cents out of every dollar spent on AI is just being lit on fire.
But that one cent that's out there is going to generate real return.
it's very hard to figure out what's kind of just a total waste of time versus what's not.
You know, we see a lot of data about how, you know, AI systems are really being used, not so much for businesses today.
A lot of the business applications have been very tough to take on, but a lot of times just for like loneliness and social interactions and things like that.
So I would imagine that a lot more of those things are going to develop and those will be sort of the first uses.
I think the business application is actually going to take longer.
And in places where it's easier to verify the output as being legitimate, it's going to be easier.
So coding, like, we see that our engineers are significantly more productive using AI tools than they were before.
That's not causing us to hire any less engineers.
It's just meaning that every engineer we hire is that much more productive.
We have a huge backlog of things to do.
And AI is helping us do that.
On the other hand, you know, I am still quite skeptical about the AI customer support agent. That is a much harder problem. Or the AI lawyer. That is a much harder problem, because it's just harder to tell whether something actually worked or didn't. There's no debugger in those spaces in order to figure out if what the AI is creating was actually true.
And so I think you're going to see just huge leapfrogs in things like coding. But I think
it's going to take longer for us to do things that are a little bit more difficult to verify.
Very interesting. You're deeply optimistic about the technology, but still think 99% is lit on fire, wasted. Yeah. It's going to be very interesting to check out.
Matthew Prince, great to see you. Thank you for coming on the show. Thanks for having me on.
All right, everybody, thank you for watching and listening. We'll see you next time on Big Technology Podcast.