CyberWire Daily - Root access to the great firewall. [Research Saturday]
Episode Date: December 13, 2025
Daniel Schwalbe, DomainTools Head of Investigations and CISO, is sharing their work on "Inside the Great Firewall." This two-part research project analyzes an extraordinary 500–600GB leak that exposes the internal architecture, tooling, and human ecosystem behind China’s Great Firewall. Across both parts, you break down thousands of leaked documents, source code repositories, diagrams, packet captures, and telemetry that reveal how systems like the Traffic Secure Gateway, MAAT, Redis-based analytics, and modular DPI engines work together to censor, surveil, and fingerprint users at scale. Taken together, the research shows how the Great Firewall functions not just as a technical system, but as a living censorship-industrial complex that adapts, learns, and coordinates across government, telecoms, and security vendors. The research can be found here: Inside the Great Firewall Part 1: The Dump; Inside the Great Firewall Part 2: Technical Infrastructure.
Transcript
You're listening to the Cyberwire Network, powered by N2K.
Hello everyone and welcome to the CyberWire's
Research Saturday. I'm Dave Bittner, and this is our weekly conversation with researchers and
analysts tracking down the threats and vulnerabilities, solving some of the hard problems and
protecting ourselves in a rapidly evolving cyberspace. Thanks for joining us.
Basically, a data leak happened in September of this year, so a few months ago,
which was an unprecedented amount of very specific details
on how the great firewall actually works.
And of course, when such a data leak is exposed to the public,
it's always worth having a look.
And my research team was kind of chomping at the bit.
Like, can we look at this?
And we said, yeah, of course, with all the requisite precautions.
Sometimes these dumps might be booby-trapped or otherwise,
but this appeared to be genuine
and a treasure trove of information about something that's generally been kept very secret.
That's Daniel Schwalbe, Head of Investigations and CISO at DomainTools.
The research we're discussing today is titled Inside the Great Firewall.
There's not a whole lot of public information about the Great Firewall and how it does its thing.
A lot of research has been done, just trying to figure it out empirically.
But in this particular situation,
over 500 gigabytes of internal data about the infrastructure
and how it's organized, et cetera, was released,
and we dug into the data in order to write about it.
Well, can you give us some insights
into how you start digging into a dataset that is that large?
How do you go about it?
Yeah, that can certainly be overwhelming.
We, you know, first took a high-level look at, like, okay, what files are there?
There were, like, you know, diagrams and text specifications.
So you cluster those into kind of one category.
And then, you know, whenever there were, you know, particular outlines of, like, human interaction, like, this is who controls it, et cetera.
You put them in a different bucket.
And then you start, you know, going through them.
You will have to do a little bit of keyword searching.
We intentionally didn't use any, like, LLM tools because we
didn't want to further proliferate information.
But we have some of our own tools we can feed information to
and then do a quick analysis to kind of hone in on what are the sort of large chunks,
the human part of it, the technical design, and then potentially what that could actually
mean in terms of the real world.
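As a rough illustration of the first-pass triage described above (cluster files into buckets, then keyword-search within them), here is a minimal Python sketch. It is not the DomainTools tooling, which is not described in the episode. The bucket names, the extension lists, and the functions `bucket_for`, `triage`, and `keyword_hits` are all assumptions made for this example.

```python
from collections import defaultdict
from pathlib import Path

# Hypothetical mapping of file extensions to triage buckets (an assumption,
# loosely following the kinds of material mentioned in the episode notes).
BUCKETS = {
    "diagrams_and_specs": {".png", ".jpg", ".svg", ".pdf", ".txt", ".md", ".docx"},
    "source_code": {".c", ".h", ".py", ".go", ".java", ".sh"},
    "packet_captures": {".pcap", ".pcapng"},
}


def bucket_for(path: Path) -> str:
    """Pick a bucket from the file extension; anything unknown goes to 'other'."""
    suffix = path.suffix.lower()
    for bucket, suffixes in BUCKETS.items():
        if suffix in suffixes:
            return bucket
    return "other"


def triage(root: Path) -> dict:
    """Walk a directory tree and group every file by bucket."""
    buckets = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            buckets[bucket_for(path)].append(path)
    return dict(buckets)


def keyword_hits(paths, keywords) -> dict:
    """Case-insensitive keyword search over files that can be read as text."""
    hits = defaultdict(list)
    for path in paths:
        try:
            text = path.read_text(encoding="utf-8", errors="ignore").lower()
        except OSError:
            continue  # unreadable file: skip rather than abort the whole pass
        for keyword in keywords:
            if keyword.lower() in text:
                hits[keyword].append(path)
    return dict(hits)
```

Extension-based bucketing only covers the "what kind of file is this" cut. Sorting documents by who controls what, as described in the interview, would need a content-based pass such as the keyword search.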
Well, let's dig in here from your research.
How do you describe the overall architecture of the Great Firewall?
I'm actually, this might be controversial, but I'm actually quite
impressed. I used to work for an organization that basically, while it wasn't an ISP, ran a carrier-grade network. So I've struggled with how to do security on a 100-gigabit link, and that is no small feat. Granted, my experience was probably about 10 or so years ago, so technology has come a long way. But even back then, trying to do any kind of, like, just anti-malware inspection of the traffic in real time was a huge
undertaking and cost millions and millions of dollars in equipment to be able to do that properly.
Now, in a state-sponsored situation such as in China, the funding is less of an issue, but the sheer
scale of the infrastructure is quite impressive. The fact that they figured out how to build
this, if you will, digital wall that any connection sourced from the mainland in China
has to go through. But also to map it out so that
there is central control, which of course is important if you want to, you know, block certain
types of information from leaving the country. So you have to have a central point of command
and control, but then it can still be spread out to regions, and it gives regional governments
some level of insight and blocking ability as well. And the fact that they managed to design
this at a scale that large, and it's fairly effective. I'm actually quite impressed.
Yeah, the report talks about things like the traffic secure gateway and deep packet inspection.
Can we dig into some of those details and how they work?
Yes.
A lot of the technology being used is what's been used as regular cybersecurity best practice for years.
So basically, what deep packet inspection means is, the way the Internet is designed,
when information is transmitted from one point to another,
it gets chunked into little datagrams called packets.
And the idea is if one or two of them get lost on the way,
you can either ask for a retransmission
or it's not important because you can make it up from the context.
So that gives these small packets that travel over the Internet
that contain the information.
Now, deep packet inspection essentially means in real time,
you intercept this particular packet,
you peek inside and glean what information might be
included inside, whether there's like a malware hash or particular destinations and sources
that are talking to each other. It's all very interesting. But doing that at speed, so as not to slow
the internet down so significantly that somebody might become suspicious or a customer might
complain, like, why is my connection so slow? Doing that at scale is important. But it gives you an idea
what two points on the internet are talking about with each other. Of course, there are things like
encryption that make this a lot harder,
but there are other techniques you can use
in order to get an idea of what,
even if the HTTPS connection is encrypted,
you can still get an idea of what the source is trying to reach
on the outside Internet
and make blocking decisions based on that.
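To make the "peek inside the packet" idea concrete, here is a minimal Python sketch of payload inspection on a single unencrypted IPv4/TCP packet. It is a toy under stated assumptions, not a description of how the Great Firewall's DPI engines work. The keyword list and the names `parse_ipv4_tcp`, `inspect`, and `build_test_packet` are invented for this example, and real systems reassemble streams and run on specialized hardware.

```python
import struct

# Hypothetical keyword list; a placeholder for this sketch only.
BLOCKED_KEYWORDS = [b"example-blocked-term"]


def parse_ipv4_tcp(packet: bytes):
    """Return (src_ip, dst_ip, src_port, dst_port, payload), or None if not IPv4/TCP."""
    if len(packet) < 20 or packet[0] >> 4 != 4:
        return None
    ihl = (packet[0] & 0x0F) * 4          # IPv4 header length in bytes
    if ihl < 20 or packet[9] != 6:        # protocol 6 = TCP
        return None
    if len(packet) < ihl + 20:
        return None
    src_ip = ".".join(str(b) for b in packet[12:16])
    dst_ip = ".".join(str(b) for b in packet[16:20])
    src_port, dst_port = struct.unpack("!HH", packet[ihl:ihl + 4])
    data_offset = (packet[ihl + 12] >> 4) * 4   # TCP header length in bytes
    payload = packet[ihl + data_offset:]
    return src_ip, dst_ip, src_port, dst_port, payload


def inspect(packet: bytes) -> str:
    """Return 'block' if the payload contains a listed keyword, else 'pass'."""
    parsed = parse_ipv4_tcp(packet)
    if parsed is None:
        return "pass"
    payload = parsed[4]
    return "block" if any(k in payload for k in BLOCKED_KEYWORDS) else "pass"


def build_test_packet(payload: bytes) -> bytes:
    """Build a minimal IPv4/TCP packet for the demo (checksums left as zero)."""
    tcp = struct.pack("!HHIIBBHHH", 12345, 80, 0, 0, 5 << 4, 0x18, 65535, 0, 0)
    total_len = 20 + len(tcp) + len(payload)
    ip = struct.pack("!BBHHHBBH4s4s", 0x45, 0, total_len, 0, 0, 64, 6, 0,
                     bytes([198, 51, 100, 7]), bytes([192, 0, 2, 10]))
    return ip + tcp + payload
```

For example, `inspect(build_test_packet(b"GET /example-blocked-term HTTP/1.1\r\n"))` yields a block decision, while a request for an unlisted path passes. Once the payload is encrypted, this kind of content match no longer works, which is the limitation raised next in the conversation.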
Yeah, well, in the research,
you all talk about this notion
of fingerprinting encrypted traffic.
Unpack that for us.
You talk about invisible identifiers.
Why does that matter?
Yeah. So, you know, of course, privacy on the internet is certainly important to me, and I think a lot of people started caring about that a lot more as of late. And so basically, the internet wasn't really designed with encryption in mind. In the early days, everything was transmitted in clear text, so you wouldn't really be concerned about somebody maliciously intercepting your traffic to see what was going on. Well, later on, we added some of those layers,
and one of them that is very popular
is the secure HTTP protocol, HTTPS,
which uses TLS encryption, transport layer security.
Basically, you connect with a web server,
you exchange pieces of information,
and an encrypted tunnel between your computer
and the web server is created
where all the data with that particular site is being exchanged,
but outside observers would not be able to tell
who it is that you're talking to
and what information you're exchanging.
Extracting the information inside the encrypted tunnel is much harder
because the cryptography is pretty strong.
And so doing that on the fly is still not trivial.
There are entities around this world that probably can do it,
but at scale is very difficult.
So what you still want to know is
who might a particular user on your network
that you might be concerned with or have other thoughts about,
you want to know who they're talking to on the outside.
Part of what TLS encryption, the protocol, introduced is the ability to obfuscate what virtual server you're talking to.
What I mean by that is, on the internet, you might have a web server that has an IP address,
but it could answer for multiple domains.
So, for example, we have domaintools.com, but it could also answer for something like domaintools.net, et cetera.
So from a strictly network connection, all you're seeing is this computer reached out to this IP address, but we don't know which domain is being loaded from it.
And so there are techniques where you can do a de-obfuscation, essentially. By fingerprinting certain sites, by looking at the data that the browser sends, et cetera, you might be able to glean information of what specific
website out of the potential dozens that could be present on a particular IP address,
what that website is, which then gives you a good idea of what might this particular user be
up to.
So we're talking about looking at metadata then.
Yes.
Got it.
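One widely documented example of this kind of metadata is the Server Name Indication (SNI) field, which a browser sends unencrypted at the start of a TLS handshake unless Encrypted Client Hello is in use. The Python sketch below pulls that hostname out of a ClientHello record. Whether and how the leaked systems use SNI is not stated in this conversation, so treat this as an illustration of metadata inspection in general. The names `extract_sni` and `build_client_hello` are hypothetical, and the demo handshake carries only the fields needed for the example.

```python
import struct


def extract_sni(record: bytes):
    """Return the SNI hostname from a TLS ClientHello record, or None."""
    try:
        if record[0] != 0x16:             # not a TLS handshake record
            return None
        pos = 5                           # skip the 5-byte record header
        if record[pos] != 0x01:           # not a ClientHello
            return None
        pos += 4                          # handshake type + 3-byte length
        pos += 2 + 32                     # client_version + random
        pos += 1 + record[pos]            # session_id
        (cipher_len,) = struct.unpack_from("!H", record, pos)
        pos += 2 + cipher_len             # cipher suites
        pos += 1 + record[pos]            # compression methods
        (ext_total,) = struct.unpack_from("!H", record, pos)
        pos += 2
        end = min(pos + ext_total, len(record))
        while pos + 4 <= end:
            ext_type, ext_len = struct.unpack_from("!HH", record, pos)
            pos += 4
            if ext_type == 0:             # server_name extension
                # Layout: list_len(2), name_type(1), name_len(2), name
                name_type = record[pos + 2]
                (name_len,) = struct.unpack_from("!H", record, pos + 3)
                if name_type != 0:
                    return None
                return record[pos + 5:pos + 5 + name_len].decode("ascii")
            pos += ext_len
        return None
    except (IndexError, struct.error, UnicodeDecodeError):
        return None                       # truncated or malformed input


def build_client_hello(hostname: str) -> bytes:
    """Build a minimal ClientHello carrying only an SNI extension (demo only)."""
    name = hostname.encode("ascii")
    server_name = struct.pack("!BH", 0, len(name)) + name
    sni_data = struct.pack("!H", len(server_name)) + server_name
    sni_ext = struct.pack("!HH", 0, len(sni_data)) + sni_data
    body = (
        b"\x03\x03" + bytes(32)                 # version + all-zero random
        + b"\x00"                               # empty session_id
        + struct.pack("!H", 2) + b"\x13\x01"    # one cipher suite
        + b"\x01\x00"                           # null compression only
        + struct.pack("!H", len(sni_ext)) + sni_ext
    )
    handshake = b"\x01" + len(body).to_bytes(3, "big") + body
    return b"\x16\x03\x01" + struct.pack("!H", len(handshake)) + handshake
```

Calling `extract_sni(build_client_hello("domaintools.com"))` recovers the hostname without touching any encrypted content, which is why the exchange above frames this as metadata.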
Now, one of the things that your research highlights is that this is not a static thing,
that this system has adaptive capabilities.
Can you explain that to us?
Yes.
I mean, anything at that size and scale has to be modular.
You can't rely on basically a single technology here.
If there's a failure or something, then the whole internet goes down for a particular country.
That wouldn't be very practical.
I mean, for better or worse, the Internet drives commerce around the world.
And as we've seen here in the States from recent cloud provider outages, if one of them goes offline for a few hours a day,
a large part of the population is having a bad day.
So the ability to sustain a functioning internet is the highest priority.
So fault tolerance to a degree has to be there.
And so the way it seems to be designed based on the information in the dump is that the
modularization of it means that certain parts could potentially be instructed to take one action
where another part is completely unaffected or if there might be a regional protest movement
or something, the administration of that particular region could say,
we're going to block any and all mentioning of the following keywords, et cetera,
but that might not necessarily be applied globally to the entire thing.
So a different part of the country might not even be aware this is happening
because otherwise that might give an idea.
You want to control information, specifically within the country from one point to the other.
You also have to be concerned about what entities within your network say to each other,
hey, something's going on over here.
And having this modular design
that's pushed pretty far to the edge,
down to the regional government,
and the ability to effect blocking there
is very central to the strategy
that they're employing.
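The central-plus-regional arrangement described here can be pictured as layered policy: every region inherits the central rules and can add its own without affecting its neighbors. The Python sketch below models only that idea. The `PolicyNode` class, the region names, and the keywords are hypothetical and are not taken from the leaked material.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PolicyNode:
    """One layer of blocking policy; regions point at a central parent."""
    name: str
    blocked_keywords: set = field(default_factory=set)
    parent: Optional["PolicyNode"] = None

    def effective_keywords(self) -> set:
        """Own keywords plus everything inherited from parent layers."""
        inherited = self.parent.effective_keywords() if self.parent else set()
        return inherited | self.blocked_keywords

    def should_block(self, text: str) -> bool:
        lowered = text.lower()
        return any(keyword in lowered for keyword in self.effective_keywords())


# Hypothetical layout: one central policy, two regions beneath it.
central = PolicyNode("central", {"central-term"})
region_a = PolicyNode("region-a", {"local-protest"}, parent=central)
region_b = PolicyNode("region-b", parent=central)
```

With this layout, `region_a.should_block("news about the local-protest")` is true while the same text passes in `region_b`, and text containing `central-term` is blocked in both. That mirrors the point that a regional block need not be visible elsewhere in the country.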
We'll be right back.
Another thing the investigation mentions is what you refer to as a state industrial
censorship complex, with vendors and telecom carriers and regional nodes and central policy hubs.
What part do these various folks play and how significant is that for the maintenance and
evolution of the system? Yeah, it's an excellent question. From what we can glean from the
dump, from the data, is that basically any entity that provides internet access
to end users within the country is by hook or crook
conscripted into helping this effort.
Like, there's no opting out.
You want to do business in China as an internet service provider.
You agree to participate in this scheme.
That's the only way it works.
The same thing goes for mobile providers.
They're still, in a way, internet service providers,
even though they provide telephony as well.
But basically, a large part of the population
accesses the internet from mobile devices.
So wherever that gets routed before it hits the open internet, it has to be in there as well.
And so internet service providers play a key role.
Manufacturers of hardware that helps to route the internet, you know, transmit the traffic, etc.
Those all ideally have to be optimized for that purpose.
And there are a number of manufacturers in the country that, it appears based on the information
that was leaked, are actively cooperating and building hardware specifically that is beneficial
to the type of network inspection at high rates that is needed to sustain this operation.
So now we've got internet service providers, we've got hardware manufacturers,
various different entities that are in the chain of bringing internet access to an end user
wherever it may go.
And because of the power of the state, and you're not going to do business in China without explicit approval of the state apparatus, they can exercise this control over the various pieces in order to make this all work.
If we were to try to do something even remotely close, let's say, in the United States, because ISPs are independent entities, it would be very difficult to compel them to do so.
Same with hardware manufacturers.
They all have regular customers
who will probably object vigorously
to a hardware manufacturer
basically building in
a better way to sniff the traffic.
It's been attempted various different ways,
but unless you have the full control
end-to-end over the infrastructure,
it would be almost impossible to pull off.
But based on the information in the dump,
it sure appears like
they've done a pretty good job
at getting that all working.
So help me understand here: are there global providers of these sorts of things, hardware and services? Are they making custom versions for the Chinese market?
So to my knowledge, it's focused on the actual Chinese manufacturers, you know, Huawei is one of them, of course, that's been in the news off and on over the years, but there's several others.
I don't believe that there are manufacturers based outside China
that do very specific modifications for the country.
In order to be able to sell there,
you may need to take some notes from the regime,
but there's also a lot of companies
who just simply opt to not sell in the market
because they don't want to be forced to introduce
potential backdoors or additional hardware in things.
That's not to say that you couldn't buy particular hardware on the open market
and then modify it for your own purposes after the fact.
But at scale, it would have basically required a manufacturer to cooperate.
And there's enough of the technology and know-how within the country
that they can lean on their domestic manufacturers pretty strongly
without having to involve foreign companies.
Well, given this information, how does this affect countermeasures?
Things like VPNs or proxies or those sorts of circumvention tools.
Do they work?
Yes. Yes and no.
It certainly used to be much more of a cat and mouse game where, because anything that large,
there's going to be potential small loopholes or flaws in the design that you can exploit given enough time.
And so certain VPNs, a certain way of tunneling, et cetera, has been possible.
And if it gets detected and figured out how it's done, then it gets blocked.
So you kind of keep moving.
However, the specific technical details that were released in this data dump
will actually give individuals or entities who want to enable more unfiltered access
for people in the country.
They might be able to use that to do an even better job at circumventing things,
because the specific technical details of how VPNs are detected,
how certain activity or patterns are detected that then cause downstream blocking
or being flagged for further review or something,
that's been made public in the dump and could absolutely be used as a blueprint
on how to do a better job circumventing.
We haven't seen much of that yet, but it's only been a couple months,
so I suspect it's coming.
Suppose I'm on an enterprise security team or maybe a global threat intelligence team.
Is there anything in this data dump that helps inform how I might work with or monitor traffic from China?
Yes, I would definitely think so.
It depends on the level of sophistication of the entity and also their threat model.
But there's enough technical information in there that would give you a pretty good
idea, especially if you're seeing web connections coming from mainland China, what those look
like. They're all going through the great firewall. So it gives you a better idea about is
something going through the firewall or did somebody find a temporary way to basically circumvent it
or get around it? Because the pattern and the fingerprint of stuff that's coming in are likely
just slightly different enough
that with this additional information
of what to look for,
you might be able to tell
the one activity from the other.
You mentioned it at the outset
that you were impressed
by what you saw in this information.
How so?
How did it surprise you?
Just the sheer scale.
I mean, we knew the thing existed
and there's been some research,
external research that had been done on it
just by probing the various
defenses, et cetera. There was never any specific information. Everything was basically
assumptions based on observations, et cetera. But to actually have the documentation that
appears to be legitimate, it's important to say, to have the documentation and see things
like, yep, I thought this is how they were going to do this. Oh, no, this is completely different
than, you know, maybe I would have thought up. I'm not a network engineer, so I'm not saying
like my design would have been, you know, the world's greatest. But I've been doing this for 25
years, I've seen enough designs where I'm like, yeah, the faster the traffic, the bigger the bandwidth,
the much more challenging this becomes. And so the, like, I guess me being impressed was how to actually
force this into being at the scale that it is and it working as reasonably well as it appears
to be. That's the impressive part. Yeah. What do you suppose this does to the future here?
I mean, with this information being revealed, certainly I would imagine the powers that be
in China aren't happy about this.
Do you suspect that there'll be any sort of pivoting here,
or is this a system, you know, it's a battleship
that's hard to turn on a dime?
Yeah, I think it's probably somewhere in the middle.
Absolutely, it's a big operation that, you know,
just to completely, you know, start from scratch
and throw away all of the old paradigms,
that's not going to work.
Or if so, it would take a really long time
and a big investment.
I would certainly
feel very concerned
for whomever leaked that data.
I know there's a hacktivist group
that took credit
for it, and they certainly
published it, but
just looking at the specific
data contained in the dump,
this almost had to have
been somebody with pretty good
access on the inside.
In my professional opinion, this wasn't
like a smash-and-grab hack,
a hack where they found
an open file share somewhere
and downloaded this information.
Whoops, it wasn't properly locked down. I don't
believe so.
This appears to be
some kind of inside job or
possibly a disgruntled
employee somewhere
in the machine that
had access to enough of this information.
It could be that it was aggregated
on some system that got compromised
and it wasn't really meant to be leaked.
But again, given
the specificity and the combination of the files in the data leak, it sure smells like
it was somebody with extreme internal knowledge and access to be able to pull all these
files together. I would be very concerned for that person, and I hope they're going to be okay.
I think there will be some evaluation of current techniques. We also are not 100% certain
how current the information is. Some of it appears to be very current because it talks about
stuff that can be placed in the timeline, but it's also possible that there are additional
technologies already being deployed that were not captured by the information in the leak.
So it's going to be interesting to see what potential countermeasures, the operators of the
Great Firewall might be taking as a result of this. To my knowledge, we haven't seen anything
very obvious, but this is also something you'd probably want to do low and slow so as to not give away
that you're already taking countermeasures.
Our thanks to Daniel Schwalbe from DomainTools for joining us.
The research is titled Inside the Great Firewall.
We'll have a link in the show notes.
And that's Research Saturday, brought to you by N2K Cyberwire.
We'd love to know what you think of this podcast.
Your feedback ensures we deliver the insights
that keep you a step ahead in the rapidly changing world of cybersecurity.
If you like our show, please share a rating and review in your favorite podcast app.
Please also fill out the survey in the show notes or send an email to Cyberwire at N2K.com.
This episode was produced by Liz Stokes.
We're mixed by Elliott Peltzman and Tré Hester.
Our executive producer is Jennifer Eiben.
Peter Kilpe is our publisher, and I'm Dave Bittner.
Thanks for listening.
We'll see you back here next time.
