CyberWire Daily - Root access to the great firewall. [Research Saturday]

Episode Date: December 13, 2025

Daniel Schwalbe, DomainTools Head of Investigations and CISO, shares their work on "Inside the Great Firewall." This two-part research project analyzes an extraordinary 500–600GB leak that exposes the internal architecture, tooling, and human ecosystem behind China’s Great Firewall. Across both parts, the researchers break down thousands of leaked documents, source code repositories, diagrams, packet captures, and telemetry that reveal how systems like the Traffic Secure Gateway, MAAT, Redis-based analytics, and modular DPI engines work together to censor, surveil, and fingerprint users at scale. Taken together, the research shows how the Great Firewall functions not just as a technical system, but as a living censorship-industrial complex that adapts, learns, and coordinates across government, telecoms, and security vendors. The research can be found here: Inside the Great Firewall Part 1: The Dump; Inside the Great Firewall Part 2: Technical Infrastructure.

Transcript
Starting point is 00:00:00 You're listening to the CyberWire Network, powered by N2K.
Starting point is 00:00:46 Hello everyone and welcome to the CyberWire's Research Saturday. I'm Dave Bittner, and this is our weekly conversation with researchers and analysts tracking down the threats and vulnerabilities, solving some of the hard problems, and
Starting point is 00:01:38 protecting ourselves in a rapidly evolving cyberspace. Thanks for joining us. Basically, a data leak happened in September of this year, so a few months ago, that exposed an unprecedented amount of very specific detail on how the Great Firewall actually works. And of course, when a data leak like that is made public, it's always worth having a look. My research team was kind of chomping at the bit: can we look at this?
Starting point is 00:02:15 And we said, yeah, of course, with all the requisite precautions. Sometimes these dumps might be booby-trapped or otherwise, but this one appeared to be genuine and a treasure trove of information about something that's generally been kept very secret. That's Daniel Schwalbe, Head of Investigations and CISO at DomainTools. The research we're discussing today is titled Inside the Great Firewall. There's not a whole lot of public information about the Great Firewall and how it does its thing. A lot of research has been done just trying to figure it out empirically.
Starting point is 00:02:55 But in this particular situation, over 500 gigabytes of internal data about the infrastructure, how it's organized, et cetera, was released, and we dug into the data in order to write about it. Well, can you give us some insight into how you start digging into a dataset that large? How do you go about it? Yeah, that can certainly be overwhelming.
Starting point is 00:03:21 We first took a high-level look at, okay, what files are there? There were diagrams and text specifications, so you cluster those into one category. Then, wherever there were particular outlines of human interaction, like who controls what, et cetera, you put those in a different bucket. Then you start going through them. You do have to do a little bit of keyword searching. We intentionally didn't use any LLM tools because we
Starting point is 00:03:51 didn't want to further proliferate the information. But we have some of our own tools we can feed information to for a quick analysis, to hone in on the large chunks: the human part of it, the technical design, and then potentially what that could mean in the real world. Well, let's dig in here. From your research, how would you describe the overall architecture of the Great Firewall? This might be controversial, but I'm actually quite
Starting point is 00:04:21 impressed. I used to work for an organization that, while it wasn't an ISP, ran a carrier-grade network. So I've struggled with how to do security on a 100-gigabit link, and that is no small feat. Granted, my experience was about 10 or so years ago, so technology has come a long way. But even back then, just trying to do anti-malware inspection of the traffic in real time was a huge undertaking and cost millions and millions of dollars in equipment to do properly. Now, in a state-sponsored situation such as China's, funding is less of an issue, but the sheer scale of the infrastructure is quite impressive: they figured out how to build this, if you will, digital wall that any connection sourced from mainland China has to go through. And they mapped it out so that there is central control, which of course is important if you want to block certain types of information from leaving the country. So you have to have a central point of command and
Starting point is 00:05:33 control, but it can still be spread out to regions, which gives regional governments some level of insight and blocking ability as well. The fact that they managed to design this at a scale that large, and that it's fairly effective, is why I'm actually quite impressed. Yeah, the report talks about things like the Traffic Secure Gateway and deep packet inspection. Can we dig into some of those details and how they work? Yes. A lot of the technology being used is what's been used in regular cybersecurity best practice for years. So basically, to explain what deep packet inspection means, start with the way the internet is designed.
Starting point is 00:06:17 When information is transmitted from one point to another, it gets chunked into little datagrams called packets. The idea is that if one or two of them get lost on the way, you can either ask for a retransmission, or it's not important because you can make up for it from context. So that gives us these small packets traveling over the internet that carry the information. Now, deep packet inspection essentially means that, in real time,
Starting point is 00:06:42 you intercept a particular packet, peek inside, and glean what information might be included: whether there's, say, a malware hash, or which destinations and sources are talking to each other. But you have to do that at speed, without slowing the internet down so much that somebody becomes suspicious or customers complain, like, why is my connection so slow? Doing that at scale is the important part. It gives you an idea of what two points on the internet are saying to each other. Of course, things like encryption make this a lot harder,
Starting point is 00:07:17 but there are other techniques you can use to get an idea, even if the HTTPS connection is encrypted, of what the source is trying to reach on the outside internet, and make blocking decisions based on that. Yeah, well, in the research, you all talk about this notion
Starting point is 00:07:38 of fingerprinting encrypted traffic. Unpack that for us. You talk about invisible identifiers. Why does that matter? Yeah. So, of course, privacy on the internet is certainly important to me, and I think a lot of people have started caring about it a lot more lately. Basically, the internet wasn't really designed with encryption in mind. In the early days, everything was transmitted in clear text, and people weren't really concerned about somebody maliciously intercepting their traffic to see what was going on. Later on, we added some of those layers, and one that is very popular is the secure HTTP protocol, HTTPS, which uses TLS, Transport Layer Security, encryption.
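One publicly documented way to build the kind of "invisible identifier" discussed here is JA3, which fingerprints the client side of a TLS handshake. The source does not say the Great Firewall uses JA3; this is a generic sketch. JA3 takes the TLS version, cipher suites, extensions, elliptic curves, and point formats that a client offers in its unencrypted ClientHello, drops GREASE placeholder values, joins them into a string, and hashes it with MD5. Because those choices come from the client software rather than from the site being visited, the same application tends to produce the same fingerprint even though the payload itself is encrypted. The field values below are made-up examples.

```python
import hashlib
from typing import Iterable


def is_grease(value: int) -> bool:
    """True for the reserved GREASE values (0x0A0A, 0x1A1A, ..., 0xFAFA) from RFC 8701."""
    return (value & 0x0F0F) == 0x0A0A and (value >> 8) == (value & 0xFF)


def ja3_string(version: int, ciphers: Iterable[int], extensions: Iterable[int],
               curves: Iterable[int], point_formats: Iterable[int]) -> str:
    """Build the JA3 input: five comma-separated fields, values within a field dash-separated."""
    def join(values: Iterable[int]) -> str:
        # GREASE values are random per connection, so JA3 drops them to keep the ID stable.
        return "-".join(str(v) for v in values if not is_grease(v))

    return ",".join([str(version), join(ciphers), join(extensions),
                     join(curves), join(point_formats)])


def ja3_hash(*fields) -> str:
    """MD5 of the JA3 string: a compact ID for the client software, not the site visited."""
    return hashlib.md5(ja3_string(*fields).encode("ascii")).hexdigest()


# Made-up ClientHello fields: legacy version 771 (TLS 1.2), two TLS 1.3 cipher suites,
# server_name/supported_groups/ec_point_formats extensions, x25519 and secp256r1, uncompressed.
print(ja3_string(771, [0x1A1A, 4865, 4866], [0x2A2A, 0, 10, 11], [29, 23], [0]))
# prints 771,4865-4866,0-10-11,29-23,0
```

In practice the fields would come from a packet parser. A fingerprint like this identifies a client stack rather than a person, so on its own it would typically need to be combined with other metadata.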
Starting point is 00:08:27 Basically, you connect to a web server, you exchange some pieces of information, and an encrypted tunnel is created between your computer and the web server, through which all the data with that particular site is exchanged, while outside observers would not be able to tell who it is that you're talking to or what information you're exchanging.
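Part of the "data that the browser sends," which comes up a little later in this answer, is the Server Name Indication (SNI) extension. In TLS 1.2, and in TLS 1.3 when Encrypted Client Hello (ECH) is not in use, the client's first handshake message (the ClientHello) names the requested hostname in clear text so that a server hosting many domains on one IP address can pick the right certificate. The minimal Python sketch below reads that field. It assumes the whole ClientHello arrives in a single unfragmented TLS record, and `build_client_hello` is a hypothetical helper that builds a bare-bones test message, not a realistic browser handshake. Nothing in the source says the Great Firewall parses SNI this particular way.

```python
import struct
from typing import Optional


def extract_sni(record: bytes) -> Optional[str]:
    """Return the SNI hostname in one TLS record holding a ClientHello, else None."""
    try:
        # 0x16 = handshake record; byte 5 is the handshake type, 0x01 = ClientHello.
        if record[0] != 0x16 or record[5] != 0x01:
            return None
        pos = 5 + 4 + 2 + 32                 # record header, handshake header, version, random
        pos += 1 + record[pos]               # skip length-prefixed session_id
        (suites_len,) = struct.unpack_from("!H", record, pos)
        pos += 2 + suites_len                # skip cipher_suites
        pos += 1 + record[pos]               # skip compression_methods
        (ext_total,) = struct.unpack_from("!H", record, pos)
        pos += 2
        end = pos + ext_total
        while pos + 4 <= end:
            ext_type, ext_len = struct.unpack_from("!HH", record, pos)
            pos += 4
            if ext_type == 0x0000:           # server_name extension
                # Layout: list length (2), name_type (1), name length (2), name bytes.
                (name_len,) = struct.unpack_from("!H", record, pos + 3)
                if record[pos + 2] == 0 and pos + 5 + name_len <= len(record):
                    return record[pos + 5:pos + 5 + name_len].decode("ascii")
                return None
            pos += ext_len
        return None
    except (IndexError, struct.error, UnicodeDecodeError):
        return None                          # truncated or malformed input


def build_client_hello(hostname: str) -> bytes:
    """Hypothetical test helper: a bare-bones ClientHello whose only extension is SNI."""
    name = hostname.encode("ascii")
    entry = struct.pack("!BH", 0, len(name)) + name             # name_type 0 = host_name
    sni = struct.pack("!H", len(entry)) + entry                  # server_name_list
    extensions = struct.pack("!HH", 0x0000, len(sni)) + sni
    body = (
        b"\x03\x03" + bytes(32)                                  # legacy_version, zeroed random
        + b"\x00"                                                # empty session_id
        + struct.pack("!H", 2) + b"\x13\x01"                     # one cipher suite
        + b"\x01\x00"                                            # one compression method: null
        + struct.pack("!H", len(extensions)) + extensions
    )
    handshake = b"\x01" + len(body).to_bytes(3, "big") + body    # type 0x01 = ClientHello
    return b"\x16\x03\x01" + struct.pack("!H", len(handshake)) + handshake


print(extract_sni(build_client_hello("example.com")))  # prints example.com
```

The point of the sketch is how little parsing is needed: the hostname sits in the first packet, before any encrypted data flows, which is why ECH exists to hide it.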
Starting point is 00:08:46 Extracting the information inside the encrypted tunnel is much harder because the cryptography is pretty strong, so doing that on the fly is still not trivial. There are entities around the world that probably can do it, but at scale it's very difficult. So if there's a particular user on your network that you might be concerned about or have other thoughts about,
Starting point is 00:09:11 you still want to know who they're talking to on the outside. Part of what the TLS protocol introduced is the ability to obfuscate which virtual server you're talking to. What I mean by that is: on the internet, you might have a web server with one IP address that answers for multiple domains. For example, we have domaintools.com, but the same server could also answer for something like domaintools.net. So from a strictly network-connection standpoint, all you see is that this computer reached out to this IP address, but you don't know which domain it's loading. So there are techniques to do de-obfuscation, essentially: by fingerprinting certain sites, by looking at the data the browser sends, et cetera, you might be able to glean, out of the potentially dozens of sites that could be hosted on a particular IP address,
Starting point is 00:10:18 what that website is, which then gives you a good idea of what this particular user might be up to. So we're talking about looking at metadata, then. Yes. Got it. Now, one of the things your research highlights is that this is not a static thing, that this system has adaptive capabilities. Can you explain that to us?
Starting point is 00:10:40 Yes. Anything at that size and scale has to be modular. You can't rely on a single technology here: if there's a failure, the whole internet goes down for a particular country, and that wouldn't be very practical. For better or worse, the internet drives commerce around the world, and as we've seen here in the States from recent cloud provider outages, if one of them goes offline for a few hours, a large part of the population has a bad day.
Starting point is 00:11:10 So the ability to sustain a functioning internet is the highest priority, and fault tolerance, to a degree, has to be there. Based on the information in the dump, the way it seems to be designed, the modularization means that certain parts could be instructed to take one action while another part is completely unaffected. Or, if there's a regional protest movement or something, the administration of that particular region could say, we're going to block any and all mentions of the following keywords, et cetera, but that might not necessarily be applied globally to the entire system.
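A toy model can make the modular, region-scoped blocking described here concrete. In this sketch (the `KeywordPolicy` class and the region names are hypothetical, not taken from the leak), national rules apply everywhere, while a regional rule set only affects traffic inspected in that region, so other regions never load it. Matching is a naive keyword search over a plaintext payload, roughly what a simple deep packet inspection filter does; it would find nothing inside an encrypted TLS payload.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class KeywordPolicy:
    """Hypothetical model: one national keyword set plus sets scoped to single regions."""
    national: Set[str] = field(default_factory=set)
    regional: Dict[str, Set[str]] = field(default_factory=dict)

    def add_national(self, *keywords: str) -> None:
        self.national.update(k.lower() for k in keywords)

    def add_regional(self, region: str, *keywords: str) -> None:
        # Regional rules land only in that region's set; other regions never see them.
        self.regional.setdefault(region, set()).update(k.lower() for k in keywords)

    def should_block(self, region: str, payload: bytes) -> bool:
        """Check a plaintext payload against national rules plus this region's rules."""
        text = payload.decode("utf-8", errors="ignore").lower()
        active = self.national | self.regional.get(region, set())
        return any(keyword in text for keyword in active)


policy = KeywordPolicy()
policy.add_national("banned-topic")
policy.add_regional("region-a", "local-protest")
print(policy.should_block("region-a", b"GET /search?q=local-protest"))  # True
print(policy.should_block("region-b", b"GET /search?q=local-protest"))  # False
print(policy.should_block("region-b", b"GET /search?q=Banned-Topic"))   # True
```

A linear scan keeps the example short; a system inspecting traffic at carrier rates would more likely compile its rules into a multi-pattern matcher such as Aho-Corasick.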
Starting point is 00:11:51 So other parts of the country might not even be aware this is happening, because otherwise that might give them ideas. You want to control information specifically within the country, from one point to another. You also have to be concerned about what entities within your network tell each other, like, hey, something's going on over here. And having this modular design that's pushed pretty far to the edge, down to the regional governments,
Starting point is 00:12:14 with the ability to effect blocking there, is very central to the strategy they're employing. We'll be right back.
Starting point is 00:13:40 Another thing the investigation mentions is what you referred to as a state industrial censorship complex, with vendors and telecom carriers and regional nodes and central policy hubs. What part do these various players play, and how significant are they for the maintenance and evolution of the system? Yeah, it's an excellent question. From what we can glean from the dump, from the data, basically any entity that provides internet access
Starting point is 00:14:31 to end users within the country is, by hook or by crook, conscripted into helping this effort. There's no opting out. If you want to do business in China as an internet service provider, you agree to participate in this scheme. That's the only way it works. Same thing with mobile providers. They're still, in a way, internet service providers,
Starting point is 00:14:53 even though they provide telephony as well. A large part of the population accesses the internet through mobile devices, so wherever that traffic gets routed before it hits the open internet has to be in there as well. So internet service providers play a key role. Manufacturers of the hardware that routes and transmits internet traffic, et cetera, ideally all have to be optimized for that purpose. And there are a number of manufacturers in the country that, it appears, based on the information
Starting point is 00:15:31 that was leaked, are actively cooperating and building hardware specifically suited to the type of high-rate network inspection needed to sustain this operation. So now we've got internet service providers, hardware manufacturers, and various other entities in the chain of bringing internet access to an end user, wherever they may be. And because of the power of the state, since you're not going to do business in China without explicit approval of the state apparatus, they can exercise this control over the various pieces to make it all work. If we were to try to do something even remotely close in, let's say, the United States, it would be very difficult to compel ISPs to do so, because they are independent entities. Same with hardware manufacturers.
Starting point is 00:16:27 They all have regular customers who would probably object vigorously to a hardware manufacturer building in a better way to sniff the traffic. It's been attempted in various ways, but unless you have full end-to-end control over the infrastructure,
Starting point is 00:16:45 it would be almost impossible to pull off. But based on the information in the dump, it sure appears they've done a pretty good job of getting it all working. So help me understand here: are there global providers of this sort of hardware and services that are making custom versions for the Chinese market? To my knowledge, it's focused on the actual Chinese manufacturers. Huawei, of course, is one of them, and it's been in the news off and on over the years, but there are several others. I don't believe there are manufacturers based outside China
Starting point is 00:17:30 that make very specific modifications for the country. In order to sell there, you may need to take some direction from the regime, but there are also a lot of companies that simply opt not to sell in that market because they don't want to be forced to introduce potential backdoors or additional hardware into things.
Starting point is 00:18:01 and then modify it for your own purposes after the fact. But at scale, it would have basically required a manufacturer to cooperate. And there's enough of the technology and know-how within the country that they can lean on their domestic manufacturers pretty strongly without having to involve foreign companies. Well, given this information, how does this affect countermeasures? Things like VPNs or proxies or those sorts of circumvention tools. Do they work?
Starting point is 00:18:33 Yes and no. It certainly used to be much more of a cat-and-mouse game, because anything that large is going to have small loopholes or flaws in the design that you can exploit given enough time. So certain VPNs, certain ways of tunneling, et cetera, have been possible, and once one gets detected and figured out, it gets blocked. So you kind of have to keep moving. However, the specific technical details released in this data dump will actually give individuals or entities who want to enable more unfiltered access
Starting point is 00:19:16 for people in the country a way to do an even better job of circumventing things, because the specific technical details of how VPNs are detected, and how certain activity or patterns are detected that then cause downstream blocking or get flagged for further review or something, have been made public in the dump and could absolutely be used as a blueprint for doing a better job of circumventing. We haven't seen much of that yet, but it's only been a couple of months,
Starting point is 00:19:45 so I suspect it's coming. Suppose I'm on an enterprise security team, or maybe a global threat intelligence team. Is there anything in this data dump that helps inform how I might work with or monitor traffic from China? Yes, I would definitely think so. It depends on the level of sophistication of the entity and also their threat model. But there's enough technical information in there to give you a pretty good idea of what web connections coming from mainland China look like, since they're all going through the Great Firewall. So it gives you a better idea of whether
Starting point is 00:20:30 something is going through the firewall, or whether somebody found a temporary way to circumvent it or get around it. Because the pattern and the fingerprint of the traffic coming in are likely just different enough, with this additional information about what to look for, you might be able to tell one activity from the other. You mentioned at the outset
Starting point is 00:20:56 that you were impressed by what you saw in this information. How so? How did it surprise you? Just the sheer scale. We knew the thing existed, and there's been some external research done on it
Starting point is 00:21:13 just by probing the various defenses, et cetera. There was never any specific information; everything was basically assumptions based on observations. But it's something to actually have the documentation, which appears to be legitimate, it's important to say, and see things like, yep, I thought this is how they were going to do it, or, oh no, this is completely different from what I might have thought up. I'm not a network engineer, so I'm not saying my design would have been the world's greatest. But I've been doing this for 25 years, and I've seen enough designs to know that the faster the traffic and the bigger the bandwidth,
Starting point is 00:21:50 the more challenging this becomes. So I guess what impressed me was how they actually forced this into being at the scale that it is, and that it works as reasonably well as it appears to. That's the impressive part. Yeah. What do you suppose this does to the future here? With this information revealed, I would certainly imagine the powers that be in China aren't happy about it. Do you suspect there'll be any sort of pivoting here, or is this system a battleship that's hard to turn on a dime?
Starting point is 00:22:27 Yeah, I think it's probably somewhere in the middle. Absolutely, it's a big operation, and completely starting from scratch and throwing away all of the old paradigms is not going to work. Or if it did, it would take a really long time and a big investment. I would certainly
Starting point is 00:22:46 feel very concerned for whoever leaked that data. I know there's a hacktivist group that took credit for it, and they certainly published it, but just looking at the specific data contained in the dump,
Starting point is 00:23:03 this almost had to have been somebody with pretty good access on the inside. In my professional opinion, this wasn't a smash-and-grab hack,
Starting point is 00:23:18 a hack where they found an open file share somewhere and downloaded this information. Whoops, it wasn't properly locked down. I don't believe so. This appears to be some kind of inside job, possibly a disgruntled employee somewhere in the machine who had access to enough of this information.
Starting point is 00:23:35 It could be that it was aggregated on some system that got compromised and wasn't really meant to be leaked. But again, given the specificity and the combination of the files in the data leak, it sure smells like it was somebody with extreme internal knowledge and access who was able to pull all these files together. I would be very concerned for that person, and I hope they're going to be okay. I think there will be some evaluation of current techniques. We also are not 100% certain
Starting point is 00:24:07 how current the information is. Some of it appears to be very current because it talks about things that can be placed on the timeline, but it's also possible that additional technologies are already being deployed that were not captured by the information in the leak. So it's going to be interesting to see what countermeasures the operators of the Great Firewall might take as a result of this. To my knowledge, we haven't seen anything very obvious, but this is also something you'd probably want to do low and slow, so as not to give away that you're already taking countermeasures. Our thanks to Daniel Schwalbe from DomainTools for joining us.
Starting point is 00:24:57 The research is titled Inside the Great Firewall. We'll have a link in the show notes. And that's Research Saturday, brought to you by N2K CyberWire. We'd love to know what you think of this podcast. Your feedback ensures we deliver the insights that keep you a step ahead in the rapidly changing world of cybersecurity. If you like our show, please share a rating and review in your favorite podcast app. Please also fill out the survey in the show notes or send an email to cyberwire@n2k.com.
Starting point is 00:25:26 This episode was produced by Liz Stokes. We're mixed by Elliot Peltzman and Trey Hester. Our executive producer is Jennifer Eiben. Peter Kilpe is our publisher, and I'm Dave Bittner. Thanks for listening. We'll see you back here next time.
