CyberWire Daily - Dangerous vulnerabilities in H.264 decoders. [Research Saturday]

Episode Date: May 20, 2023

Willy R. Vasquez from The University of Texas at Austin discusses research on "The Most Dangerous Codec in the World - Finding and Exploiting Vulnerabilities in H.264 Decoders." Researchers are looking at the marvel that is modern video encoding standards such as H.264 for vulnerabilities and ultimately hidden security risks. The research states "We introduce and evaluate H26FORGE, domain-specific infrastructure for analyzing, generating, and manipulating syntactically correct but semantically spec-non-compliant video files." Using H26FORGE, they were able to uncover insecurities in depth across the video decoder ecosystem, including kernel memory corruption bugs in iOS and video accelerator and application processor kernel memory bugs in Android devices. The research can be found here: The Most Dangerous Codec in the World: Finding and Exploiting Vulnerabilities in H.264 Decoders

Transcript
Starting point is 00:00:00 You're listening to the Cyber Wire Network, powered by N2K. Hello, everyone, and welcome to the CyberWire's Research Saturday. I'm Dave Bittner, and this is our weekly conversation with researchers and analysts
Starting point is 00:01:07 tracking down the threats and vulnerabilities, solving some of the hard problems of protecting ourselves in a rapidly evolving cyberspace. Thanks for joining us. So the idea for this work actually started from a class project from other students back in 2018. Essentially what they did is that they were trying to fingerprint video decoders on the web. That's Willy R. Vasquez, a doctoral student at the University of Texas at Austin. Today we're discussing his research,
Starting point is 00:01:45 The Most Dangerous Codec in the World: Finding and Exploiting Vulnerabilities in H.264 Decoders. And so they got an MP4 file and they just created Hamming distance 1 videos. So what they did is they got the video file and then flipped a single bit at every point inside of the file, and then just played each result back to see what would happen. And while doing this, they found that some videos were able to leak out contents of previously decoded videos. So, yeah, it was kind of weird, because video decoding is supposed to be this deterministic process, right? Whatever you put in, you should always get back the same result.
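The bit-flipping experiment described above can be sketched in a few lines. This is only an illustration of generating Hamming distance 1 variants of a file, not the students' actual script; the function names and the sample bytes are assumptions made up for the example.

```python
from typing import Iterator


def single_bit_flips(data: bytes) -> Iterator[bytes]:
    """Yield every variant of `data` that differs from it in exactly one bit."""
    for bit_index in range(len(data) * 8):
        mutated = bytearray(data)
        byte_index, bit_in_byte = divmod(bit_index, 8)
        # Flip one bit, counting from the most significant bit of each byte.
        mutated[byte_index] ^= 0x80 >> bit_in_byte
        yield bytes(mutated)


def hamming_distance(a: bytes, b: bytes) -> int:
    """Count the differing bits between two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


if __name__ == "__main__":
    sample = b"\x00\x00\x00\x18ftyp"  # placeholder bytes, not a playable video
    variants = list(single_bit_flips(sample))
    print(len(variants), "variants, each one bit away from the original")
```

A file of n bytes yields 8n variants, so in practice each variant would be written out and played back one at a time rather than held in memory.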
Starting point is 00:02:48 And so they noticed every time they decoded this malformed video, something weird would pop up. So they did this for a class and wrote up the report, got some good grades, but didn't continue on with the project. Then in 2019, when I was looking for some research project to work on, my advisor, Hovav, suggested following up on this work and exploring more into the video decoder space. So the first task was figuring out why this specially crafted video played differently each time. And in doing that work, we started to find a lot more fun things to explore in the H.264 space. So just for my own sort of background here, I mean, in a previous world, I was in the desktop digital video world, and I remember a lot of these codecs coming to be. I remember we dealt with things like Motion JPEG and H.263, and then H.264 came out,
Starting point is 00:03:48 and it was kind of like this universal codec that had the capability to contain all sorts of different things. But it had a high overhead in terms of processing power. My recollection was that a lot of this stuff was baked into hardware at the time. You would buy a video camera that would encode to this, and the encoding was on chips. Can you give us a little bit of the backstory here on what exactly is going on when it comes to compression and decompression with a codec like this? Yeah, so all our commodity devices do have specialized hardware, as you described, in order to encode and decode videos. Either they come as a part of GPUs through NVIDIA and AMD, or as a separate coprocessor on the systems-on-chip in devices. So how this process usually works is you have an MP4 video file that contains the H.264 encoded contents. And this MP4 file is just a container format. You could have different
Starting point is 00:04:57 codecs inside. So the browser will first parse out the MP4 and then send the encoded contents down to the operating system, which would prepare the hardware to receive the encoded video. And then the hardware will then reconstruct each frame that you'd see. So one of the things that struck me in reading your research was that, and correct me if I'm wrong here, but the H.264 standard is only laid out on the decode side. Do I have that correct? Yeah, that's correct. So video encoding is actually a search problem. So how codecs work is that they find similarities within frames of a video and also across frames. And so encoding is this search problem of finding these similarities. And then the encoder will jot down the instructions to recreate each frame.
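To make the "encoding is a search problem" point concrete, here is a toy exhaustive block-matching search: given a block from the current frame, it looks for the best-matching window in a reference frame and returns the offset. It is a minimal sketch under simplifying assumptions (tiny grayscale frames as nested lists, sum of absolute differences as the cost, hypothetical function names); real encoders use far more elaborate and often proprietary search strategies.

```python
from typing import List, Optional, Tuple

Frame = List[List[int]]


def sad(ref: Frame, block: Frame, top: int, left: int) -> int:
    """Sum of absolute differences between `block` and the window of `ref` at (top, left)."""
    return sum(
        abs(ref[top + y][left + x] - block[y][x])
        for y in range(len(block))
        for x in range(len(block[0]))
    )


def best_motion_vector(
    ref: Frame, block: Frame, top: int, left: int, radius: int
) -> Tuple[int, int, int]:
    """Try every offset within `radius` of (top, left); return (dy, dx, cost)."""
    height, width = len(block), len(block[0])
    best: Optional[Tuple[int, int, int]] = None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y, x = top + dy, left + dx
            # Skip candidate windows that fall outside the reference frame.
            if y < 0 or x < 0 or y + height > len(ref) or x + width > len(ref[0]):
                continue
            cost = sad(ref, block, y, x)
            if best is None or cost < best[2]:
                best = (dy, dx, cost)
    if best is None:
        raise ValueError("no candidate window fits inside the reference frame")
    return best
```

The offset returned by the search is the kind of "instruction" the encoder writes down; the decoder never repeats the search, it only applies the offset it is given.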
Starting point is 00:05:59 And that's what the codec specifies. So whenever you get your encoded video, the H.264 specification tells you how to take these instructions and reproduce an image. But encoding is all search-based, and a lot of it is proprietary and patent-filled. I see. I see. So just to clarify that, I mean, what it means is that the specification says, this is what we're going to do on the decode side, but the folks who are doing the compression on that side of things, it's kind of the Wild West on that side. They can do whatever they want as long as it meets the demands of what the decoder is expecting to see? Exactly, yeah. So as a part of the specification, there are all these profiles and levels. Profiles detail what kind of features are used when compressing the video. And then the level is an estimate for the expected bit rate for playback of a video.
Starting point is 00:07:06 So decoders are meant to satisfy that, and encoding tries to reach a particular bit rate. And on the decoder side, how much documentation do we have here? Is it laid out in a very specific and overt way, or is there black magic there as well? So for the most part, there are many open source implementations. I think the most well-known one is OpenH264 by Cisco, and that is the H.264 decoder actually used in Firefox for WebRTC. And also the people that create these specifications create reference encoders and decoders that you can compare your own custom decoders against. So there's a lot of companies that create their own decoders.
Starting point is 00:08:06 And I think that's some of the problems that we were able to identify, the heterogeneity of the ecosystem of decoders. Well, let's dig into the actual security issues that you found here. Can you walk us through your research process and how you discovered things? Sure. So as I previously mentioned, we wanted to understand why that specially crafted video was decoding differently each time. So to understand that, it was a lot of time spent on the reference decoder and also just looking at the H.264 spec and understanding what each item means. So how we got started is by trying to understand that video, that specially crafted video, just diving into the spec and looking at the reference decoder.
Starting point is 00:09:07 And at the same time, to get a better understanding, I also began to write a decoder in Rust. This was the base of what would later become H26Forge. So by looking at the spec and understanding how the different syntax elements work together, I should go back and say that the H.264 spec describes video reconstruction instructions using these things called syntax elements. And so these are variables that tell the decoder how to reconstruct the image. And each syntax element is expected to have a particular range. These are known as the H.264 semantics. So what was going on in that compressed video
Starting point is 00:09:54 is that one of the prediction modes, the semantics was way off.
Starting point is 00:10:40 So we've got this oddball file that's making the decoder behave or misbehave and in an unpredictable way, which, you know, anybody who works with computers would be like, wait a minute, this should be, you know, it should be repeatable, right? So how do you dig into that and explore what's going on here under the hood?
Starting point is 00:11:49 Yeah, so the first thing that we did was try to run that video under the H.264 reference decoder and see where that crashed. And that gave us an inkling of what part of the spec to look at. And then in understanding the spec, we found different areas that could be interesting to look at from a security point of view. So there are many cases in which a variable is read in and then that's used as a loop bound.
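The "variable is read in and then used as a loop bound" pattern can be illustrated with a small bitstream reader. H.264 codes many syntax elements with unsigned Exp-Golomb codes, written ue(v) in the spec, and `read_ue` below decodes that format. Everything else is an assumption made up for the sketch: `MAX_ENTRIES`, `parse_table`, and the table layout do not correspond to any real syntax structure or to the tool discussed here.

```python
from typing import List

MAX_ENTRIES = 4  # hypothetical limit standing in for a range stated in the spec


class BitReader:
    """Reads bits most-significant-bit first from a byte string."""

    def __init__(self, data: bytes) -> None:
        self.data = data
        self.pos = 0  # current position, in bits

    def read_bit(self) -> int:
        if self.pos >= len(self.data) * 8:
            raise EOFError("ran out of bits")
        byte = self.data[self.pos // 8]
        bit = (byte >> (7 - self.pos % 8)) & 1
        self.pos += 1
        return bit

    def read_bits(self, count: int) -> int:
        value = 0
        for _ in range(count):
            value = (value << 1) | self.read_bit()
        return value

    def read_ue(self) -> int:
        """Decode one unsigned Exp-Golomb value: n zeros, a one, then n more bits."""
        leading_zeros = 0
        while self.read_bit() == 0:
            leading_zeros += 1
        return (1 << leading_zeros) - 1 + self.read_bits(leading_zeros)


def parse_table(reader: BitReader) -> List[int]:
    """Read a count from the stream, then that many entries into a fixed-size table."""
    table = [0] * MAX_ENTRIES
    count = reader.read_ue()
    # The bounds check: the count comes from the file, so it cannot be trusted.
    if count > MAX_ENTRIES:
        raise ValueError(f"count {count} exceeds the allowed maximum {MAX_ENTRIES}")
    for i in range(count):
        table[i] = reader.read_ue()
    return table
```

In Python, dropping the check would only produce an `IndexError`. In a C decoder or a kernel driver, the same omission lets the loop write past the end of the table, which is the class of memory corruption bug discussed in this episode.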
Starting point is 00:12:24 And in understanding that video, in understanding the codec, we built out this tool that became H26Forge and first started to generate videos that had out-of-bounds syntax elements and just ran it on devices to see what was going on. And what did you discover when you started messing with these files? So at first, we discovered a bunch of older broken code in some Android devices. So what we were interested in is looking at how different decoders interacted. So depending on the output that you get,
Starting point is 00:13:20 you could actually identify what decoder was being used, and you could use this as a kind of web-based fingerprinting based on the output image. And in generating these videos and running it on devices, we also created different heuristics to identify videos of interest. Were you trying to sort of stress test the codec to see, you know, if we do this, it'll break here? Or isn't this an interesting way that it reacts if we mess with it
Starting point is 00:13:53 this way? Yeah, so we just started generating randomized videos, playing it on devices and seeing what would happen. So some of the heuristics that we were looking at is, does the device turn off after we play this video? So that's very interesting to see. Second was, if we decode the same video multiple times, do we get different outputs? So that was also something worthy of investigation. And also just looking at interesting log messages. And in testing Android devices, yeah, we found a couple of issues in the hardware decoder. We were able to understand that one weird video.
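The second heuristic, decoding the same video several times and checking whether the output changes, is easy to automate. The sketch below assumes a hypothetical `decode` callable that takes the encoded bytes and returns the decoded frames as bytes; how that callable drives an actual phone or hardware decoder is outside the scope of the example.

```python
import hashlib
from typing import Callable, List


def output_digests(decode: Callable[[bytes], bytes], video: bytes, runs: int = 3) -> List[str]:
    """Decode `video` several times and return a SHA-256 digest of each output."""
    return [hashlib.sha256(decode(video)).hexdigest() for _ in range(runs)]


def is_nondeterministic(decode: Callable[[bytes], bytes], video: bytes, runs: int = 3) -> bool:
    """Flag a video whose decoded output differs between runs."""
    return len(set(output_digests(decode, video, runs))) > 1
```

Comparing digests rather than raw frames keeps the bookkeeping small when many generated videos are being triaged; a video flagged this way is a candidate for closer inspection, not proof of a vulnerability.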
Starting point is 00:14:36 Essentially, how prediction inside of a particular frame works is that the frame is broken up into 16 by 16 pixel blocks and it looks at the edges to copy down information to create a prediction of the frame. And what we found is that on the topmost part of the frame, if you tried to predict upward, there shouldn't be anything there, but we were actually able to get pixels from previously decoded videos. And so that's what was going on in that video. It was reading stale information inside of the decoder. It wasn't resetting each time. Wow. Tell me about H.264 itself. Is it easy to work with? Is it challenging? Where does it stand?
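The stale-data behavior described above can be modeled with a toy decoder. This is a deliberately simplified sketch, not a description of any real hardware: `ToyDecoder`, its four-pixel rows, and the `reset_between_videos` flag are all invented for illustration. Each row is predicted from the row above it plus a residual, and the top row predicts from a scratch buffer that may still hold the last row of the previous video.

```python
from typing import List

WIDTH = 4  # toy row width; the blocks discussed in the episode are 16 by 16


class ToyDecoder:
    """Vertical prediction only: each output row is the row above plus a residual."""

    def __init__(self, reset_between_videos: bool) -> None:
        self.reset_between_videos = reset_between_videos
        self.above = [0] * WIDTH  # scratch row that survives across decode() calls

    def decode(self, residual_rows: List[List[int]]) -> List[List[int]]:
        if self.reset_between_videos:
            self.above = [0] * WIDTH
        frame = []
        for residuals in residual_rows:
            # The first row has no real neighbour above it, so whatever is
            # left in the scratch row leaks into its prediction.
            row = [(p + r) % 256 for p, r in zip(self.above, residuals)]
            frame.append(row)
            self.above = row
        return frame


if __name__ == "__main__":
    secret = [[200, 201, 202, 203]]
    probe = [[0, 0, 0, 0]]
    leaky = ToyDecoder(reset_between_videos=False)
    leaky.decode(secret)
    print(leaky.decode(probe))  # the probe's output reproduces the secret row
```

With the reset enabled, the same probe decodes to all zeros every time, which is the deterministic behavior a decoder is expected to have.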
Starting point is 00:15:35 So there are two challenges when trying to work with H.264 encoded videos. First is that the values are encoded at the bit level, meaning that traditional fuzzers like AFL couldn't set a particular syntax element to a chosen value. This is because AFL tries to work at a byte level granularity. So that's one issue. And then the second issue is the cascading effect between syntax elements. So if you change one parameter, that's going to change how the rest of the video is decoded. So what our tool aims to do is just change one particular element, but keep everything else the same. In other words, it tries to make sure that the specific values are decoded correctly. And it's more on how the decoder uses those values where the issues arise. You know, it's interesting to me, again, you know, reaching back in my memory of just from the video side of things, being a video
Starting point is 00:16:41 producer. I remember, you know, year after year, the big camera manufacturers, the Sonys and the Panasonics of the world, their quality would get better each year using, you know, same bit rate, same codec, but somehow whatever they were doing in there, it would get better and better each year. And I guess that sort of speaks to some of the research that you've done here where they had a lot of flexibility or leeway on what they could do on the encode side. Yeah, correct. On the encode side, there's newer patents, there's faster chips. And so they can utilize different features of the H.264 codec. And also, I should say that
Starting point is 00:17:27 the codec itself has not remained dormant since it first came around in the early 2000s. Every so many years, they keep adding new stuff to it, new updates. So I think something interesting for security researchers is wherever there's new code, there's likely new vulnerabilities. And so I think our tool can help explore those issues as well. So what are the potential issues here as you all dig into it? To what degree is this actually a real world security concern as opposed to, you know, an interesting finding from a research point of view. Where do we stand there? We all believe that this is a very important area to do research in,
Starting point is 00:18:14 especially given that some of these issues in video decoders are being exploited in the wild. So in the paper, we talk about a root cause analysis done by Natalie Silvanovich of Google Project Zero for an Apple H.264 video decoder kernel bug. So I mentioned before that we were doing a lot of work on Android. But then once we learned about this in the wild vulnerability, we were like, okay, let's get some iPhones and begin poking around there. And we were able to find a lot of issues inside of older Apple video decoder SoCs. I think it's really concerning that these kinds of issues are being exploited in the wild, given that there's a possibility for zero-click exploits. So someone just sends you a video, and while a thumbnail is being generated, that goes through the same video decoding pipeline. So the vulnerability can be hit there, and you may not even notice.
Starting point is 00:19:26 Or alternatively, you're just browsing the web and you get this video in an ad. And it's fair to say, I mean, pretty much every bit of computing hardware that you get these days that has a display on it has some capability of decoding H.264 video. Definitely. I think, as you mentioned in the beginning, that this codec has been around for a while. So it's used by a lot of video companies as almost the default codec. It's assumed that every device can decode H.264. So they'll experiment with newer codecs, but they always know that they can fall back to H.264. This is why we went ahead and said the most dangerous codec in the world. Yeah.
Starting point is 00:20:10 Do you suppose that that's a big part of this here? You know, H.264, I suspect the spec was probably, certainly they were thinking about it back in the 90s, I'm guessing. And, you know, they probably weren't thinking about cybersecurity the way that we are today. Everything was hardware-based. It was a lot harder to do back then. Yeah. I think that the codec developers did have a good sense of the kind of issues that can arise. And inside of the spec, they do say that, hey, for each variable, this is the expected range. But the challenge comes from the actual implementation of the spec
Starting point is 00:20:55 in which errors can arise. People may miss a bounds check, and that can lead to the many vulnerabilities that we found. This tool that you all have created here, as you say, H26Forge, is that generally available? Can folks do their own work with it? Yeah, we're working on cleaning up the code. And we plan to release it before August when this work will be presented at USENIX Security. All right. Terrific.
Starting point is 00:21:29 Willy, thanks so much for taking the time for us. Fascinating conversation. Thanks, Dave. Our thanks to Willy R. Vasquez from the University of Texas at Austin for joining us. The research is titled The Most Dangerous Codec in the World: Finding and Exploiting Vulnerabilities in H.264 Decoders. We'll have a link in the show notes.
Starting point is 00:22:39 The Cyber Wire Research Saturday podcast is a production of N2K Networks, proudly produced in Maryland out of the startup studios of DataTribe, where they're co-building the next generation of cybersecurity teams and technologies. This episode was produced by Liz Irvin and senior producer Jennifer Eiben.
Starting point is 00:23:11 Our mixer is Elliott Peltzman. Our executive editor is Peter Kilpe, and I'm Dave Bittner. Thanks for listening.
