Hacked - Hacking Security Camera AI
Episode Date: December 2, 2024
Can you trick the AI model running locally on a security camera into thinking you're a bird (and not a burglar)? We sat down with Kasimir Schulz, principal security researcher at HiddenLayer, to discuss Edge AI, and to learn about how AI running on your device (at the "edge" of the network) can be compromised with something like a QR code.
Transcript
As somebody who has hacked one of these models now, I still think it's great that people are actually employing them.
I am firmly in the belief that, yes, we should still keep using them because the benefit hugely outweighs anything else.
If you picture a computer network and you're looking into the network, you can think of your device as being on the edge of the network.
It calls into and receives information from devices and servers that are deeper inside.
For this reason, some people call computing done locally on that device, edge computing.
This is a whole category of thing.
And if you do artificial intelligence tasks that way, locally on the device, on the edge of the network, instead of calling a server deeper inside, people have started calling that edge AI.
There are a lot of devices that do some version of this.
Maybe the cheapest, most accessible, is something like a Wyze camera.
A Wyze camera is a security camera.
So it relies on AI and machine vision for one thing in particular.
A working modern security camera needs to be programmable to send an alert to the user
if it sees a person skulking around whatever the camera's pointed at.
But it can't send an alert anytime it sees motion,
because birds and cars.
So the model needs to be able to tell the difference
between a person and not a person.
You can do this in one of two ways.
The camera can send the video to a server
where it runs an AI model capable of distinguishing
that's a guy, that's a duck, that's a burglar, that's a goose.
It can run the duck versus guy model on a server
or it can try and do it locally on the device.
And there are some real security and privacy reasons why this is preferable.
It's a video feed of your house.
How much do you want it sent to some server you don't really know about?
But the question we like to ask, can you hack it?
Without touching it.
Because if you get close enough to a security camera to plug something into it to hack it,
it's going to see you.
It's going to know that's not a bird.
Our subject this episode is Kasimir Schulz,
principal security researcher at HiddenLayer, who took a run at this problem.
He was busy all DefCon giving a bunch of interesting talks,
but I wanted to understand what he and his team did to crack the Wyze camera.
Not by walking up to it and plugging something in,
but by figuring out what the AI model running on it
is doing to distinguish a person from not a person.
And reverse engineering something
that you could show that camera that'll make it think,
oh, that's a bird,
when in reality, that's a burglar.
Like a QR code, but instead of bringing up a menu,
it tricks a security camera into thinking you're not a person.
If a person is in the camera with whatever bad thing they have in there, the patch,
the camera does not detect a person, even though a person is there.
And then ideally, we were going to have it set up
in a way so that you aren't just, you know, carrying a bush or holding a tree in front of you,
something that people around you might notice.
Sure.
Right?
We wanted something really subtle.
So somebody could, you know, come up, steal package off your porch and you would never notice.
So I called them up.
This one veers a little technical on occasion.
If that worries you, know that I am not that technical and I found this process fascinating.
This is Hacking Edge AI with our guest, Kasimir Schulz, here on
Hacked.
Kasimir, thank you for joining me.
Thank you.
Thank you for having me.
Okay.
So we're here talking about the Wyze Cam and Edge AI.
Before we get into the story itself, what led you to want to research this?
Yeah.
So we were trying to see if there were any actual devices out there trying to use AI on device.
So rather than calling a cloud server and the model actually being in the cloud, having it
on device because that way we could actually attack it, see what people were actually using,
trying to see if there is a way to really utilize these models in a malicious way or try to
bypass them, especially with the new advancements in hardware. So people are using NPUs,
which allow these small devices to run low-power AI models over a long course of time.
And just so I understand, that's what edge AI is. It's AI being run locally on the device
versus throwing it to the cloud.
Yeah. So edge AI is a term that we were actually kind of originally looking for.
We actually had defined the term edge AI ourselves.
And then while looking around trying to find if there was any AI being run on the edge,
we saw that Wyze Cam was marketing edge AI.
So Wyze Cam had actually marketed and named it as edge AI, which worked out pretty well.
What's the like the etymology there?
Why call it edge AI?
Like are the devices somehow defined as edge devices?
Like what does that mean?
Yeah.
So the way that the Wyze system used to work is that when a Wyze camera
detects any sort of motion,
it triggers an event.
And that event used to send a photo off
to one of the Wyze servers.
So in which case, the camera was the edge device,
and then the Wyze server was the main server
where everything was being sent to.
And then the AI model would run on the server,
see, you know, is there a person in the photo?
Is there a package, a pet?
And then send the detections back to the camera.
So all the processing was done off device.
However, some people had privacy concerns.
They didn't want their photos being sent
to a server. So instead, the AI was actually put onto the device. So that's why they called
it edge AI.
And then for anyone that doesn't know, it's kind of intuitive at this point,
but what is a Wyze Cam? Like, what is this?
Yeah. So they are these little budget cameras.
They are fairly popular. I believe, don't quote me, but I believe that one of the cameras has
I think upwards of 70 million sales on Amazon. And they were
originally meant as indoor cameras. So watching your pet while you were gone, you know,
nanny cam type work. However, since then, some outdoor cameras have been developed. There's cameras
for doorbells and they have developed a line of other products. But the ones that we
were looking at normally ranged in the $30 to $50 range. So it's a camera that's accessible to a lot
of different people.
We talked a little bit about what got you into this, but broadly speaking,
why look at this device? Was there something that you thought you were looking for, something you were
trying to do when you started peeling this thing apart?
Or like, what got you looking into this?
Yeah.
So first off, there was somebody actually marketing edge AI.
So there are a lot of AI models run on lots of devices.
However, most of them aren't going to do this well.
And what's actually kind of a fun turn of events is that I'd actually hacked the Wyze
cameras a few years previously.
So I had experience with them.
So actually a lot of other hackers and reverse engineers out there that I know have
worked with Wyze cameras before because they publish their firmware online.
And because they're on the cheaper side to buy, you can actually have a device.
If you brick one, you can always get another one.
But instead of having to try to extract the firmware yourself, you can just download it,
start doing some reverse engineering.
So just the prior knowledge of the device and having reverse engineered it, plus the marketing
of edge AI, made it a really good use case.
So they were cheap and the firmware was publicly available.
Why do you think they published the firmware?
Why do you think they give you that toehold?
Yeah.
So it actually, I think, worked out fairly well for them just because they've gotten so many
reports over the years.
And the other year, they were in Pwn2Own as well.
But, yeah, so it's just a choice on the company.
I don't think it actually makes, it doesn't make them more insecure.
If anything, there's more researchers looking at them.
Got it.
And for anyone that doesn't know Pwn2Own, what's that?
So Pwn2Own's an event that happens,
I believe, once a year.
And these companies, they go out, they say, hey, we have this device.
We want you to hack them.
Everything from routers to even a Tesla a few years ago was part of Pwn2Own.
And then you get a big prize if you actually are able to exploit the vulnerability,
the day of the competition.
This is a total aside, but I know you talked about this at DefCon, I believe.
I did, yes.
Did you see any of the similar, like, did you see the Tesla that
they had on the floor there and the Rivian? Like did you walk around any of that stuff?
No, no, not too much. I actually had six talks the week of DefCon.
Oh, dang.
I was fairly busy.
Okay, fair enough. Yeah. I was not because I did not. So I spent a lot of time hanging out
with those people. It's very, it's very fascinating. Okay. So cheap, they're plentiful. The firmware is
available online. You had a little bit of history with it. There's almost like a culture around
hacking these things and Wyze seems down
for it. How does the investigation start? What are some of like the early discoveries? What kicks
this all off?
Yeah. So as I had mentioned, they were part of Pwn2Own, which meant that there
were publicly available exploits for older versions of the firmware. And we actually had a few
older devices laying around. So we decided we would try to see if we could pwn one of the,
or get a shell on one of the older devices with that older firmware since we hadn't updated yet.
And once we were actually on the device, our goal was to see if we could find the AI model.
So that's kind of where we started our journey.
So the older device we had wasn't actually supposed to have the AI model, because the newer devices were marketed as having AI; on the older ones, you couldn't even enable it.
But when we actually got on there, we saw there was a folder called Edge AI and there were some binaries in there.
What happened though was that the folder didn't exist inside of the actual firmware.
So when we had initially reverse engineered the firmware, it wasn't there.
So we knew that it had to be downloaded some way.
And this Edge AI folder was actually actively being used by the binaries of the latest cameras.
So the current camera, the older one, would not access the folder.
However, the new cameras all access the folder,
but we couldn't see the folder on the new cameras.
So we decided to poke around and reverse engineer,
and we actually found that there were AI models in that folder.
So the way we reverse engineered them: the AI model,
instead of being by itself, was built into a shared object.
So it's loaded up by an executable and then run,
and after a bit of reverse engineering,
we saw that there were a few layer names in there.
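As a rough illustration of this kind of search, the sketch below pulls printable strings out of a blob of bytes (much like the Unix `strings` tool) and keeps the ones that look like neural network layer names. The keyword list and the sample bytes are assumptions made up for the example; they are not taken from the actual shared object.

```python
import re

# Hypothetical keyword list, chosen only for illustration.
LAYER_HINTS = (b"conv", b"quantize", b"input", b"output")

def printable_strings(blob: bytes, min_len: int = 4):
    """Yield runs of printable ASCII, similar to the Unix `strings` tool."""
    for match in re.finditer(rb"[\x20-\x7e]{%d,}" % min_len, blob):
        yield match.group()

def find_layer_names(blob: bytes):
    """Return printable strings that contain any of the layer hints."""
    return [s.decode("ascii") for s in printable_strings(blob)
            if any(hint in s.lower() for hint in LAYER_HINTS)]

# Stand-in bytes; a real run would read the shared object from disk.
sample = b"\x00\x01Convolution2D\x00\xff\xfeQuantizeInput\x00abc\x00libc.so\x00"
print(find_layer_names(sample))
```

Strings such as `libc.so` are extracted but filtered out because they match none of the hints, which is the same narrowing-down a reverse engineer would do by eye.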
So AI layer names are going to be
things like convolution or quantize, input, output. And reverse
engineering that, we were able to see that we actually were able to get a model out of
there. At that point we wanted to step back a little bit because we were concerned
that maybe we were putting too much time into something that isn't actually on
the newer devices the same way. So we decided to see
how that folder is being downloaded onto the device, and what we did
is we set up tcpdump and then deleted the folder.
And we found that there was a binary on the device called Zinker
that just redownloaded folders.
And we just ran grep for the string Edge AI and
found it that way.
And having tcpdump running, we ran the command.
So it redownloaded the binaries.
And then what we'd also done is we had dumped the client secrets
for the HTTPS traffic, because it was HTTPS, not HTTP.
And the way that you do
that is on some Linux devices and other devices as well, there's an environment
variable you can set. And then every time an HTTPS request happens, it saves the secret for
that. And then you can just upload that into Wireshark, where we were able to see that there
were multiple calls. And in those calls, we could actually see where it was going to get the binary.
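The environment variable is not named in the conversation. A common convention for this is `SSLKEYLOGFILE`, and the sketch below assumes that is the mechanism meant. It shows the same idea from Python's side: a TLS context is told to append its session secrets to a file, which Wireshark can then load to decrypt a matching packet capture. The file path is a placeholder.

```python
import os
import ssl
import tempfile

# Hypothetical path for the key log; any writable location works.
keylog_path = os.path.join(tempfile.mkdtemp(), "tls_keys.log")

# Many TLS clients read this environment variable on their own.
# (Assumption: this is the variable the speaker is referring to.)
os.environ["SSLKEYLOGFILE"] = keylog_path

# Python can also be told explicitly (needs Python 3.8+ and OpenSSL 1.1.1+).
ctx = ssl.create_default_context()
ctx.keylog_filename = keylog_path

# Any connection made with `ctx` now appends its session secrets to the file.
print(ctx.keylog_filename)
```

The capture itself still comes from a separate tool such as tcpdump; the key log only supplies the secrets needed to read it.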
So it was doing a call to see, based on this firmware version, what should I have?
You know, firmware version, camera. Then it would tell you where to download the files,
and then a final call to actually generate a one-time link that expired to actually download
the content. So it wasn't just always hosted, which is pretty interesting. So the firmware version
and the camera ID were something that we actually had for the newer cameras. So instead of trying
to pwn the new cameras at first, we then went ahead, put the new, like, firmware version, new
camera ID in and we were able to download the edge AI directory for the new cameras.
And what was actually really interesting was that it was completely different files.
So with the older version, libjzdl was where the binaries were.
And in the new one, it was libvenus. And we did some online sleuthing.
We found one open source repo that had some documentation for
Ingenic, which is the chipset that these cameras are based on.
So we were able to see that it was a proprietary model format for that chipset
because that chipset had the new NPU to run AI models on it.
So we started reversing that as well.
And at this point, we had two different models.
And we were able to see that they were fairly similar.
And that they were actually based off of YOLO.
And YOLO is an image recognition model.
So the way that it works is that if you pass it an image or a number of images,
so a video, it will draw bounding boxes around all the items that are in there and then classify them.
So you can see if there's a person or two people or a person and a pet, which makes sense.
And so those are the detections actually coming back out.
And then from there, we needed a way to actually run the model.
But the issue was because it was the proprietary format and was built for the specific set of chips,
we actually couldn't run it locally.
So we'd initially tried emulating it in QEMU.
You were, you started to emulate the entire camera so that you could run this AI model that you had,
through that whole process, get that running on this emulation of the camera.
Yeah. Got it. So we found out that it was a MIPS architecture. We got to the point that we were
actually able to emulate the different binaries on the camera. However, we couldn't emulate the
AI model because it needed that very special instruction that was only for
that special CPU, NPU, and wasn't actually in QEMU.
So at this point, we were seeing what to do next.
Yeah.
At that point, we needed to actually run on device.
However, the newest AI model didn't run on the old cameras
because the old cameras didn't have that special chip set.
So what we did is we decided to find a zero day to get onto the new camera.
Jesus, dude.
Yeah, a lot easier to do once you are actually
on a device. So instead of just reverse engineering and statically finding a zero day on the new firmware,
we decided to see if we can find a zero day on the old camera and see if it still exists in the new camera.
Cool. So what we decided to do is we decided to see if there was, instead of trying to find a really complex attack vector
where you're trying to send traffic to some cool port or something, you know, what other people have actually checked into in the past,
we decided to see if there was a simpler vulnerability that might not be as relevant to an attacker,
but is really relevant to us trying to get onto the shell, onto the device.
So part of the camera setup process is you scan a QR code for your Wi-Fi.
So it has your Wi-Fi SSID and your Wi-Fi password.
And what's really cool is that when you have your Wi-Fi SSID,
it adds that string into a command that it runs.
So then it tries to find if that SSID is available,
which also means since they're just adding the string in,
if you have a semicolon or anything else at the end,
you can add whatever other commands you want in there as well.
And when we looked into the firmware for the new camera,
we were able to see that it actually did exist on the new camera as well.
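The pattern described here is classic command injection. The sketch below is only an illustration of the general bug class: the shell command is invented for the example and is not the command the camera runs. It contrasts pasting an SSID straight into a shell string with quoting it first, and it only builds the strings without executing anything.

```python
import shlex

def build_scan_command_unsafe(ssid: str) -> str:
    """Vulnerable pattern: the SSID is pasted straight into a shell string."""
    # Hypothetical command, used only to show the shape of the bug.
    return f"iwlist wlan0 scan | grep {ssid}"

def build_scan_command_quoted(ssid: str) -> str:
    """Safer pattern: quote the SSID so the shell treats it as one word."""
    return f"iwlist wlan0 scan | grep -- {shlex.quote(ssid)}"

# An SSID carrying a semicolon and a second command (placeholder payload).
malicious_ssid = "HomeWiFi; telnetd"

print(build_scan_command_unsafe(malicious_ssid))
print(build_scan_command_quoted(malicious_ssid))
```

In the unsafe string the shell would see two commands separated by the semicolon. In the quoted string the whole SSID, semicolon included, is a single argument to `grep`.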
So at that point, we were finally on the new camera, which was great.
We were able to just see all the detections,
see that all the files that we had pulled from the server,
were the same files on the camera, which was great because it meant we had actually done good reverse
engineering and not lost all that work.
So from there, now we were able to see that detections occurred and that the files existed.
But what we needed to see was we needed some way to find out what actually were the percentages
being returned.
So right now, all we got was if there was a person in the photo that the camera saw, it would
send a message to our phones.
But that's not really useful if you're trying to create an adversarial example like we did.
So that people understand, when you say create an adversarial example, what would that example look like?
What would the negative, what would the bad actor try and do that you were trying to recreate?
Yeah.
So as a bad actor, the adversarial example we were trying to create was that if a person is in the camera with whatever bad thing they have in there, the patch, the camera does not detect a person, even though a person is there.
And then ideally we were going to have it set up in a way so that you aren't just, you know, carrying a bush or holding a tree in front of you, something that people around you might notice.
Sure.
Right.
We wanted something really subtle.
So somebody could, you know, come up, steal package off your porch and you would never notice.
So when you say patch, you're talking about like a small physical thing somewhere on their person that would cause this camera to go, whatever that detection threshold of I think that's a human.
You're not going to trigger that if you're wearing whatever this thing is.
Cool.
Okay, please continue.
Yeah, yeah, yeah.
So at this point, to create an adversarial example,
it's so much easier when you actually know the percentages that are returned.
So you know, especially with YOLO since there's multiple classes,
we can say, hey, this is 90% of person.
And then if we add, you know, a small patch,
all of a sudden it's 80% a person, 10% dog.
And then, you know, from there,
we can slowly try to get the percentages more in our favor.
So luckily, since I'd reverse engineered wise in the past,
I knew that they dumped a lot of information into their logs, sometimes more than was necessary.
And I also knew that the logs were all encrypted with a key that was the same across all devices.
And the reason for that is they didn't want logs to necessarily be opened by a person.
So when there's a crash, the logs get saved to the SD card, and then you send that over to them.
And then it's easiest if they have one key that they can just decrypt the logs with.
So it ended up just being AES-CBC.
We double-checked the encrypted file on the local file system.
It was all the same.
So at that point, we were able to take the encrypted log file,
decrypt it, and see all the logs from all the binaries.
And we just looked for inference or person, you know, other things like that.
And we were actually really happy to see that in the logs,
it was logging all the detection results with the percentages,
which is awesome.
Cool. So now you have a way of measuring whether or not this patch is successful or not.
You can see I'm getting a great average here. Oh, it 100% knows I'm a person. This got it down to 90. This got it down to 80. You have a road to go down, basically.
Got it.
Yeah. So at this point, since we could see that and we could see a few different files in the Edge AI folder, we decided to take a look back at the Edge AI folder and see if there are any files we could mess with. And in there, there were two files:
AIS params.ini and, excuse me, modelparams.ini.
And INI is normally used for configuration.
So we decided to look at those.
And you could see that all the classes
that the AI model detected were in there.
So you had person, pet, package, and face and vehicle.
And then there were thresholds as well.
So we saw that person was set to 50.
And then what we did is we set the person detection
so that it had to be 100% sure.
And we started walking in front of the camera.
And now we saw that the detection event was fired.
It saw a person 95% confident, but we weren't getting an alert on our phones.
Which meant that after it does the detection, it makes sure to see if you are above a certain threshold before sending an alert.
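A minimal sketch of that two-stage behavior, under stated assumptions: the section name, the key names, and every threshold other than person at 50 are invented here, and whether the real comparison is "greater than" or "greater than or equal" is not known from the conversation.

```python
import configparser

# Hypothetical INI layout. Only the class names and the person threshold
# of 50 come from the conversation; everything else is assumed.
INI_TEXT = """
[thresholds]
person = 50
pet = 50
package = 50
face = 50
vehicle = 50
"""

def load_thresholds(text: str) -> dict:
    """Parse per-class confidence thresholds from INI text."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {name: float(value) for name, value in parser["thresholds"].items()}

def should_alert(detections: dict, thresholds: dict, cls: str = "person") -> bool:
    """Alert only when the class confidence meets its configured threshold."""
    return detections.get(cls, 0.0) >= thresholds[cls]

thresholds = load_thresholds(INI_TEXT)
print(should_alert({"person": 95.0}, thresholds))   # stock threshold of 50

thresholds["person"] = 100.0                        # the experiment described
print(should_alert({"person": 95.0}, thresholds))
```

The detection itself still fires in both cases; only the alert decision changes, which matches what showed up in the logs versus on the phone.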
And even though person and face were both classes, if face was detected but person wasn't over the threshold, it would not
send an alert to your phone. So that meant that we now knew that our criteria was to get
person below that 50% threshold. So even though we could change the INI file, that's not something
a regular attacker can do since you have to actually be on the camera, but it let us know that that
is our goal, which helps a lot. Then we reverted that back since now we knew what we needed to do,
and we wanted to find some way to send a photo directly to the AI instead of having to walk
in front of the, you know, camera, because when trying to create an adversarial patch, you're sending
lots of photos.
You're not always doing them in the physical space.
You might, you know, try to put pixels here and there just to kind of get an idea of what
can happen.
And it wouldn't have been the best if, you know, we spent hours just holding up signs in different
ways in front of the camera, even though it would have been funny.
Yeah, pretty funny though.
Yeah, it would have been really, really funny.
We do have some good ones.
I mean, at some point we dressed up like a package, and that ended up actually working.
Yeah.
Wait, like you put like a cardboard
box on, like Metal Gear Solid style? Arms through the, you know, the sides and head?
Yeah.
Cool.
As well, uh, it detected package instead of person, which was really fun. It got a laugh when we
showed it at DefCon. But yeah, so anyways, while we did do that for fun later on, that wasn't
really the best way to go about things. So we needed some way to have ourselves send an image to
the AI instead of the camera sending an image to the AI.
So again, we did some reverse engineering, and we saw that there were two main binaries.
So there was iCamera, which pretty much governs the entire camera.
So that is all the logic, main logic calls other things.
And then there was this edge AI protocol, blah, blah, blah, file, like a really long name in the
edge AI directory, which loaded up the model and did inference.
And they talked to each other over a local socket on the camera.
So what we did then is we created our own socket, patched the really long name binary that
actually runs the AI to go to our new socket instead of going to the originally created
socket.
Even the end.
Yeah.
And then we wrote a Python script that opened a port on the camera.
And we sent a photo to that port.
Our Python script would add it to the socket, which would then trigger the AI to run and
send the result back. In the end, we had to do some patching and we had to hook into shared memory
because the way that iCamera and the AI work is it wrote the image
to shared memory, and then sent an alert over the socket. So we sent the
alert over the socket after writing shared memory. The AI reads shared memory, does all it does,
and then sends the result back over the socket.
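The handoff described here (write the image to shared memory, send a notice over a socket, read the result back from the socket) can be sketched in a few lines. This is a toy model of the pattern only: the function names are hypothetical, the "AI" is a stand-in thread that reports how many bytes it saw, and the 4-byte size header is an assumed message format, not the camera's real protocol.

```python
import socket
import threading
from multiprocessing import shared_memory

def fake_inference_service(conn: socket.socket, shm_name: str) -> None:
    """Stand-in for the AI process: wait for a notice, read the frame, reply."""
    size = int.from_bytes(conn.recv(4), "big")   # notice carries the frame size
    shm = shared_memory.SharedMemory(name=shm_name)
    frame = bytes(shm.buf[:size])                # copy the frame out
    shm.close()
    # Placeholder "model": report the byte count instead of real detections.
    conn.sendall(f"received {len(frame)} bytes".encode())

def submit_frame(frame: bytes) -> str:
    """Write a frame to shared memory, notify over a socket, return the reply."""
    shm = shared_memory.SharedMemory(create=True, size=len(frame))
    client, server = socket.socketpair()
    worker = threading.Thread(target=fake_inference_service,
                              args=(server, shm.name))
    worker.start()
    try:
        shm.buf[:len(frame)] = frame                    # 1. write the image
        client.sendall(len(frame).to_bytes(4, "big"))   # 2. send the notice
        reply = client.recv(1024).decode()              # 3. read the result
    finally:
        worker.join()
        client.close()
        server.close()
        shm.close()
        shm.unlink()
    return reply

print(submit_frame(b"\xff\xd8 fake jpeg bytes"))
```

Keeping the bulky image in shared memory and sending only a small notice over the socket is what lets a test harness slot in: whoever controls the socket and the shared buffer decides which image the model sees.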
So you have a mechanism by which to see how confident this AI is and what it's looking at,
and you have a mechanism by which to feed an image into that AI,
that isn't just the camera so you don't have to dress up like a package.
Exactly.
Got it.
And what was really great is now that we were hooked directly into the AI, we didn't have to look at the log files.
We were actually getting the response straight back from the AI, which was really nice.
Because to trigger a log file on the camera, you have to get the camera to crash, and we didn't want to crash the camera every time we had an image.
Yeah.
Right.
So now comes the really fun part.
So since we knew that this
was a YOLO-ish model, we had read there were a bunch of academic papers about attacking YOLO
models, just because it's a more common model, people use it. And we'd also read some papers
about attack transferability between models that were pretty much the same. So what we did is,
there's a bunch of tools out there. So we used DPatch and ART to generate a bunch of adversarial
examples, as well as handcrafting a few of our own. So,
for a few of them, we took photos of ourselves holding up a small poster board, and then we put
images on there of the other classes. So, you know, put a car or a dog. So we did a few of those,
and then we did a few of the adversarial example ones. And we saw that about 20% of the adversarial
example ones transferred from the YOLO attack to our camera, which is awesome because we didn't
have to come up with a brand new technique to attack this AI model. So we were
able to take academic attack techniques and apply them to a real production system, and 20%
is a really good rate.
The issue with a lot of those is that they were generated for non-physical settings.
So they work if you're only staying in the virtual.
So if I have a photo of myself, it's not a problem if I draw a smiley face up here and hack
it.
But I can't just walk around with a red smiley face just right there.
Yeah, sure.
So that's why we went more with the holding up a board.
So our idea was if we could override the classes that are there and make another class more
confident, then we would be able to decrease person.
And that actually worked really, really well.
So in the blog that we have released, about a 40-page blog with all the technical details,
we have the photos up there.
So if we're holding a photo of a car, it will detect car, things like that.
So there are a few limitations and I always try to make sure that I always list limitations, especially for things like this.
While we were able to do it, you had to kind of hold it at a specific angle.
So if you're walking, you might mess it up or something.
So we were able to bypass the detections fairly easily.
But it might, you know, it works in that type of setting.
It might not always work for, like, a porch pirate.
So the takeaway is not like, oh, everyone can steal our packages.
The takeaway is like, hey, this can work against the production system.
And somebody might come up with a better patch, like a t-shirt or something that is able to be moved.
But it still opens a really interesting door for a lot of research.
Extremely.
So at the end, the best way you found to compromise this had less to do with like a random cluster of pixels that causes the AI to wig out
and more to do with getting the AI to think that that human being is actually a car or a package
or a dog or any number of these discrete categories that it has been told you don't need
to alert the owner in the event of dog.
You only need to alert the owner in the event of person.
And that is mainly just because we were working in the physical space.
So if, you know, instead of a camera, it was some server or something, sending those other
random pixel ones would have worked great because they were transferring really well.
It's just not something you can have consistently in the physical space.
Hmm. Do we see this type of AI model used in anything at a larger scale than a consumer
$100 camera? Like, is this type of on-device AI being used in any other types of hardware
where your research might be relevant?
Yeah. So, I mean, image classification models are being used everywhere.
So consumer camera, maybe non-consumer cameras. So if you have the security system
for a larger building, we see them being used in industry.
So for example, in the industrial setting,
you'll have these classification models where they'll try to sort out errors in parts.
So then maybe if you're there, you could modify the part a little bit and it just,
you know, doesn't get detected as an error, stuff like that.
So it's really interesting of just what is the potential there,
especially cars, you know, the newer cars also have classification models.
So they are being used quite widely.
Yeah, cars was kind of what was sitting in the back of my head is like,
obviously all modern cars are just network-connected computers.
They're constantly reaching out to a ton of different things.
But a lot of it would have to be local.
Yeah, and you actually have seen things like this in the past.
So a few years ago, Tesla, there was an issue where if you taped over the stop sign,
it would like run through the stop sign, things like that.
Or, you know, change the speed limit number by putting a little bit of tape to make it look weird.
A human can say, oh, no, it's not 75 instead of, you know, 15, but a car might.
Hmm.
Yeah, the thing I found fascinating about this, between when we agreed to have this conversation
and now, having read the whole report that y'all put out, was that potential for, it kind of,
it changed how I understood what these models were actually seeing. As a human being,
You put a single line through a stop sign.
I'm inferring that's a stop sign.
I can still see.
It's the red.
I know what that is.
And it shifted how I understood what these models were actually perceiving when they look at something,
that a guy wearing a package outfit can trigger it, all the way down to holding up a sign with the
right depiction of a dog. It's like, oh, they're not really, they don't have an internal model
of the object they're looking at. They're looking for very specific patterns that are quite easy
to disrupt.
Exactly. So what we found with that one specifically, a lot of image classification, as you mentioned, looks for patterns. So we found we had a lot higher
chance of success if, you know, we were disrupting the shoulder outline versus holding it over your chest.
So it seemed that that was one of the patterns they were looking for for person detection.
Yeah, you just cracked it. You're like, oh, it's the shoulders. It was all on the shoulders all along.
That's what they're looking for. Interesting. Did you notice any, did you get any other weird
little insights into what's going on inside of this, like, relatively commonly used model,
or a variation on it, it sounds like? But what else did you learn about how this thing thinks?
I mean, so like with pets, like the pointy ears versus, you know, other things like that.
So, I mean, it's just the shapes. A person is going to use a bit more logic rather than just detect a specific shape.
Hmm.
So it is trying, it is kind of reproducing the way a human being infers from a limited amount of information to a point.
It's sort of reproducing that, but it's still early days.
Huh.
So privacy is an obvious benefit of this.
It's not constantly calling to the cloud.
It can run local.
It's probably a little more efficient.
I can imagine some of the benefits of having these models running locally on these devices.
Yeah.
So, I mean, as somebody who has hacked one of these models now, I still think it's great that people are actually employing them.
Cool.
I am firmly in the belief that, yes, we should still keep using them because the benefit hugely outweighs anything else.
And we're still early days.
So the fact that something was hacked, that's not a bad thing.
That just means that people are out there doing the research
and then people are out there securing the models as well, which is great.
But instead of having your photos sent off to a server,
especially if you're using the camera inside your own house, it's a huge benefit.
It's just companies, when starting to implement these edge AIs,
they should just think about what is the worst thing that can happen when the AI system fails.
Right? So in this case, you know, you might not get a detection, but if you're still saving off all the
video, which most people aren't just because that's a huge amount of storage, you know, you
still have something there or you still have an alarm system. You know, it's still there as well.
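The design question raised here, what happens when the classifier gets it wrong, can be sketched in Python as follows. This is a hedged illustration only: `handle_motion_event`, the action names, and the threshold are hypothetical and do not describe how any particular camera works.

```python
def handle_motion_event(detections, alert_labels=frozenset({"person"}), threshold=0.5):
    """Decide what to do with a motion event (hypothetical design sketch).

    The clip is saved no matter what the classifier says, so a fooled model
    costs a missed notification rather than the loss of the footage itself.
    """
    actions = ["save_clip"]  # assumption: storage is available for every motion event
    if any(label in alert_labels and score >= threshold for label, score in detections):
        actions.append("notify_owner")
    return actions


print(handle_motion_event([("person", 0.9)]))  # footage kept and owner notified
print(handle_motion_event([("dog", 0.8)]))     # footage kept, no notification
```

The trade-off is the one mentioned in the conversation: saving everything costs storage, which is why many setups do not do it, but it bounds the damage when the model is tricked.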
So it's that kind of trade-off there, but it's still definitely worth having those new systems
in place.
You still prefer it. You like the idea of trying to handle as much of this as possible
locally on the device. Interesting. I think it's intuitive, and certainly you're starting to see that
a little bit more in the way that certain AI functionality in the smartphone is being
marketed: that we're going to handle as much of this locally as humanly possible. I was fascinated by that
transition. It felt like, AI, it's everywhere now, and then about nine minutes later it was like, we're going to
do it locally on the device, don't worry. Because the privacy implications of some of this stuff, for as remarkable as
it is, are horrifying, like that this piece of information that I've just fed into it is like,
oh, we're going to throw that off to a server somewhere and you're going to have no clue
where it's going.
Exactly.
You gave a talk about this at DEF CON.
How did you find people responded to it?
Like, is there a lot of excitement about this right now?
Yeah.
So this was actually probably one of my most well-received talks.
Cool.
So as I mentioned, I had six talks that week.
And for this one, people, so many people actually came to the talk that they ran out of seating space and people were standing around the seats to watch, which that's always a really good feeling.
And then, you know, we were able to put a lot of jokes in of, you know, package man and things like that.
But for the rest of the conference, I had people coming up to me, talking to me about, you know, what we had been able to pull off.
People were asking when we were releasing the blog post, when they could read the blog post.
So there was a lot of buzz around it, which was really, really great.
What surprised you most? Like outside of really specific technical details, what about this whole
process? And I guess this emerging world of embedded AI, like what shocked you?
I was surprised by how, so I come from a vulnerability research background.
So my job in the past has been to find zero days. So go out, find the CVEs before they even
become CVEs, and then report them, they become CVEs and they get patched.
And so a lot of that skill set is reverse engineering, you know, digging down
deep into decompilers and disassemblers and doing all that fun stuff. And a lot of people,
especially my peers, they don't think those skill sets really transfer to the AI security side
of things. Just because, you know, so many people hear AI and they, you know, think something
that they don't really understand. So one of the things that we were actually trying to show with
our talk, which I think we were able to because people told us that that's what they understood from
it was a lot of the old skill set still applies.
It's because a lot of the new AI security that's happening out there is a little bit too focused on just the AI model or an LLM.
Meanwhile, there's all this supporting infrastructure around AI that might not always be considered.
So that was a surprise, happy surprise. I was hoping for it.
But the fact that that skill set was able to transfer over so well, which is actually how we ended up taking the chip off of the device.
Because that actually happened after we were able to get the model backdoored, or not backdoored, but triggered.
I don't think we are served by this stuff seeming completely inaccessible to people with even a high level of tech literacy.
The idea that they're like, oh, I'm simply a vulnerability researcher.
I could never hope to interface with, like, engage with this stuff.
It's not good.
I think it's useful when people can see that the existing, this is just built on more of the same technology.
The knowledge that you have is still relevant towards this.
Exactly. And that's just something we've been trying to say because the more people that are trying to hack AI, the better. I mean, hack AI for good, of course. But the more people who are out there doing that security research, the better it's going to be for everyone.
Especially for stuff like this where it's consumer facing. Like, this is a camera in your home that you are relying on for potentially, like, personal safety issues. The idea that this feels like this nebulous black box that no one could ever hope to understand, it's like, that's not
good. We need to be figuring out, we need this ecosystem of hackers to be tearing these things
apart, figuring out how they're vulnerable. Very cool. Kasimir, I appreciate you taking the time to
sit down and chat with me about this. It was a lot of fun.
Yeah, thanks for having me on here.
It was really great to be here talking about this.
