PurePerformance - What is Privacy Engineering and Why It's Not as Complicated as It Sounds, with Cat Easdon
Episode Date: July 28, 2025

"Privacy engineering is the art of translating privacy laws and policies into code, figuring out how to make legal requirements such as 'an individual must be able to request deletion of all their personal data' a technical reality," was the elegant explanation from Cat Easdon when asked about what she does in her day job. If you want to learn more, tune in to this episode. Cat, Privacy Engineer at Dynatrace, shares her learnings about things such as: when the right time is to form your own privacy engineering team, why privacy means different things to different people and regulators, and what privacy considerations we specifically have in the observability industry so that our users trust our services!

Links:
Cat's LinkedIn Profile: https://www.linkedin.com/in/easdon/
Publications from Cat: https://www.dynatrace.com/engineering/persons/catherine-easdon/
Blog on Managing Sensitive Data at Scale: https://www.dynatrace.com/news/blog/manage-sensitive-data-and-privacy-requirements-at-scale/
Semgrep for lightweight code scanning: https://github.com/semgrep/semgrep
The IAPP: https://iapp.org/
'Meeting your users' expectations' is formally described by the theory of contextual integrity: https://www.open.edu/openlearncreate/mod/page/view.php?id=214540
Facebook's $5 billion fine from the FTC: http://ftc.gov/news-events/news/press-releases/2019/07/ftc-imposes-5-billion-penalty-sweeping-new-privacy-restrictions-facebook

Fact-check: "The $5 billion penalty against Facebook is the largest ever imposed on any company for violating consumers' privacy and almost 20 times greater than the largest privacy or data security penalty ever imposed worldwide. It is one of the largest penalties ever assessed by the U.S. government for any violation." I think that's still true; the largest fine under the GDPR was €1.2 billion (again for Facebook/Meta).
Transcript
It's time for Pure Performance.
Get your stopwatches ready.
It's time for Pure Performance with Andy Grabner and Brian Wilson.
Hello, everybody, and welcome to another episode of Pure Performance.
My name is Brian Wilson.
As always, I have with me, my co-host, Andy Grabner.
How are you doing, Andy?
Sassy Andy today.
Sassy, why sassy?
You're pretty sassy today.
Okay.
Are you familiar with that term at all or no?
I am, yeah.
Okay, yeah.
You came on to the call with jokes and funny stuff.
And, you know, our listeners are going to expect the absolute best episode ever from us today because...
They will.
They will.
Why will they get it?
Well, the thing is, first of the...
First of all, I want to remind everybody what you will get at the end of the hour.
Hopefully, tickets to Paul McCartney.
Hopefully.
Hopefully.
Yeah.
And I think you earn it because I just came back from a weekend and I saw two concerts.
I saw Robbie Williams and I saw Sukoro.
I don't know the second one.
I know Robbie Williams — they made a biopic about him, and he was an ape in it.
Yeah, it was an amazing movie.
I've heard.
I've heard it's really good.
But I don't know much about him.
But, yeah.
In the end, in his movie, there were no secrets; he wasn't shying away from anything private.
He shared a lot of stuff from his personal life, which I think is a perfect segue now to the topic.
It is about time to introduce our guest today.
And I'm very happy that we have one of our colleagues on the call today.
Cat Easdon, thank you so much for being here.
I think we met, well, we met probably multiple times.
But I remember when you were giving an internal talk in one of our engineering labs around privacy engineering.
And then I approached you afterwards and said, hey, you have a lot of stuff, a lot of knowledge on a topic that I am not at all familiar with.
But that I think everybody should have a basic understanding.
And so that's why we invited Cat to Pure Performance.
And now I think I want to pass it over to you.
I want to quickly get an introduction, who you are, what you do.
where you are, because we see from the Zoom background,
you have a beautiful landscape from Innsbruck.
I think that's what you said, but maybe just fill us in.
Sure. I'm very glad we had that conversation, Andy.
I love making connections like that. Thanks for having me on the podcast.
So, yeah, I'm Cat. I currently lead Dynatrace's privacy engineering team.
My background before that was in hardware security research,
and I still do a little bit of research on the side.
And I'm based in Innsbruck,
but as you can probably tell from my accent,
and the fact I say 'privacy' the British way rather than the American way,
which is a running gag
and confuses quite a lot of people,
I'm originally from the UK.
Cool.
And what brought you to Innsbruck?
A mix of things.
I absolutely love spending time in the mountains.
And also personal reasons my partner is based in Munich.
So when I came to Austria, originally I was in Graz.
I think that is actually where we first met Andy
years ago when you did a visit to our Graz lab.
But then the privacy engineering team didn't exist.
I was working in licensing.
And then over time, reassessed where I wanted to be based.
And Innsbruck was a natural fit.
It's full of crazy people like me.
We're just completely obsessed with the mountains.
The best kind of crazy.
That's a good advertisement, too.
A, for the region of Tyrol.
So, folks, if you are...
If you're crazy about this.
if you're crazy about mountains
I mean I know
obviously Denver Colorado
would be a good place to Brian
but Innsbruck, Tyrol,
is also a great spot
Well there are no mountains in Denver
Yeah but close by
I mean
I was just busting your chops
I think you've got better,
better powder than us, Brian, for skiing,
so
Denver does well on that one
We do a lot of fake,
fake snow though,
and it isn't all from our own water.
Yeah
I think
Fake snow, artificial snow, is something we unfortunately rely on more and more as well because of the changing weather.
But let's go back to the topic.
You gave me a great list of topics we could talk about and we'll see how much we can actually fit into this episode.
And we may ask you back.
But two things stood out.
And I want to start with privacy engineering.
What is it?
What is privacy engineering?
What can we, as people who are not in that field,
understand about privacy engineering? What do we need to know?
Excellent question. How much time do we have? Could we go on for two or three hours?
This is a fun question because it's still a struggle to answer, even for those of us, based in the field.
I'd say the answer is it's a very broad term, but one of my favorite ways to describe it is it's the art of trying to translate
laws and policies to protect privacy into code. And you can do that at lots of different levels.
So some privacy engineers don't write any code, but they work on the intermediate stages,
so perhaps translating legal requirements into policy requirements that we can apply across the whole
organization, figuring out processes to implement that across the software development
lifecycle. Or you can be, as I am and as Dynatrace's privacy engineering team is, really
focused at the code level and building privacy features, but also looking at how we can integrate
privacy checks into the software development lifecycle, checks on pull requests, for example.
You mentioned earlier, and I'm not sure, you know, if you can answer this question, but I'll give it a try.
Nevertheless, you said when you started, when we met, you were in Graz, you were in a different team.
There was no privacy team yet.
See, 'privacy', the British way.
I'll try to adjust my pronunciation.
I also learned British English, but I think I spent too much time in the States.
So my question is, back when you were in Graz, you said there was no privacy team.
why the move by Dynatrace
to invest in such a team,
to build such a team?
Is this something that other organizations
also do at a certain size?
Have things changed over the years?
What are some good indicators
of when an organization,
a software organization or any type of organization,
needs a dedicated team?
Another great question,
and I would say the key
indicator is when the people tasked with privacy are really struggling to implement what they need
to implement across the organization. And that will vary depending on the organization, what their
product or service is. It could happen at very different sizes. So if this is a really sensitive use
case, that could happen just when you're a startup of 10 people; you might already want a privacy
engineer. We saw that, for example, with the AI companies. OpenAI and Anthropic invested in
privacy quite early on because, as we know, people will send absolutely anything to an AI model
really sending sensitive details of their lives, and so privacy is essential. Whereas if you have a
product where it's very unlikely, somebody is going to send you any sensitive data, then it might
be that only once you get to a very large scale, you start thinking about having a technical
component to privacy, because you might be able to handle all the requirements you need to meet
and protect your users and maintain your customer's trust and how you're protecting their data
with less technical measures, organizational measures, essentially policies
that are applied across the organisation.
So it really depends on what your development life cycle looks like,
what the organisational context is.
I wish I could give you a fixed size, of once an organisation crosses 200 people,
that's when you hire a privacy engineer. That would also help create some jobs for us,
because it's a really niche field.
So anyone listening, please establish a privacy engineering team at any size.
That is interesting.
The reason why I ask is, I've been spending a lot of my last couple of months, maybe a year or two, on platform engineering, right?
And the question always comes up, what is a good time to invest in a platform engineering team?
What is a good time to think about building an internal product for self-service?
And there are, as you said, there's no fixed number.
But it's actually similar to what you said earlier: when there's too much effort,
or when there are too many people within an organization
struggling with the same things
or implementing their own self-service portals
their own tools to make the day-to-day life easier,
this is then a good point when you say,
well, maybe we're investing in a central platform engineering team
and that's why I was asking the question
if there's any best practices, any surveys,
any reports out there that people can look into
and get some ideas.
I really like that comparison, because something I want to emphasize is it's not just privacy engineers who do privacy and actually build technical solutions for privacy.
So it could be that you have engineers across your organization who are working on how to detect personal data, for example, to mask it.
And if you find you have three separate teams who are reinventing the wheel each time, then that could be the time when you want a central privacy engineering team who could maintain that offer it as a shared library that's offered to everyone.
Yeah. And I think that's also another thank you for that. I will take this into my repertoire when I talk about platform engineering, right? Because in the end, it's once we start duplicating efforts, this basically means you are becoming very inefficient as an organization because you don't hire these people to reinvent the wheel. Often, though, it's not visible, right? Often we lack the visibility that other teams are solving the same problem.
And I think it also needs an organization's maturity to say, hey, we need to look into everything we do.
We need to understand where do we have duplicated efforts to solve essentially the same problem
and how can we make this easier and provide it as a, as you say, the shared library.
In platform engineering, we talk a lot about an IDP, an internal developer portal, where people can go and then consume things as a self-service.
Got another question for you.
What,
what is,
what is private data?
Is there a global
definition
of what data
needs to be protected,
or are there many different
regulations around the globe?
We especially,
you know, we are working for a globally
operating organization,
do we have to then
adhere to different
regional laws or different regional definitions of privacy?
Yes, so Dynatrace definitely does, and so do other organizations operating across multiple countries.
And this is where it gets really challenging because there are very different definitions.
So it's something like an email address.
Most laws agree that that is personal data.
So we have some data types where you can reliably say pretty much wherever we are in the world,
this is personal data and we need to protect it.
But then you have sector-specific
requirements in healthcare, for example; that's the HIPAA Privacy Rule in the US to look into.
And then specific identifiers that may not seem sensitive to us, but in a different context, are
sensitive. So, for example, in some regions of the world, your tribal affiliation is considered
highly sensitive data about you. That's not something that springs to mind for me in my cultural
context. That's where it's really tricky because it's not just about your legal knowledge,
but also some element of cultural understanding and trying to put yourself in the shoes of
somebody who has a very different life to your own. So something I find easier to focus on
rather than the specific legal requirements. And this is something I use when I'm trying to
train up engineers, rather than teaching them, this is what's required in the EU, this is what's
required in India. That's too much detail. It becomes overwhelming to somebody who isn't a
specialist in this area. So I instead focus on meeting people's expectations. So I ask them to think
about the context of this data transfer and say, would this meet your expectations in this
scenario, would you be surprised that this data is being collected about you and it's being
used for this purpose? An example may be if you're at the doctor's, you're very happy to share
sensitive health information because you want the doctor to diagnose you and help you get better.
but if you're asked the very same questions in a job interview, that would be a massive red flag,
and you're not going to work for that company because something strange is happening at that company.
So the context really matters. The setting that you're in really matters.
So I find that a more helpful framing than which data types do we need to care about.
Of course, we do need to think about data types at times.
But when you're thinking about a specific scenario, will this violate some privacy law or not?
instead of each engineer
trying to think through legal requirements
just think about
would this meet my expectations
would I be surprised
the challenge there
coming back to something I mentioned before
regarding putting yourself
in the shoes of somebody
whose life is very different to your own,
that's where this comparison can fall down
if you don't have a very diverse team
and somebody thinks well I would be very happy
to share this there's no danger to me
but actually somebody in a minority group
it might be really dangerous to share that information
so that's the limitation with this
but at least it saves people having to learn a lot of legal detail.
I think this is also a call out that diversity is a very good thing
because you have a lot of different perspectives and opinions and backgrounds
that together can then really form a strong sense and a strong foundation for good
privacy engineering, and not just privacy engineering; in general,
obviously, having many different people with different backgrounds and different thoughts
plays out and pays out pretty well.
I was just going to say, it was an interesting idea with that question of, you know, 'would I find this an awkward conversation,' right, and having the diverse, broad background of thoughts on it. Because I think this comes up in, not to change the subject in any way, shape, or form, but when you think about social media or all these different apps that are collecting our data in exchange for free usage, and so many people are like, yeah, well, I don't care, right?
but there are a lot of people who do care, right?
But that obviously gets put to the wayside for the free product.
But when it comes to a corporation and having this data sharing,
it's a lot different than just what's in the public view.
But the idea of some people would be okay with it, right?
The reason you said you need a diverse background of people
is to have that setup, and it's something you see all around.
Like, what do I have to hide?
but there is a principle behind it
which is a lot harder to pin down
with just one or two people
I don't know if that made sense
but it brings me to the next question
and kind of like
I guess we now we are in 2025
and I guess we ended up here
with a lot of regulation because of
you know maybe historical events
where some people were very relaxed
on these things and were just doing things, collecting data and not protecting it in the
right way, when they should have.
Can you give us a little bit of a history overview of how we actually ended
up here where we are, and the things that we should know about why
privacy and privacy engineering are what they are today?
So I'll assume your listeners are relatively familiar with what's happening
currently and how data has been monetized over the last few decades,
and that's brought us some amazing advances.
So the cool things we can do with LLMs now
would never have been possible
without those LLMs ingesting enormous amounts of data.
Similarly, facial recognition and computer vision algorithms
were trained on huge quantities of images
that were collected usually without consent
because there was no legal requirement to get consent at the time.
So this has enabled lots of cool things
that we now enjoy, sometimes depend on,
in the systems that we're building or using.
But this wild west we had on the internet also led to lots of exploitation of personal data.
And I think the most shocking examples were where we saw elections being interfered with, you know, micro-profiling of people based on what seemed like non-sensitive data about them.
You know, Facebook likes: I like dogs, I listen to Britney Spears, all these innocuous-sounding things. How could you possibly influence a person's voting decisions with this information?
Actually, if you have enough of those data points, you can profile them into quite a fine category
and guess which interests, which voting concerns, are of the most interest to them,
then target them with advertising, which is far more effective than broad spectrum political advertising as we used to have.
So this came up, for example, with Facebook,
in the case where, and it wasn't solely about election interference,
it was more the fact that they gave third parties extensive access to Facebook users' data,
which they could use for a variety of purposes, including this micro-profiling and targeting.
They were hit with, I believe it's still the largest fine ever, I may be wrong there,
$5 billion from the Federal Trade Commission in the US.
So that, I'd say, was one of the
turning points. Regulators started to notice we really need to fight back against this.
This is having very severe societal impacts. It's not just the individual level impact,
which is of course important, but societal level impacts. But how do you fight back against
it when you have these companies that have based their business model on this exploitation of
data? You could try to regulate them out of existence, but their annual revenue makes them the size
of large countries in some cases. So you end up with these tech companies that are really
political actors on the world stage
and still nobody knows
what to do with them. Nobody knows how to
negotiate with these new political actors
who have the same power as states.
Canada tries to regulate
Facebook and Facebook just says no
as if they were negotiating country to country.
I don't want to demonize
just Facebook here. This is an issue across
lots of different companies
but they came to mind because of that
particularly high fine from the FTC.
So in that context
and coming back to the EU
context that I'm most familiar with, we have the GDPR emerging. And I think it's really important
to view this as not just privacy law; there's a broader topic of digital regulation,
trying to work out how to negotiate with these companies as international actors. It's a grand
political project. It's very opinionated. And that's important to understand because we see
a lot of pushback against it, even internally within the European Union now, as pushback against
this wave of digital regulation, not just the GDPR, but also other accompanying laws.
And you end up with privacy engineers and privacy professionals in their organisations
as the people who are supposed to act out this political ideology.
It's a political statement, and you're an employee of a company, and you want the company to
succeed, but you're also supposed to be acting out this ideology.
And that's something that comes up again and again in my discussions with privacy professionals,
so lawyers, privacy engineers across the spectrum of privacy roles.
When I talk to others in the industry,
they face such resistance within their organisations,
and some of it is specific to that organisation,
but a large part of it is we need to change this organisation's business model
to implement this. How are we going to do that?
So it feels a bit sort of David and Goliath sometimes in some organisations.
In others, the business model is already accommodating to this,
and it's a simple, I say simple,
with air quotes because it's not that simple,
but as simple as trying to find the technical ways
to implement these additional protection measures
that are mandated.
And in other organizations,
you find yourself up against the CEO,
basically telling them they need to change their business model.
And maybe to bring an analogy, I guess,
and correct me if I got this then right,
if you think about energy, right?
If you think about all the big oil companies
that obviously still want to extract and
exploit all the oil that still comes out of the wells.
But if we as a society think that we want to move to other forms of energy,
wind, solar, and these oil companies have not,
they would basically need to change.
They have a vested interest in still digging out oil as long as possible,
because they've made an investment and they,
rightfully so probably, want to get their investment back,
they want to get money back out of the investment.
But on the other side, if we are then dictating that we need to go into a different direction,
then I can see the struggle.
You want to be obviously playing green,
but if green is currently not what your company strategy was,
then you need to change your company strategy, your business model or whatever it is.
So, at least, that's kind of the analogy that I came up with.
It's a struggle for some.
But it's the right thing to do.
And I guess the question is then who is on the longer lever, right?
And who can move faster?
Really interesting.
One of the things I wanted to ask, because you were really advocating for, you know,
people to please become privacy engineers, and please, if you're an organization
and you don't have a privacy engineering team yet, start one.
If people are looking for that type of job, of role, what are some of the necessary skills?
Any tips?
Because obviously, you also changed from what you did earlier then to the privacy.
Any tips that you have?
Any what makes a good privacy engineer?
Great question.
Yeah.
So a bit more about my background.
I had a little exposure to privacy before I did hardware security research.
I interned at Palantir
on their Privacy and Civil Liberties team
and found that absolutely fascinating
learned so much from them
and was very tempted to stay
but then the allure of research
tempted me over to Graz
I had already committed to starting a PhD there
and although it seems like a huge topic switch
I think the common theme
is being interested in how data's leaked
how systems are exploited
and so I think this offensive mindset
is something that's very easily transferable
from security to privacy
and it's something we've been trying to do at Dynatrace.
We work really closely together with our product security teams,
and we've been trying to cross-train them in privacy.
It's a slightly different perspective,
but actually you're most of the way there,
if you're already familiar with threat modeling systems
and thinking about how could we break this,
how could somebody attack this?
It just tends to be different kinds of data
and different threat actor scenarios that you're thinking about.
But that core practice of threat modeling is something that you already know.
So huge shout out to everyone working in security.
Feel free to stay working in security because we need you.
But if you want to cross over to privacy, you're also very welcome and it won't be too hard to transition.
So threat modeling, of course, some degree of privacy domain knowledge.
It never feels like you know enough because the legal situations around the world are changing all the time.
Right now with GDPR, we could end up with that being slimmed down.
We'll have a GDPR light.
They're looking at ways to deregulate.
So that could drastically change.
in the EU, we don't know. Nobody's quite sure yet. The IAPP, the International Association of Privacy
Professionals, is a great way to try and keep up with some of these changes. You can become a member
and they have lots of online resources, conferences, virtual and in-person meetups, which are a great
way to get to know other privacy professionals and share tips, because there is a lot of ambiguity
in how you can interpret the regulations. So it really helps to ask people from 10 other different
companies, hey, how are you interpreting this? How have you implemented this? Do you feel comfortable
with it? What's your confidence level that this is the correct interpretation and the courts
will agree? So that's really, really helpful. The other aspect, I'd say, is communication skills.
And this is something that continues to surprise me over and over again, just how crucial that is.
So I described privacy engineering at the start as kind of trying to translate law and policy
into code. And in this act of translation, there's so much misunderstanding happening.
I've encountered engineers who thought email addresses can't be personal
data, right? Because, well, we have to process them, and we wouldn't be processing personal data.
So you have to address these kind of misunderstandings, also misunderstandings between
different layers of the organisation. So if, as an engineer deeply focused on the technical
details, you go to the legal team complaining about these technical details, there will be a total
misunderstanding. Even terms like service can cause confusion. So they might be thinking at a higher
level of some kind of service that Dynatrace offers to its customers.
When I say service, I mean a microservice.
I'm literally thinking of a Java microservice,
and I'll start talking to you about which Kubernetes cluster it's running on.
And that translation gap is something that, time and time again, we find we need to address.
Yeah, it's great.
I mean, these are great examples of, I guess, the whole terminology definition problem.
I think, I just remember
one thing, and I've maybe mentioned this in a previous podcast, but DORA is a big topic for me.
But DORA for me maybe means something completely different than for you, Cat, because for me, DORA means
the DevOps metrics, the four key golden signals, the four key golden metrics.
And for you, probably it's the Digital Operational Resilience Act.
And I remember a conversation I had last year at a conference, and I was sitting down at the
the lunch table
and I started
talk I heard somebody
talking about Dora
and then I chimed in
and then we were talking
about 10 minutes about Dora
and everybody was an expert in Dora
and then we realized in the end
after 10 minutes
we were talking about
two completely different things
right and it's
that's insane
because I've been hearing Dora
coming up and I keep on thinking
about the metrics
and but I've been hearing
it come up in like
the security or this other aspect
and I'm like
how is this the same
I didn't realize they were two different things.
I just learned that right now.
That makes total sense now.
It made no sense before that there were these conversations like this around DORA.
Like, what the hell's going on?
Oh, my goodness.
All right.
And that's why, to your example, Cat, right?
A service and a service can be two complete different things.
We are providing a service to our customers.
This could be a consulting service.
This could be whatever.
And for an engineer, as she said, it could be a
microservice
that runs somewhere
yeah
and to our
younger
listeners
Dora
the Explorer.
you know
awesome
you already
made a segue
a little bit
into the
next topic
I want to ask
you and this
is like
how we are
doing things
within our
organization
and obviously
in our
organization
we are
in the
observability space
that means
we do
collect a lot of
data
from our
customers
that consume our services of observability and security
and real user monitoring and everything.
I know a lot of our listeners, right?
They are somehow very, I guess most of you
the listening in are very familiar
with what we do with observability,
with performance engineering,
with platform engineering, DevOps.
These are typically the topics we talk about.
So I would be interested in
what are privacy considerations in observability.
So I would just be interested
in hearing what are the thoughts, what are the things that we as an observability vendor
need to take care of, what are the pressing topics that you and your team are bringing to the
Dynatrace engineering teams? That's a great question. I'd say our work is split into two aspects:
there's the customer-facing side, where we're building product features, and then the
enablement within the software development lifecycle, so consulting on new designs
and threat modeling them, giving tips on how can you protect privacy better at the design stage,
the way through code level checks and testing before it's released.
I think for this discussion, the most interesting part to focus on is the customer-facing features.
So which privacy considerations do our customers have that we then need to offer features for?
And the key word here is sensitive data.
We talked about the different types of personal data.
But when it comes to B2B SaaS, and particularly a product,
like Dynatrace, where you have customers ingesting hundreds of terabytes per day of data
in some cases, it's no longer enough to just focus on a few different identifiers,
because you could have some data that's incredibly sensitive either to an individual person
that's being monitored by the customer through their use of Dynatrace, or incredibly sensitive
to the company. These are corporate secrets that they do not want to share.
And this is a question that I'm sure has come up on the podcast before:
how do you get customers to trust you as a SaaS provider?
How can they have some proof that you can be trusted with their data
because it feels like they're just sending this data into the void
and who knows what you're doing with it?
So I would say our privacy features there
complement all of our other security and compliance features
that are part of this larger program of proving to customers
that we can be trusted.
And with some of the features we have, for example,
you can look at different stages of the data lifecycle
when observability data is being ingested.
So one phase you could look at is the very start
where data is being collected, you might want to mask it at collection.
So if this is very sensitive data, the customer might still not trust us with it.
They don't want it to ever go to a Dynatrace server.
In fact, they might have a legal requirement
that they don't send that data outside of the country or to a third party.
So it needs to be masked there before it ever reaches us.
Then there are further layers of masking, for example, as the data is being processed through OpenPipeline, they can apply various transformations, they can anonymize it, they can drop certain records that shouldn't have been collected.
And then there's also masking at display. So it might be that you genuinely have a good reason to collect this data, but only a small number of people at your company should be able to see it.
Or there's one column of the data that is particularly sensitive. And so you make sure that only a small group of people can see that. It's masking at display.
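To make the 'mask it at collection' idea a bit more concrete, here is a minimal, hypothetical sketch of the kind of masking step a collector could apply before any data leaves the customer's environment. This is an illustration only, not Dynatrace's actual implementation; the field names and patterns are assumptions you would tailor to your own data.

```python
# Hypothetical illustration only: a generic sketch of "masking at collection",
# not Dynatrace's actual implementation. Field names and patterns are made up.
import re
from typing import Any

# Regexes for values we never want to leave the collection point unmasked.
SENSITIVE_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "iban": re.compile(r"\b[A-Z]{2}\d{2}[A-Z0-9]{11,30}\b"),
}

# Fields that are always redacted wholesale, regardless of content.
ALWAYS_MASK_FIELDS = {"user_email", "credit_card_number"}

def mask_record(record: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of the record with sensitive fields and values masked."""
    masked = {}
    for key, value in record.items():
        if key in ALWAYS_MASK_FIELDS:
            masked[key] = "***MASKED***"
            continue
        if isinstance(value, str):
            for name, pattern in SENSITIVE_PATTERNS.items():
                value = pattern.sub(f"***{name.upper()}***", value)
        masked[key] = value
    return masked

if __name__ == "__main__":
    log_event = {
        "message": "login failed for jane.doe@example.com",
        "user_email": "jane.doe@example.com",
        "status": 401,
    }
    print(mask_record(log_event))
    # {'message': 'login failed for ***EMAIL***', 'user_email': '***MASKED***', 'status': 401}
```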
And finally, we have some emerging features where we're looking at how to respect data subject rights
and detect mistakenly ingested sensitive data.
So if somehow sensitive data has made it past all these checks, you have a misconfigured regex,
say, that you're using for masking, and you realize, oh no, we've collected these bank account numbers
that we shouldn't have collected, you need ways to delete the data.
But most of all, you need to know that it's there in the first place.
So that's something that we're working on at the moment, scanning for sensitive data,
can be really challenging at scale.
Hard deletion, of course, is already a feature that's available.
And the individual privacy rights requests, these are less common,
but we're starting to see more awareness of it.
When one of our customers' customers makes a request to them and says,
hey, please delete all of my personal data, they need to delete that from all of their systems
and all of their third-party systems, and Dynatrace is one of them.
And I think there's still limited understanding that observability systems can collect so much personal data.
It really depends on your use case, but there may be a lot of personal data in there,
if you have a legitimate reason to collect that about your monitored users.
And so we also have an app to support that to help with export and deletion requests to find data about a specific person.
Cool.
So just that I understand this use case, because for me, that's interesting.
Let's assume one of our customers is in healthcare.
And they're using Dynatrace to monitor their healthcare system.
And then if one of their customers says, hey, I want you to remove all of my personal data from your records, for whatever reason, they have the legal right to do so.
That obviously means that this healthcare provider needs to make sure that really all of the data related to that individual is either removed or masked or otherwise
treated.
And that means
some of the data
may no longer be
in their possession
because it has been sent
to assess solution
like our solution.
So that's interesting.
And that was there.
Wow.
And then I know
this is an episode
that we wanted to
record at some point
the
what's the privacy
app,
what's it called again
the app that we have
in Dynatrace
to then fulfill
these requests.
That's called
Privacy Rights.
Privacy Rights app.
Yeah, exactly.
That's why you couldn't find it, Andy, because you were saying 'privacy' the American way instead of the British way.
You have to talk with a British accent, you know?
Yes, exactly.
No, it's, you know, for those of you that are listening in and that are actually using our product,
be reminded that we also have a privacy app that you will find either way you say it,
UK or U.S. pronunciation.
Cool.
Really insightful.
For me, and I think this is what I mentioned in the beginning, before we hit the record
button today: Brian and I are always so lucky to have guests like you, because you bring
such a wealth of knowledge on a topic, on an area, that we are typically not exposed to.
Right.
So, and this is why it's fascinating to learn all these things.
Yeah, I feel like every time we uncover one of these new topics, my mind just starts
exploding with the complexity. And it's not like this topic was thoroughly explored
back in the early days of simpler architectures, simpler compute systems, in a simpler
world. It's when these topics are
focused on,
when everything is as complex as this,
and you start thinking,
oh well, we'd have to check this, we'd have to be able to do this,
it just blows
my mind that people can even start at this level
and start getting a handle on it. But I guess
it's really just starting and chipping away
and finding, bit by bit,
where we tackle,
I'll go with privacy,
where we tackle the privacy
aspects of this.
And, you know, you can't boil the ocean and get it all done at once, but just keeping a steady pace and moving forward and moving forward and moving forward as quick as you can because these things obviously have consequences for a company.
So you can't also just be slow about it.
But, yeah, it's just mind-boggling.
Like, you know, Andy, you and I going back to our old, you know, load testing stuff, right?
When at first you would just do, okay, let's do a certain amount of people hitting the site, right?
But then you start thinking of, you start trying to recreate
tests more accurately. And the more accurately you try to recreate the tests, the more you uncover the complexity of that accuracy. And it keeps getting, it's a rabbit hole, basically, you know, is where I'm going. So to that point, with this being our first introduction, or at least my first introduction, to all this privacy stuff, it's quite amazing. And I'm not literally having my jaw on the floor, but figuratively, I've got my jaw on the floor for most of this episode thinking about all this. So.
Very much thank you on that, Cat.
So on the topic of, this is true, obviously, for every SaaS vendor,
but let's stay with the observability space.
In the end, it's about really building trust.
We obviously need to make sure that our consumers, our customers,
or any customers of SaaS services really trust that SaaS service,
that private information that shouldn't be captured
is not captured, and that if it is captured, only the people that are allowed to see it
can see it. Is there anything else, any privacy considerations
in observability, that you wanted to highlight? Any other things that you think
maybe people don't think about when it comes to privacy considerations
in observability, or have we pretty much covered the key topics?
Maybe I could give a few examples just to make this more concrete.
So I covered all the features of how you'd try to stop sensitive data being seen by people who shouldn't have access to it.
But why is there sensitive data in the first place?
What kinds of sensitive data are we collecting in observability?
If we just stick to the classics of logs, metrics, and traces.
I know that observability tooling is offering many more features now, but if we just stick to the classics:
logs, of course, it's really common to see people logging personal data for debugging purposes,
something like user X with email address Y logged in just now.
I've heard lots of engineers referring to those as audit logs.
And then you dig deeper and you ask, hey, do we need this audit log?
Is there actually a requirement to have an audit log?
If it is an audit log, could you please put it in a special audit log place,
not just into the general logs that anyone can see?
So those kind of things are really, really common.
You see that all over the place.
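As a rough illustration of that point, here is a hypothetical sketch of keeping the direct identifier out of general application logs while routing a genuine audit record to a separate, restricted sink. The helper names and logger setup are assumptions, and note that hashing an email is pseudonymization, not true anonymization.

```python
# Hypothetical sketch: keep personal data out of general application logs.
# The helper names and logger setup are assumptions for illustration only.
import hashlib
import logging

app_log = logging.getLogger("app")      # general logs, broadly accessible
audit_log = logging.getLogger("audit")  # restricted sink with its own retention and access rules

def pseudonym(user_email: str) -> str:
    """Stable pseudonym for correlating events without exposing the email itself."""
    return hashlib.sha256(user_email.lower().encode()).hexdigest()[:12]

def on_login(user_email: str) -> None:
    # General log: enough to debug, no direct identifier.
    app_log.info("user %s logged in", pseudonym(user_email))
    # If an audit trail is genuinely required, send the full record to the dedicated audit sink.
    audit_log.info("login", extra={"user_email": user_email})
```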
Then there are less obvious things, like API design, for example.
You can have an API design that isn't privacy-preserving.
So you might include personal data in the URL
or in the request headers or something else
that gets captured in the trace.
And of course, depending on your settings,
you could choose to mask that.
You could choose not to capture it.
But you might want to capture it for most cases,
but you have this one API endpoint
that is leaking loads of personal data
or sensitive financial data,
bank account numbers, for example,
credit card numbers, IBANs.
Those are also really common.
And the real gotcha is even if you don't use observability tooling,
somebody else further along the request chain might be using it.
So that's something to keep in mind,
even if you think it doesn't matter how I design my APIs
because I'm not using observability tooling,
somebody else might be, your content delivery network, for example.
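One mitigation on your own side is to scrub URLs before they are ever written to a log line or span attribute; a minimal, hypothetical sketch (the parameter names and patterns are assumptions) might look like the following. It doesn't help with intermediaries you don't control, which is why the better fix is not putting personal data in URLs in the first place.

```python
# Hypothetical sketch of scrubbing a URL before it is logged or attached to a span.
# Parameter names and patterns are assumptions; tailor them to your own APIs.
import re
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

SENSITIVE_PARAMS = {"email", "token", "ssn", "iban"}
EMAIL_IN_PATH = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def scrub_url(url: str) -> str:
    parts = urlsplit(url)
    # Redact email-like values embedded in the path, e.g. /users/jane.doe@example.com/orders
    path = EMAIL_IN_PATH.sub("REDACTED", parts.path)
    # Redact the values of known sensitive query parameters
    query = [(k, "REDACTED" if k.lower() in SENSITIVE_PARAMS else v)
             for k, v in parse_qsl(parts.query, keep_blank_values=True)]
    return urlunsplit((parts.scheme, parts.netloc, path, urlencode(query), parts.fragment))

print(scrub_url("https://api.example.com/users/jane.doe@example.com/orders?token=abc123"))
# https://api.example.com/users/REDACTED/orders?token=REDACTED
```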
And I'm just making notes because for me,
this just opened up my eyes on API.
First of all, both examples, the audit logs and the API design, were really interesting,
because the first one, really, on the logs, is: if you log it, do you really need this data?
And if you need it, what is the right storage?
What is the right place to store it, right?
Because obviously many, I guess every logging observability solution has a way to put data into certain buckets.
We also call them buckets, and I'm not sure what other tools are calling them.
But we can decide where the data gets stored, and then that bucket
has privileges on who is allowed to access the data.
And we even do it on a record level.
But I think that's an interesting question.
So, A, do you really need this information in the log?
And the way you log it out, it will end up, let's say, in a general store,
but it shouldn't be there.
So you need to think about this.
The API design is also pretty interesting because if you put sensitive data on a URL,
it actually means that from the browser or
the mobile app until that URL reaches your backend, there might be many different hops in
between. There might be caching layers, there might be other web servers, reverse proxies,
and most of them are logging the URLs for various reasons, right? I mean, typically the
URLs always get logged. And then the question is, if there's sensitive information on there
and there's a third party in the middle, this is obviously not what you want. And then also
on your backend system, you want to vet it,
you want to make sure that certain data
doesn't get logged by default, but only
if you really need it and then again, think about
where this data gets stored.
Is this
something in your work
that you also, are you
part of API design reviews?
Are you part of
code reviews to
validate this?
Is this something we can kind of
automate, or are there any best practices
for how we can make sure that already in the design phase,
you are considering all of these privacy rules?
That's a great question.
And the classic challenge is how do you scale that?
So we would like to be there for every code change,
for every new API that's introduced.
But how do you scale that with a relatively small team?
And I would say AI is opening up lots of new opportunities now.
I was at PEPR, the Privacy Engineering Practice and Respect conference,
hosted by the USENIX Association in the US just a few weeks ago.
And AI and how we could build in LLMs was on everyone's lips.
They were saying, oh, I've been trying to automate privacy code reviews and threat modeling.
So everyone is trying it out.
It's still not at a mature enough phase yet.
But it gives me hope for the future that we could really scale this out en masse
so that every pull request is checked, for example.
Until that point, there is a much simpler analog method,
which is privacy coding guidelines,
and also you can work this into your API guidelines.
So then you can make it clear to each developer, hey, it's your responsibility to follow these practices
and try and keep those as simple as possible. No legal wording. Ideally, don't even mention privacy.
Just document a requirement: you must not put user identifiers, email addresses,
and so on; list every single data type, because it is a really common source of confusion if you just say PII or personal data.
Most people don't know what that means, or if they think they know what it means, they don't know all the different
categories it can cover.
It almost seems like, you know, as there's security scanning tools, that it would be
really beneficial for there to be, you know, some privacy scanning tools.
One example I think of a lot is, as we're talking with customers who are sending us logs,
they're looking for like, you know, BAAs and all this other kind of stuff about making
sure that we don't store certain things.
And my thought is always, why don't you just not put that in the log, right?
Why are, I mean, obviously there might be some times, but they're like, no, there's no reason for us to do it.
We just have no idea what's in the logs, right?
And that is the big challenge because they have this legacy of code writing to logs and they have no idea what's in it.
So even starting with, like, some sort of maybe LLM-based, you know, log scanning tool, which can go through and identify personally,
you know, identifiable data, let developers know, start going back and fixing it, also
maybe even doing the code reviews, you know, just like with security, there's all these
tools that'll go in and do these scans.
It's, again, opening this whole world of possibilities and endless things that are needed.
So, yeah, I can imagine at some point in the future, people not only asking us for our security
status codes, you know, and all the different
benchmarks we pass, there'll be ones for
privacy, which, you know, obviously there are already
some of those, but even more of those kind of things
going on. It's just
mind-boggling. It's such a
deep topic suddenly within
50 minutes.
If I can add one
simple example, if as a listener
you're thinking, oh, I wish we had privacy code scanning
but we can't afford to integrate
LLMs into our workflows, I know it's
expensive, we don't really trust them yet.
A very simple option,
which you just reminded me of, Brian, when you mentioned can we leverage security tooling?
Something we tried was using Semgrep, which we were already using for security code scanning,
to do some very, very basic regex checks to look for variable names that look like they might hold personal data.
So look for var emailAddress, var email, var userProfileId, things like that,
tailor them to your context.
And it turns out the variables don't vary that much.
So although there are some solutions out there that are more
complex and try to do data flow modeling and figure out sources and sinks,
they're very heavyweight. They take a long time to run.
Semgrep is super quick. You might already have it in your workflows.
And you can actually get surprisingly far with just a quick regex scan.
And suddenly you know, okay, in this system we knew nothing about, based on the codebase,
it's processing email addresses, phone numbers, first names.
Now we have some information. We can go and talk to the team about why those are being processed.
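For anyone who wants to try that idea before adopting dedicated tooling, here is a rough, hypothetical approximation in plain Python of the quick regex pass described above; the name patterns and file extensions are assumptions to adapt, and the linked Semgrep project is the more robust way to do this in practice.

```python
# Hypothetical approximation of the quick scan described above: a plain regex pass over
# source files looking for variable names that suggest personal data is being processed.
# Semgrep rules would be the more robust option; the patterns here are assumptions.
import re
import sys
from pathlib import Path

SUSPICIOUS_NAMES = re.compile(
    r"\b(email(_?address)?|phone(_?number)?|first_?name|last_?name|"
    r"user_?profile_?id|date_?of_?birth|iban|ssn)\b",
    re.IGNORECASE,
)

def scan(root: str) -> None:
    """Print every line in the codebase whose identifiers hint at personal data."""
    for path in Path(root).rglob("*"):
        if path.suffix not in {".java", ".kt", ".ts", ".py", ".go"}:
            continue
        for lineno, line in enumerate(path.read_text(errors="ignore").splitlines(), start=1):
            if SUSPICIOUS_NAMES.search(line):
                print(f"{path}:{lineno}: {line.strip()}")

if __name__ == "__main__":
    scan(sys.argv[1] if len(sys.argv) > 1 else ".")
```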
And for reference, this is Semgrep, S-E-M-G-R-E-P.
Perfect.
We'll add the link to the description of the podcast.
Because it also reminded me of one thing. I've started to advocate for this,
but I think now, with this privacy discussion,
it has some additional use cases.
When you think about the software delivery lifecycle
and you're deploying a new change in your test environment and you run some tests.
I've been advocating for analyzing your logs, your metrics, your traces on certain patterns.
One example would be do we have debug logs in a higher level environment?
Or, as people are moving from logs to traces, is the same exception now available in a log and in a trace?
Because then it's duplicated data that nobody really needs.
We could also look,
and there's a new company that just launched,
our oldie garden,
that's defining some best practices
on what is good observability, what is
good tracing. So for instance, do you have
over-instrumentation, right? As we're asking
engineers to instrument their code,
are they instrumenting functions that are
getting called a thousand times per
transaction. Brian, this goes back
to our early days with Dynatrace
when we had the shotgun instrumentation
and then sometimes we had PurePaths
that were kind of, you know, timing out
or having like 10,000 nodes
because somebody instrumented the wrong method
that was called 50,000 times.
And so the same could be done now.
As part of your software delivery life cycle,
you can then analyze your logs,
your metrics, your traces based on patterns
and then act as a quality gate in Dynatrace
where you can use the Site Reliability Guardian
for that to execute certain queries
against that observability data
and then say, hey, it seems you have
some personal
PII data in your logs
or you've instrumented this new method
and you're capturing data that you shouldn't capture.
And so
kind of like a quality gating. That's cool.
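As a minimal illustration of that quality-gate idea, here is a hypothetical sketch of a pipeline step that scans the logs collected during a test run and fails the build when it finds PII-like patterns or leftover debug logging. The file layout, patterns, and checks are assumptions; in Dynatrace the same kind of check could be expressed as queries run by the Site Reliability Guardian, as described above.

```python
# Hypothetical sketch of a delivery-pipeline quality gate: after a test run, scan the
# collected log lines for patterns that should block promotion (PII leaks, debug logging).
# File layout, patterns, and thresholds are assumptions for illustration only.
import re
import sys
from pathlib import Path

CHECKS = {
    "email address in logs": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "IBAN in logs": re.compile(r"\b[A-Z]{2}\d{2}[A-Z0-9]{11,30}\b"),
    "debug logging enabled": re.compile(r"\bDEBUG\b"),
}

def gate(log_dir: str) -> int:
    """Return the number of violations found in the test-run logs."""
    violations = 0
    for log_file in Path(log_dir).glob("*.log"):
        for lineno, line in enumerate(log_file.read_text(errors="ignore").splitlines(), start=1):
            for reason, pattern in CHECKS.items():
                if pattern.search(line):
                    violations += 1
                    print(f"FAIL [{reason}] {log_file.name}:{lineno}")
    return violations

if __name__ == "__main__":
    # A non-zero exit code fails the pipeline step, acting as the quality gate.
    sys.exit(1 if gate(sys.argv[1] if len(sys.argv) > 1 else "./test-logs") else 0)
```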
Wow.
Cat,
unfortunately,
as I told you, sometimes
time flies when we
have this discussion and it's unbelievable
that it's almost the top of the hour.
Is there any
anything else
obviously besides we will have you back
and go into more details
but any final thought
for today's episode
that you said I would have wished
I would have said this but there was no time
anything any final thoughts from your side
for our listeners
I would have two final thoughts
one is I fear I've made this sound very complicated
it can be complicated
but I don't want it to seem so complicated
that it intimidates people and they think,
well, we can't be privacy preserving,
that's too hard. I would say always go back to that example I gave of does this meet expectations,
our users' expectations, my own expectations, that is the simplest way of thinking about it, and it's
quite effective. The second thought is if you're finding that approach falls down, because
you're trying to design for users who have very different experiences from your own, and you can't
put yourself in their shoes very easily: persona-based threat modeling is something I've been exploring,
and I'd recommend you check it out.
So leveraging the UX personas
you probably already have for your users.
You know who your target audience is,
you're thinking about them as you design
the user interfaces and the whole overall user experience.
Use those as a reference when you're trying to evaluate this.
Does this meet my expectations or not?
That can help you put yourself in the shoes
of somebody from a minority group,
say somebody who might be persecuted
on the basis of this data being shared,
which you would be totally fine with sharing publicly
having on the front page of the newspaper because it's not a problem for you.
And I'm very excited to hear you want to have me back
because we can dig into that much further in the next episode.
And I took notes.
I took notes to say next episode.
We have a lot of topics to discuss.
Yeah.
Yeah.
Brian, do you want to close it?
Yes, I want to close it.
Because I know you have to get these tickets.
Yeah, I've got a minute and 54 seconds.
But then I go into
a queue, you know how that all works.
Really appreciate it.
And I didn't mean to make it sound complicated as well with my reactions to it.
To me, it's more the complication of once you start thinking of the possibilities and the layers and layers of all this, you realize there's so much to it, right?
I mean, it exists in so many places.
And as an organization, you are exposed to a lot of risk and making sure you have it all locked up.
It's not just like, oh yeah, well, just put a firewall up and we're good, right? It's a very, very big
picture. Right now there are simple tools and there's a simple process, as you're explaining, that
you go through this stuff and you do it very methodically, what the hell's the word
I'm saying there, is that right? I don't even know, I'm not even going to bother trying. But,
you know, it's a fascinating topic. And, you know, if I were not the old man that I am,
years ago
I might have tried
to switch into
to switch into
security
and then I might be
thinking
oh this is a whole
new field here
with the privacy
because this is like
the new frontier
of excitement
you know
Andy and I
were part of the
frontier of performance
if you will
right
where we were trying
to convince people
it's something
important
right we used to have
to battle people
not physically battle
them but you know
really try to convince people
like you know
performance is
an important thing
now it's standard
everywhere
obviously security has real consequences
and people are taking it seriously
and at a certain point
with an unknown number of people in the organization
it's time to start doing the security
the privacy setup
so it's exciting stuff
and I think there's just like so many worlds
of possibilities
and when you think about the idea that
AI is going to be writing people's code soon
there definitely are other areas
people need to think about, start branching out to
so they can continue
to work in this amazing field.
But, so,
go work in privacy.
Absolutely.
World of possibilities.
Thank you both for a great discussion.
Thank you.
Thank you so much for the insights.
See you soon on the next podcast.
Yes.
Have a wonderful day, everyone.
Thanks for listening.
Thank you.
Thank you.