PurePerformance - What is Privacy Engineering and Why It's Not as Complicated as It Sounds with Cat Easdon

Episode Date: July 28, 2025

"Privacy engineering is the art of translating privacy laws and policies into code, figuring out how to make legal requirements such as ‘an individual must be able to request deletion of all their p...ersonal data’ a technical reality.", was the elegant explanation from Cat Easdon when asked about what she is doing in her day job.If you want to learn more then tune in to this episode. Cat, Privacy Engineer at Dynatrace, shares her learnings about things such as: When the right time is to form your own privacy engineering team, why privacy means different things for different people and regulators and what privacy considerations we specifically have in the observability industry so that our users trust our services!Links:Cat's LinkedIn Profile: https://www.linkedin.com/in/easdon/Publications from Cat: https://www.dynatrace.com/engineering/persons/catherine-easdon/Blog on Managing Sensitive Data at Scale: https://www.dynatrace.com/news/blog/manage-sensitive-data-and-privacy-requirements-at-scale/Semgrep for lightweight code scanning: https://github.com/semgrep/semgrepThe IAPP: https://iapp.org/'Meeting your users' expectations' is formally described by the theory of contextual integrity: https://www.open.edu/openlearncreate/mod/page/view.php?id=214540Facebook's $5 billion fine from the FTC: http://ftc.gov/news-events/news/press-releases/2019/07/ftc-imposes-5-billion-penalty-sweeping-new-privacy-restrictions-facebookFact-check: "The $5 billion penalty against Facebook is the largest ever imposed on any company for violating consumers’ privacy and almost 20 times greater than the largest privacy or data security penalty ever imposed worldwide. It is one of the largest penalties ever assessed by the U.S. government for any violation." I think that's still true; the largest fine under the GDPR was €1.2 billion (again for Facebook/Meta)

Transcript
Starting point is 00:00:00 It's time for Pure Performance. Get your stopwatches ready. It's time for Pure Performance with Andy Grabner and Brian Wilson. Hello, everybody, and welcome to another episode of Pure Performance. My name is Brian Wilson. As always, I have with me my co-host, Andy Grabner. How are you doing, Andy? Sassy Andy today.
Starting point is 00:00:35 Sassy, why sassy? You're pretty sassy today. Okay. Are you familiar with that term at all or no? I am, yeah. Okay, yeah. You came on to the call with jokes and funny stuff. And, you know, our listeners are going to expect the absolute best episode ever from us today because...
Starting point is 00:00:54 They will. They will. Why will they get it? Well, the thing is, first of the... First of all, I want to remind everybody what you will get at the end of the hour. Hopefully, tickets to Paul McCartney. Hopefully. Hopefully.
Starting point is 00:01:07 Yeah. And I think you earn it, because I just came back from a weekend and I saw two concerts. I saw Robbie Williams and I saw Sukoro. I don't know the second one. I know that Robbie Williams, they made a biopic about him where he was an ape. Yeah, it was an amazing movie. I've heard. I've heard it's really good.
Starting point is 00:01:28 But I don't know much about him. But, yeah. In the end, in his movie, there were no secrets; there was nothing he was shying away from, no privacy. He shared a lot of stuff from his personal life, which I think is a perfect segue now to the topic. It is about time to introduce our guest today. And I'm very happy that we have one of our colleagues on the call today. Cat Easdon, thank you so much for being here. I think we met, well, we met probably multiple times.
Starting point is 00:01:59 But I remember when you were giving an internal talk in one of our engineering labs around privacy engineering. And then I approached you afterwards and said, hey, you have a lot of stuff, a lot of knowledge, on a topic that I am not at all familiar with, but that I think everybody should have a basic understanding of. And so that's why we invited Cat to Pure Performance. And now I think I want to pass it over to you. I want to quickly get an introduction: who you are, what you do, where you are, because we see from the Zoom background you have a beautiful landscape from Innsbruck.
Starting point is 00:02:35 I think that's what you said, but maybe just fill us in. Sure. I'm very glad we had that conversation, Andy. Here's to connectors! Thanks for having me on the podcast. So, yeah, I'm Cat. I currently lead Dynatrace's privacy engineering team. My background before that was in hardware security research, and I still do a little bit of research on the side. And I'm based in Innsbruck, but as you can probably tell from my accent
Starting point is 00:02:59 and the fact I say privacy (with a short 'i') rather than privacy (with a long 'i'), which is a running gag that confuses quite a lot of people, I'm originally from the UK. Cool. And what brought you to Innsbruck? A mix of things. I absolutely love spending time in the mountains.
Starting point is 00:03:16 And also personal reasons: my partner is based in Munich. So when I came to Austria, originally I was in Graz. I think that is actually where we first met, Andy, years ago, when you did a visit to our Graz lab. But back then the privacy engineering team didn't exist; I was working in licensing. And then over time, I reassessed where I wanted to be based, and Innsbruck was a natural fit.
Starting point is 00:03:41 It's full of crazy people like me. We're just completely obsessed with the mountains. The best kind of crazy. That's a good advertisement, too, for the region of Tyrol. So, folks, if you are... if you're crazy about mountains,
Starting point is 00:03:56 I mean, I know obviously Denver, Colorado would be a good place too, Brian, but Innsbruck, Tyrol is also a great spot. Well, there are no mountains in Denver. Yeah, but close by, I mean.
Starting point is 00:04:08 I was just busting your chops. I think you've got better powder than us, Brian, for skiing, so Denver does well on that one. We do a lot of fake snow though; it's a lot of our water, isn't it?
Starting point is 00:04:21 Yeah, I think fake snow, artificial snow, is something we unfortunately rely on more and more as well, because of the changing weather. But let's go back to the topic. You gave me a great list of topics we could talk about, and we'll see how much we can actually fit into this episode. And we may ask you back. But two things stood out. And I want to start with privacy engineering.
Starting point is 00:04:48 What is it? What is privacy engineering? What can we, as people that are not in that field, understand about privacy engineering? What do we need to know? Excellent question. How much time do we have? Could we go on for two, three hours? This is a fun question, because it's still a struggle to answer, even for those of us based in the field. I'd say the answer is it's a very broad term, but one of my favorite ways to describe it is: it's the art of trying to translate laws and policies to protect privacy into code. And you can do that at lots of different levels.
Starting point is 00:05:24 So some privacy engineers don't write any code, but they work on the intermediate stages. So perhaps translating legal requirements into policy requirements that we can apply across the whole organization, figuring out processes to implement that across the software development lifecycle. Or you can be, as I am and as Dynatrace's privacy engineering team is, really focused at the code level, building privacy features, but also looking at how we can integrate privacy checks into the software development lifecycle, checks on pull requests, for example. You mentioned earlier, and I'm not sure, you know, if you can answer this question, but I'll give it a try. Nevertheless, you said when you started, when we met, you were in Graz, you were in a different team.
Starting point is 00:06:06 There was no privacy team yet. See, privacy. I'll try to adjust my pronunciation. I also learned British English, but I think I spent too much time in the States. So my question is, back when you were in Graz, you said there was no privacy team. Why the move for Dynatrace to invest in such a team, to build such a team?
Starting point is 00:06:32 Is this something that other organizations also do at a certain size? Have things changed over the years? What are some good indicators for when an organization, a software organization or any type of organization, needs a dedicated team? Another great question,
Starting point is 00:06:50 and I would say the key indicator is when the people tasked with privacy are really struggling to implement what they need to implement across the organization. And that will vary depending on the organization, what their product or service is. It could happen at very different sizes. So if there is a really sensitive use case, that could happen just when you're a startup of 10 people; you might already want a privacy engineer. We saw that, for example, with the AI companies: OpenAI, Anthropic invested in privacy quite early on because, as we know, people will send absolutely anything to an AI model, really sensitive details of their lives, and so privacy is essential. Whereas if you have a
Starting point is 00:07:28 product where it's very unlikely somebody is going to send you any sensitive data, then it might be that only once you get to a very large scale do you start thinking about having a technical component to privacy, because you might be able to handle all the requirements you need to meet, and protect your users and maintain your customers' trust in how you're protecting their data, with less technical measures: organizational measures, essentially policies that are applied across the organisation. So it really depends on what your development lifecycle looks like, what the organisational context is.
Starting point is 00:07:58 I wish I could give you a fixed size, a "once an organisation crosses 200 people, that's when you hire a privacy engineer". It would also help create some jobs for us, because it's a really niche field. So anyone listening, please establish a privacy engineering team at any size. It is interesting. The reason why I ask is, I've been spending a lot of my last couple of months, maybe a year or two, on platform engineering, right? And the question always comes up: what is a good time to invest in a platform engineering team? What is a good time to think about building an internal product for self-service?
Starting point is 00:08:38 And there are, as you said, there's no fixed number. But it's actually similar to what you said earlier: when there's too much effort, or when there's too many people within an organization struggling with the same things, or implementing their own self-service portals, their own tools to make the day-to-day life easier, this is then a good point when you say, well, maybe we're investing in a central platform engineering team,
Starting point is 00:09:04 and that's why I was asking the question, if there's any best practices, any surveys, any reports out there that people can look into and get some ideas. I really like that comparison, because something I want to emphasize is it's not just privacy engineers who do privacy and actually build technical solutions for privacy. So it could be that you have engineers across your organization who are working on how to detect personal data, for example, to mask it. And if you find you have three separate teams who are reinventing the wheel each time, then that could be the time when you want a central privacy engineering team who could maintain that and offer it as a shared library to everyone. Yeah. And I think that's also another good one; thank you for that. I will take this into my repertoire when I talk about platform engineering, right? Because in the end, once we start duplicating efforts, this basically means you are becoming very inefficient as an organization, because you don't hire these people to reinvent the wheel. Often, though, it's not visible, right? Often we lack the visibility that other teams are solving the same problem.
Starting point is 00:10:12 And I think it also needs an organization's maturity to say, hey, we need to look into everything we do. We need to understand where we have duplicated efforts to solve essentially the same problem, and how we can make this easier and provide it as, as you say, a shared library. In platform engineering, we talk a lot about an IDP, an internal developer portal, where people can go and then consume things as a self-service. I've got another question for you: what is privacy data?
Starting point is 00:10:45 Privacy data. Is there a global definition of what data needs to be protected, or are there many different regulations around the globe? Especially since, you know, we are working for a globally
Starting point is 00:11:03 operating organization, do we have to then adhere to different regional laws or different regional definitions of privacy? Yes, so Dynatrace definitely does, and so do other organizations operating across multiple countries. And this is where it gets really challenging, because there are very different definitions. So take something like an email address. Most laws agree that that is personal data.
Starting point is 00:11:29 So we have some data types where you can reliably say, pretty much wherever we are in the world, this is personal data and we need to protect it. But then you have sector-specific requirements; in healthcare, for example, that's the HIPAA Privacy Rule in the US to look into. And then specific identifiers that may not seem sensitive to us, but in a different context are sensitive. So, for example, in some regions of the world, your tribal affiliation is considered highly sensitive data about you. That's not something that springs to mind for me in my cultural context. That's where it's really tricky, because it's not just about your legal knowledge,
Starting point is 00:12:06 but also some element of cultural understanding and trying to put yourself in the shoes of somebody who has a very different life to your own. So there's something I find easier to focus on than the specific legal requirements. And this is something I use when I'm trying to train up engineers: rather than teaching them this is what's required in the EU, this is what's required in India (that's too much detail; it becomes overwhelming to somebody who isn't a specialist in this area), I instead focus on meeting people's expectations. So I ask them to think about the context of this data transfer and say: would this meet your expectations? In this scenario, would you be surprised that this data is being collected about you and it's being
Starting point is 00:12:44 used for this purpose? An example may be if you're at the doctors, you're very happy to share sensitive health information because you want the doctor to diagnose you and help you get better. but if you're asked the very same questions in a job interview, that would be a massive red flag, and you're not going to work for that company because something strange is happening at that company. So the context really matters. The setting that you're in really matters. So I find that a more helpful framing than which data types do we need to care about. Of course, we do need to think about data types at times. But when you're thinking about a specific scenario, will this violate some privacy law or not?
Starting point is 00:13:22 Instead of each engineer trying to think through legal requirements, just think about: would this meet my expectations? Would I be surprised? The challenge there, coming back to something I mentioned before regarding putting yourself
Starting point is 00:13:35 in the shoes of somebody whose life is very different to your own: that's where this comparison can fall down. If you don't have a very diverse team, somebody might think, well, I would be very happy to share this, there's no danger to me, when actually, for somebody in a minority group, it might be really dangerous to share that information.
Starting point is 00:13:49 So that's the limitation with this, but at least it saves people having to learn a lot of legal detail. I think this is also a call-out that diversity is a very good thing, because you have a lot of different perspectives and opinions and backgrounds that together can then really form a strong sense and a strong foundation for good privacy engineering. And not just privacy engineering; in general, obviously, having many different people with different backgrounds and different thoughts plays out and pays out pretty well.
Starting point is 00:14:26 I was just going to say, it was an interesting idea with the question of, you know, would I find this an awkward conversation, right, and having the diverse, broad background of thoughts on it. Because I think this comes up, not to change the subject in any way, shape, or form, but when you think about, like, social media or all these different apps that are collecting our data in exchange for free usage, and so many people are like, yeah, well, I don't care, right?
Starting point is 00:15:19 But there are a lot of people who do care, right? But that obviously gets put to the wayside for the free product. But when it comes to a corporation and having this data sharing, it's a lot different than just what's in the public view. But the idea that some people would be okay with it, right? The reason you said you need a diverse background of people is to have that setup, and it's something you see all around. Like, what do I have to hide? But there is a principle behind it, which is a lot harder to pin down with just one or two people. I don't know if that made sense, but it brings me to the next question, kind of, I guess. Now we are in 2025,
Starting point is 00:15:40 and I guess we ended up here with a lot of regulation because of, you know, maybe historical events, where some people were very relaxed on these things and were just doing things and collecting data and not protecting it in the right way. Can you give us a little bit of a history overview of how we actually ended up here where we are, and things that we should know about why
Starting point is 00:16:05 privacy engineering is what it is today? So I'll assume your listeners are relatively familiar with what's happening currently: how data has been monetized over the last few decades. And that's brought us some amazing advances. So the cool things we can do with LLMs now would never have been possible without those LLMs ingesting enormous amounts of data. Similarly, facial recognition and computer vision algorithms
Starting point is 00:16:31 were trained on huge quantities of images that were collected usually without consent, because there was no legal requirement to get consent at the time. So this has enabled lots of cool things that we now enjoy, sometimes depend on, in the systems that we're building or using. But this wild west we had on the internet also led to lots of exploitation of personal data. And I think the most shocking examples were where we saw elections being interfered with, you know, micro-profiling of people based on what seemed like non-sensitive data about them.
Starting point is 00:17:06 You know, Facebook likes: I like dogs, I listen to Britney Spears, all these innocuous-sounding things. How could you possibly influence a person's voting decisions with this information? Actually, if you have enough of those data points, you can profile them into quite a fine category and guess which voting concerns are of the most interest to them, then target them with advertising, which is far more effective than the broad-spectrum political advertising we used to have. So this came up, for example, with Facebook, in the case where, and it wasn't solely about election interference, it was more the fact that they gave third parties extensive access to Facebook users' data, which they could use for a variety of purposes, including this micro-profiling and targeting.
Starting point is 00:18:00 They were hit with, I believe it's still the largest fine ever, I may be wrong there, $5 billion from the Federal Trade Commission in the US. So that, I'd say, was one of the turning points. Regulators started to notice: we really need to fight back against this. This is having very severe societal impacts. It's not just the individual-level impact, which is of course important, but societal-level impacts. But how do you fight back against it when you have these companies that have based their business model on this exploitation of data? You could try to regulate them out of existence, but their annual revenue makes them the size
Starting point is 00:18:38 of large countries in some cases. So you end up with these tech companies that are really political actors on the world stage and still nobody knows what to do with them. Nobody knows how to negotiate with these new political actors who have the same power as states. Canada tries to regulate Facebook and Facebook just says no
Starting point is 00:18:56 as if they were negotiating country to country. I don't want to demonize just Facebook here. This is an issue across lots of different companies but they came to mind because of that particularly high fine from the FTC. So in that context and coming back to the EU
Starting point is 00:19:12 context that I'm most familiar with, we have the GDPR emerging. And I think it's really important to view this as not just privacy law; there's a broader topic of digital regulation, trying to work out how to negotiate with these companies as international actors. It's a grand political project. It's very opinionated. And that's important to understand, because we see a lot of pushback against it, even internally within the European Union now, pushback against this wave of digital regulation, not just the GDPR but also other accompanying laws. And you end up with privacy engineers and privacy professionals in their organisations as the people who are supposed to act out this political ideology.
Starting point is 00:19:56 It's a political statement, and you're an employee of a company, and you want the company to succeed, but you're also supposed to be acting out this ideology. And that's something that comes up again and again in my discussions with privacy professionals, so lawyers, privacy engineers across the spectrum of privacy roles. When I talk to others in the industry, they face such resistance within their organisations, and some of it is specific to that organisation, but a large part of it is we need to change this organisation's business model
Starting point is 00:20:25 to implement this. How are we going to do that? So it feels a bit sort of David and Goliath sometimes in some organisations. In others, the business model is already accommodating to this, and it's a simple, I say simple, with air quotes because it's not that simple, but as simple as trying to find the technical ways to implement these additional protection measures that are mandated.
Starting point is 00:20:50 And in other organizations, you find yourself up against the CEO, basically telling them they need to change their business model. And maybe to bring an analogy, I guess, and correct me if I got this right: if you think about energy, right? If you think about all the big oil companies that obviously still want to extract,
Starting point is 00:21:09 exploit all the oil that still comes out of our wells. But if we as a society think that we want to move to other ways of energy, wind, solar, then these oil companies would basically need to change. Or they have a vested interest in still digging out oil as long as possible, because they've made an investment and they, rightfully so probably, want to get money back out of that investment.
Starting point is 00:21:40 But on the other side, if we are then dictating that we need to go in a different direction, then I can see the struggle. You want to be seen playing green, obviously, but if green is currently not what your company strategy was, then you need to change your company strategy, your business model, or whatever it is. So at least that's the analogy that I came up with. It's a struggle for some. But it's the right thing to do.
Starting point is 00:22:07 And I guess the question is then, who has the longer lever, right? And who can move faster? Really interesting. One of the things, because you were really advocating, you know, please become privacy engineers, and please, if you're an organization and you don't have a privacy engineering team yet, start one: if people are looking for that type of job, of role, what are some of the necessary skills? Any tips?
Starting point is 00:22:46 Because obviously, you also changed from what you did earlier to privacy. Any tips that you have? What makes a good privacy engineer? Great question. Yeah. So, a bit more about my background: I had a little exposure to privacy before I did hardware security research. I interned at Palantir
Starting point is 00:23:04 on their Privacy and Civil Liberties team and found that absolutely fascinating, learned so much from them, and was very tempted to stay, but then the allure of research tempted me over to Graz; I had already committed to starting a PhD there. And although it seems like a huge topic switch,
Starting point is 00:23:20 I think the common theme is being interested in how data is leaked, how systems are exploited. And so I think this offensive mindset is something that's very easily transferable from security to privacy, and it's something we've been trying to do at Dynatrace. We work really closely together with our product security teams,
Starting point is 00:23:38 and we've been trying to cross-train them in privacy. It's a slightly different perspective, but actually you're most of the way there if you're already familiar with threat modeling systems and thinking about how could we break this, how could somebody attack this? It just tends to be different kinds of data and different threat actor scenarios that you're thinking about.
Starting point is 00:23:58 But that core practice of threat modeling is something that you already know. So huge shout out to everyone working in security. Feel free to stay working in security because we need you. But if you want to cross over to privacy, you're also very welcome and it won't be too hard to transition. So threat modeling, of course, some degree of privacy domain knowledge. It never feels like you know enough because the legal situations around the world are changing all the time. Right now with GDPR, we could end up with that being slimmed down. We'll have a GDPR light.
Starting point is 00:24:28 They're looking at ways to deregulate. So that could drastically change. in the EU, we don't know. Nobody's quite sure yet. The IAPP, the International Association of Privacy Professionals, is a great way to try and keep up with some of these changes. You can become a member and they have lots of online resources, conferences, virtual and in-person meetups, which are a great way to get to know other privacy professionals and share tips, because there is a lot of ambiguity in how you can interpret the regulations. So it really helps to ask people from 10 other different companies, hey, how are you interpreting this? How have you implemented this? Do you feel comfortable
Starting point is 00:25:00 with it? What's your confidence level that this is the correct interpretation and the courts will agree? So that's really, really helpful. The other aspect, I'd say, is communication skills. And this is something that continues to surprise me over and over again, just how crucial that is. So I described privacy engineering at the start as kind of trying to translate law and policy into code. And in this act of translation, there's so much misunderstanding happening. I've encountered engineers who thought email addresses can't be personal
Starting point is 00:25:39 data, right? Because we have to process them, and surely we wouldn't be processing personal data. So you have to address these kinds of misunderstandings, also misunderstandings between different layers of the organisation. So if, as an engineer deeply focused on the technical details, you go to the legal team complaining about these technical details, there will be a total misunderstanding. Even terms like 'service' can cause confusion. So they might be thinking at a higher level of some kind of service that Dynatrace offers to its customers. When I say service, I mean a microservice. I'm literally thinking of a Java microservice, and I'll start talking to you about which Kubernetes cluster it's running on. And that translation gap is something that time and time again we find we need to address.
Starting point is 00:26:16 Yeah, it's great. I mean, these examples, I guess, the whole terminology definition thing. I remember one thing, and I've maybe mentioned this in a previous podcast, but DORA is a big topic for me. But DORA, for me, means maybe something completely different than for you, Cat, because for me, DORA is the DevOps metrics, the four key metrics, the four golden signals. And for you, probably, it's the Digital Operational Resilience Act. And I remember a conversation I had last year at a conference, and I was sitting down at the
Starting point is 00:26:57 lunch table, and I heard somebody talking about DORA, and then I chimed in, and then we were talking for about 10 minutes about DORA, and everybody was an expert in DORA,
Starting point is 00:27:06 and then we realized in the end, after 10 minutes, that we were talking about two completely different things, right? And that's insane. Because I've been hearing DORA coming up, and I keep on thinking
Starting point is 00:27:18 about the metrics, but I've been hearing it come up in, like, the security or this other aspect, and I'm like, how is this the same? I didn't realize they were two different things. I just learned that right now.
Starting point is 00:27:31 Now it makes total sense. It made no sense that there were these conversations like this around DORA. Like, what the hell's going on? Oh, my goodness. All right. And that's why, to your example, Cat, right? A service and a service can be two completely different things. We are providing a service to our customers.
Starting point is 00:27:50 This could be a consulting service. This could be whatever. And for an engineer, as she said, it could be a microservice that runs somewhere. Yeah.
Starting point is 00:28:00 And to our younger listeners, Dora's the Explorer, you know. Awesome. You already made a segue a little bit
Starting point is 00:28:10 into the next topic I want to ask you about, and this is how we are doing things within our
Starting point is 00:28:15 organization. And obviously, our organization is in the observability space. That means
Starting point is 00:28:22 we do collect a lot of data from our customers that consume our service: observability and security and real user monitoring, everything. I know a lot of our listeners, right?
Starting point is 00:28:35 They are somehow, I guess most of you listening in are very familiar with what we do with observability, with performance engineering, with platform engineering, DevOps. These are typically the topics we talk about. So I would be interested in what the privacy considerations are in observability.
Starting point is 00:28:55 So I would just be interested in hearing what the things are that we as an observability vendor need to take care of, what the pressing topics are that you and your team are bringing to the engineering teams. That's a great question. I'd say our work is split into two aspects. There's the customer-facing side, where we're building product features, and then the enablement within the software development lifecycle: consulting on new designs and threat modeling them, giving tips on how you can protect privacy better at the design stage, all the way through to code-level checks and testing before it's released.
Starting point is 00:29:35 I think for this discussion, the most interesting part to focus on is the customer-facing features. So which privacy considerations do our customers have that we then need to offer features for? And the key word here is sensitive data. We talked about the different types of personal data. But when it comes to B2B SaaS, and particularly a product like Dynatrace, where you have customers ingesting hundreds of terabytes of data per day in some cases, it's no longer enough to just focus on a few different identifiers, because you could have some data that's incredibly sensitive either to an individual person
Starting point is 00:30:11 that's being monitored by the customer through their use of Dynatrace, or incredibly sensitive to the company. These are corporate secrets that they do not want to share. And this is something that I'm sure has come up on the podcast before, this question: how do you get customers to trust you as a SaaS provider? How can they have some proof that you can be trusted with their data, because it feels like they're just sending this data into the void, and who knows what you're doing with it? So I would say our privacy features there
Starting point is 00:30:41 complement all of our other security and compliance features that are part of this larger program of proving to customers that we can be trusted. And for some of the features we have, for example, you can look at different stages of the data lifecycle as observability data is being ingested. So one phase you could look at is the very start, where data is being collected: you might want to mask it at collection.
Starting point is 00:31:08 So if this is very sensitive data, the customer might still not trust us with it. They don't want it to ever go to a Dynatrace server. In fact, they might have a legal requirement that they don't send that data outside of the country or to a third party. So it needs to be masked there before it ever reaches us. Then there are further layers of masking: for example, as the data is being processed through OpenPipeline, they can apply various transformations, they can anonymize it, they can drop certain records that shouldn't have been collected. And then there's also masking at display. So it might be that you genuinely have a good reason to collect this data, but only a small number of people at your company should be able to see it. Or there's one column of the data that is particularly sensitive, and so you make sure that only a small group of people can see that. That's masking at display.
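To make that masking-at-collection stage a bit more concrete, here is a minimal Python sketch of the idea. It is an illustration under assumptions only: the rules and the function name are invented for this example and are not the actual Dynatrace collection or OpenPipeline interface.

```python
import re

# Illustrative sketch: scrub known-sensitive patterns from a log line
# *before* it leaves the customer's environment, so the raw values never
# reach the vendor's servers. Rules and names are assumptions, not an API.
MASKING_RULES = [
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"), "<masked-email>"),          # email addresses
    (re.compile(r"\b[A-Z]{2}\d{2}[A-Z0-9]{11,30}\b"), "<masked-iban>"),   # rough IBAN shape
    (re.compile(r"\b(?:\d[ -]?){13,19}\b"), "<masked-card>"),             # card-like digit runs
]

def mask_at_collection(log_line: str) -> str:
    """Apply every masking rule; only the masked line is shipped onward."""
    for pattern, replacement in MASKING_RULES:
        log_line = pattern.sub(replacement, log_line)
    return log_line

print(mask_at_collection("user jane.doe@example.com paid with 4111 1111 1111 1111"))
# -> user <masked-email> paid with <masked-card>
```

The same rule set can be reused at the later stages Cat describes, during pipeline processing and again at display time, so a value that slips past one layer can still be caught by the next.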
Starting point is 00:31:52 And finally, we have some emerging features where we're looking at how to respect data subject rights and detect mistakenly ingested sensitive data. So if somehow sensitive data has made it past all these checks, say you have a misconfigured regex that you're using for masking, and you realize, oh no, we've collected these bank account numbers that we shouldn't have collected, you need ways to delete the data. But most of all, you need to know that it's there in the first place. So that's something that we're working on at the moment; scanning for sensitive data can be really challenging at scale.
Starting point is 00:32:27 Hard deletion, of course, is already a feature that's available. And the individual privacy rights requests, these are less common, but we're starting to see more awareness of them. When one of our customers' customers makes a request to them and says, hey, please delete all of my personal data, they need to delete that from all of their systems and all of their third-party systems, and Dynatrace is one of them. And I think there's still limited understanding that observability systems can collect so much personal data. It really depends on your use case, but there may be a lot of personal data in there,
Starting point is 00:33:00 if you have a legitimate reason to collect that about your monitored users. And so we also have an app to support that, to help with export and deletion requests, to find data about a specific person. Cool. So, just so I understand this use case, because for me, that's interesting: let's assume one of our customers is in healthcare, and they're using Dynatrace to monitor their healthcare system. And then if one of their customers says, hey, I want you to remove all of my personal data from your records, for whatever reason, they have the legal right to do so. That obviously means that this healthcare provider needs to make sure that really all of the data related to that individual is either removed or masked or otherwise
Starting point is 00:33:48 treated. And that means some of the data may no longer be in their possession, because it has been sent to a SaaS solution like our solution. So that's interesting.
Starting point is 00:33:58 And that was new to me. Wow. And then, I know this is an episode that we wanted to record at some point:
Starting point is 00:34:07 what's the privacy app, what's it called again, the app that we have in Dynatrace to then fulfill these requests? That's called...
Starting point is 00:34:16 Privacy Rights. The Privacy Rights app. Yeah, exactly. That's why you couldn't find it, Andy, because you were saying privacy instead of privacy. You have to talk with a British accent, you know? Yes, exactly. No, you know, for those of you that are listening in and that are actually using our product, be reminded that we also have a Privacy Rights app, which you will find either way you say it,
Starting point is 00:34:45 UK or US pronunciation. Cool. Really insightful. For me, and I think this is what I mentioned in the beginning, before we hit the record button today: Brian and I are always so lucky to have guests like you, because you bring such a wealth of knowledge on a topic, on an area, that we are typically not exposed to. Right. So, and this is why it's fascinating to learn all these things.
Starting point is 00:35:15 Yeah, I feel like every time we uncover one of these new topics, my mind just starts exploding with the complexity. And it's not like this topic was thoroughly explored back in the early days of more simple architectures, more simple compute systems, and all,
Starting point is 00:35:51 in a simpler world. It's being focused on now, when everything is as complex as this, and you start thinking, oh well, we'd have to check this, we'd have to be able to do this. It just blows my mind that people can even start at this level and start getting a handle on it. But I guess it's really just starting and chipping away and finding, bit by bit, where we tackle, I'll go with privacy, where we tackle the privacy
Starting point is 00:36:09 aspects of this. And, you know, you can't boil the ocean and get it all done at once, but just keeping a steady pace and moving forward and moving forward and moving forward as quick as you can, because these things obviously have consequences for a company. So you can't also just be slow about it. But, yeah, it's just mind-boggling. Like, you know, Andy, you and I going back to our old, you know, load testing stuff, right? When at first you would just do, okay, let's do a certain amount of people hitting the site, right? But then you start trying to recreate tests more accurately. And then the more accurately you try to recreate the tests, the more you uncover the complexity of the accuracy. And it keeps getting, like, it's a rabbit hole, basically, you know, is where I'm going. So to that point, with this being our first introduction, or at least my first introduction, to all this privacy stuff, it's quite amazing. And I'm not literally having my jaw on the floor, but figuratively, I've got my jaw on the floor for most of this episode thinking about all this. So.
Starting point is 00:37:11 Very much thank you on that, Cat. So on the topic of, and this is true, obviously, for every SaaS vendor, but let's stay with the observability space: in the end, it's about really building trust. We obviously need to make sure that our consumers, our customers, or any customers of SaaS services really trust that SaaS service: that private information that shouldn't be captured is not captured, and that if it is captured, only the people that are allowed to see it,
Starting point is 00:37:44 you know, can see it. Is there anything else, from an observability perspective, like privacy considerations in observability, that you wanted to highlight? Any other things that you think maybe people don't think about when it comes to privacy considerations in observability, or have we pretty much covered the key topics? Maybe I could give a few examples just to make this more concrete. So I covered all the features of how you'd try to stop sensitive data being seen by people who shouldn't have access to it. But why is there sensitive data in the first place? What kinds of sensitive data are we collecting in observability?
Starting point is 00:38:28 If we just stick to the classics of logs, metrics, and traces. I know that observability tooling is offering many more features now, but if we just stick to the classics: logs, of course. It's really common to see people logging personal data for debugging purposes, something like 'user X with email address Y logged in just now'. I've heard lots of engineers referring to those as audit logs. And then you dig deeper and you ask, hey, do we need this audit log? Is there actually a requirement to have an audit log? If it is an audit log, could you please put it in a special audit log place, not just into the general logs that anyone can see? Those kinds of things are really, really common. You see that all over the place.
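As a hedged illustration of that audit-log point, the sketch below contrasts the anti-pattern with a more privacy-preserving version: the general log carries only an opaque identifier, and the detailed record goes to a dedicated, access-restricted audit sink. The logger names are assumptions for the example.

```python
import logging

app_log = logging.getLogger("app")      # general logs, broadly visible
audit_log = logging.getLogger("audit")  # dedicated sink with tight access controls

def on_login(user_id: str, email: str) -> None:
    # Anti-pattern: personal data lands in logs that anyone can see.
    #   app_log.info("user %s with email %s logged in just now", user_id, email)

    # Better: the general log carries only an opaque identifier...
    app_log.info("user %s logged in", user_id)

    # ...and if an audit trail is genuinely required, the detailed record
    # goes to the dedicated audit logger, not into the general logs.
    audit_log.info("login event for user %s", user_id)
```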
Starting point is 00:38:57 Then there's less obvious things, like API design, for example. You can have an API design that isn't privacy-preserving. So you might include personal data in the URL or in the request headers, or something else that gets captured in the trace.
Starting point is 00:39:21 And of course, depending on your settings, you could choose to mask that. You could choose not to capture it. But you might want to capture it in most cases, and then you have this one API endpoint that is leaking loads of personal data or sensitive financial data: bank account numbers, for example,
Starting point is 00:39:35 credit card numbers, IBANs. Those are also really common. And the real gotcha is, even if you don't use observability tooling, somebody else further along the request chain might be using it. So that's something to keep in mind: even if you think it doesn't matter how I design my APIs because I'm not using observability tooling, somebody else might be, your content delivery network, for example.
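A small sketch of that API-design point, with invented endpoints: anything in the path or query string tends to be logged by proxies, caches, CDNs, and tracing tools along the request chain, so prefer opaque identifiers in the URL and keep sensitive fields out of it.

```python
# Invented endpoints, for illustration only.

# Leaky design: personal data travels in the path and query string, so every
# hop that logs URLs (proxies, caches, CDNs, tracing) now stores it.
leaky_url = "https://api.example.com/users/jane.doe@example.com/orders?card=4111111111111111"

# Safer design: the URL carries only an opaque resource identifier, and
# sensitive fields travel in the request body, ideally tokenized.
safe_url = "https://api.example.com/users/7f3a9c12/orders"
safe_body = {"payment_token": "tok_abc123"}  # token instead of the raw card number
```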
Starting point is 00:39:58 And I'm just making notes, because for me, this just opened up my eyes on APIs. First of all, both examples, the audit logs and the API design, were really interesting. Because the first one, really, on the logs, is: if you log it, do you really need this data? And if you need it, what is the right storage? What is the right place to store it, right? Because obviously, I guess every logging and observability solution has a way to put data into certain buckets. We also call it a bucket; I'm not sure what other tools are calling it.
Starting point is 00:40:33 But we can decide where the data gets stored, and then that bucket has privileges on who is allowed to access the data. And we even do it on a record level. But I think that's an interesting question. So, A, do you really need this information in the log? Because depending on the way you log it, it will end up, let's say, in a general store where it shouldn't be. So you need to think about this.
Starting point is 00:40:59 The API design is also pretty interesting, because if you put sensitive data in a URL, it actually means that from the browser or the mobile app until that URL reaches your backend, there might be many different hops in between. There might be caching layers, there might be other web servers, reverse proxies, and most of them are logging the URLs for various reasons, right? I mean, typically the URLs always get logged. And then the question is, if there's sensitive information on there and a third party in the middle, this is obviously not what you want. And then also on your backend system, you want to vet it;
Starting point is 00:41:38 you want to make sure that certain data doesn't get logged by default, but only if you really need it, and then again, think about where this data gets stored. Is this something in your work that you also, are you part of API design reviews?
Starting point is 00:41:57 Are you part of code reviews to validate this? Is this something we can kind of automate, or any best practice for how we can make sure that already in the design phase, you are considering all of these privacy rules? That's a great question.
Starting point is 00:42:17 And the classic challenge is how do you scale that? So we would like to be there for every code change, for every new API that's introduced. But how do you scale that with a relatively small team? And I would say AI is opening up lots of new opportunities now. I was at PEPR, the privacy engineering conference hosted by the USENIX Association in the US, just a few weeks ago. And AI, and how we could build in LLMs, was on everyone's lips.
Starting point is 00:42:42 They were saying, oh, I've been trying to use it to automate privacy code reviews and threat modeling. So everyone is trying it out. It's still not at a mature enough phase yet. But it gives me hope for the future that we could really scale this out en masse, so that every pull request is checked, for example. Until that point, there is a much simpler analog method, which is privacy coding guidelines, and you can also work this into your API guidelines.
Starting point is 00:43:04 So then you can make it clear to each developer: hey, it's your responsibility to follow these practices. And try to keep those as simple as possible. No legal wording. Ideally, don't even mention privacy. Just document a requirement: you must not log user identifiers, email addresses, and so on; list every single data type. Because it is a really common source of confusion if you say PII or personal data: most people don't know what that means, or if they think they know what it means, they don't know all the different categories it can cover.
Starting point is 00:43:48 It almost seems like, you know, as there are security scanning tools, it would be really beneficial for there to be, you know, some privacy scanning tools. One example I think of a lot is, as we're talking with customers who are sending us logs, they're looking for, like, you know, BAAs and all this other kind of stuff about making sure that we don't store certain things. And my thought is always, why don't you just not put that in the log, right? I mean, obviously there might be some times, but they're like, no, there's no reason for us to do it; we just have no idea what's in the logs, right? And that is the big challenge, because they have this legacy of code writing to logs, and they have no idea what's in it. So even starting with, like, some sort of maybe LLM-based, you know, log scanning tool, which can go through and identify personally
Starting point is 00:44:34 identifiable data, let developers know, start going back and fixing it, also maybe even doing the code reviews. You know, just like with security, there's all these tools that'll go in and do these scans. It's, again, opening this whole world of possibilities and endless things that are needed. So, yeah, I can imagine at some point in the future, people not only asking us for our security status, you know, and all the different benchmarks we pass; there'll be ones for privacy, which, you know, obviously there are already
Starting point is 00:45:08 some of those, but even more of those kind of things going on. It's just mind-boggling. It's such a deep topic suddenly within 50 minutes. If I can add one simple example, if as a listener you're thinking, oh, I wish we had privacy code scanning
Starting point is 00:45:24 but we can't afford to integrate LLMs into our workflows, I know it's expensive, we don't really trust them yet: a very simple option, which you just reminded me of, Brian, when you mentioned can we leverage security tooling? Something we tried was using Semgrep, which we were already using for security code scanning, to do some very, very basic regex checks to look for variable names that look like they might hold personal data. So look for var emailAddress, var email, var userProfileId, things like that,
Starting point is 00:45:55 tailored to your context. And it turns out the variables don't vary that much. So although there are some solutions out there that are more complex and try to do data flow modeling and figure out sources and sinks, they're very heavyweight; they take a long time to run. Semgrep is super quick. You might already have it in your workflows. And you can actually get surprisingly far with just a quick regex scan. And suddenly you know: okay, in this system we knew nothing about, based on the codebase, it's processing email addresses, phone numbers, first names. Now we have some information. We can go and talk to the team about why those are being processed.
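Real Semgrep rules are written as YAML patterns; as a rough stand-in for the heuristic Cat describes, here is the same variable-name idea as a plain Python sketch. The name list is an assumption and should be tailored to your own codebase.

```python
import re
from pathlib import Path

# Flag assignments whose variable names hint at personal data.
SUSPECT_NAMES = re.compile(
    r"\b(email(?:_?address)?|user_?profile_?id|phone(?:_?number)?|first_?name)\s*=",
    re.IGNORECASE,
)

def scan_for_personal_data_names(root: str) -> None:
    for path in Path(root).rglob("*.py"):  # widen the glob for other languages
        for lineno, line in enumerate(path.read_text(errors="ignore").splitlines(), 1):
            if SUSPECT_NAMES.search(line):
                print(f"{path}:{lineno}: possible personal data: {line.strip()}")

scan_for_personal_data_names("src")
```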
Starting point is 00:46:21 And for reference, this is Semgrep, S-E-M-G-R-E-P. Perfect. We'll add the link to the description of the podcast. Because it also reminded me of one thing. I've started to advocate for this, but I think now, with this privacy discussion, we have some additional use cases.
Starting point is 00:46:54 When you think about the software delivery lifecycle, and you're deploying a new change in your test environment and you run some tests: I've been advocating for analyzing your logs, your metrics, your traces for certain patterns. One example would be, do we have debug logs in a higher-level environment? Or, as people are moving from logs to traces, is the same exception now available in a log and a trace? Because then it's duplicated data that nobody really needs. We could also look at, and there's a new company that just launched,
Starting point is 00:47:29 OllyGarden, that's defining some best practices on what is good observability, what is good tracing. So for instance, do you have over-instrumentation, right? As we're asking engineers to instrument their code, are they instrumenting functions that are getting called a thousand times per
Starting point is 00:47:45 transaction? Brian, this goes back to our early days with Dynatrace, when we had the shotgun instrumentation, and then sometimes we had PurePaths that were kind of, you know, timing out, or, like, having 10,000 nodes because somebody instrumented the wrong method that was called 50,000 times.
Starting point is 00:48:02 And so the same could be done now. As part of your software delivery lifecycle, you can then analyze your logs, your metrics, your traces based on patterns, and then have that act as a quality gate. In Dynatrace you can use the Site Reliability Guardian for that, to execute certain queries against that observability data,
Starting point is 00:48:20 and then say, hey, it seems you have some personal data, PII, in your logs, or you've instrumented this new method and you're capturing data that you shouldn't capture. So, kind of like quality gating. That's cool. Wow.
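As a hedged sketch of that quality-gate idea: after the test run, pull the logs the run produced and fail the build if anything email-shaped slipped through. The fetch function is a hypothetical placeholder for querying your observability backend; this is not the actual Site Reliability Guardian interface.

```python
import re
import sys

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

def fetch_recent_test_logs() -> list[str]:
    # Hypothetical placeholder: query your observability backend for the
    # log lines produced by the test run, e.g. via its query API.
    raise NotImplementedError

def privacy_gate() -> None:
    # Block the pipeline stage if personal data leaked into the logs.
    offenders = [line for line in fetch_recent_test_logs() if EMAIL.search(line)]
    if offenders:
        print(f"Privacy gate FAILED: {len(offenders)} log line(s) contain email-like data")
        sys.exit(1)  # stop the promotion to the next stage
    print("Privacy gate passed")
```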
Starting point is 00:48:38 Cat, unfortunately, as I told you, sometimes time flies when we have these discussions, and it's unbelievable that it's almost the top of the hour. Is there anything else,
Starting point is 00:48:52 obviously besides that we will have you back and go into more details, but any final thought for today's episode? Anything where you'd say, I wish I would have said this, but there was no time? Any final thoughts from your side for our listeners?
Starting point is 00:49:08 I would have two final thoughts. One is, I fear I've made this sound very complicated. It can be complicated, but I don't want it to seem so complicated that it intimidates people into thinking, well, we can't be privacy preserving, that's too hard. I would say always go back to that example I gave of: does this meet expectations? Our users' expectations, my own expectations. That is the simplest way of thinking about it, and it's
Starting point is 00:49:33 quite effective. The second thought is, if you're finding that approach falls down because you're trying to design for users who have very different experiences from your own, and you can't put yourself in their shoes very easily, persona-based threat modeling is something I've been exploring, and I'd recommend you check it out. So leveraging the UX personas you probably already have for your users. You know who your target audience is; you're thinking about them as you design
Starting point is 00:50:00 the user interfaces and the whole overall user experience. Use those as a reference when you're trying to evaluate this. Does this meet my expectations or not? That can help you put yourself in the shoes of somebody from a minority group, say somebody who might be persecuted on the basis of this data being shared, which you would be totally fine with sharing publicly
Starting point is 00:50:18 having on the front page of the newspaper because it's not a problem for you. And I'm very excited to hear you want to have me back because we can dig into that much further in the next episode. And I took notes. I took notes to say next episode. We have a lot of topics to discuss. Yeah. Yeah.
Starting point is 00:50:36 Yes, I want to close it. Because I know you have to go get those tickets. Yeah, I've got a minute and 54 seconds. But then I go into a queue, you know how that all works. Really appreciate it. And I didn't mean to make it sound complicated as well with my reactions to it.
Starting point is 00:50:56 To me, it's more the complication of, once you start thinking of the possibilities and the layers and layers of all this, you realize there's so much to it, right? I mean, it exists in so many places, and as an organization, you are exposed to a lot of risk in making sure you have it all locked up. It's not just like, oh yeah, well, just put a firewall up and we're good, right? It's very, very big picture. Now, there are simple tools, and there's a simple process, as you're explaining, that you go through this stuff and you do it very methodologically. What the hell's the word I'm saying? Is that right? I don't even know. I'm not even going to bother trying. But, you know, it's a fascinating topic. And, you know, if I were not the old man I am,
Starting point is 00:51:48 if this were years ago, I might have tried to switch into security, and then I might be thinking, oh, this is a whole
Starting point is 00:51:56 new field here with the privacy, because this is like the new frontier of excitement. You know, Andy and I were part of the
Starting point is 00:52:03 frontier of performance, if you will, right? Where we were trying to convince people it's something important, right? We used to have
Starting point is 00:52:10 to battle people, not physically battle them, but, you know, really try to convince people, like, you know, performance is an important thing. Now it's standard
Starting point is 00:52:17 everywhere. Obviously, security has real consequences, and people are taking it seriously. And at a certain point, with an unknown number of people in the organization, it's time to start doing the security, the privacy setup. So it's exciting stuff,
Starting point is 00:52:36 and I think there's just, like, so many worlds of possibilities. And when you think about the idea that AI is going to be writing people's code soon, there definitely are other areas people need to think about, start branching out to, so they can continue to work in this amazing field.
Starting point is 00:52:54 So, go work in privacy. Absolutely. World of possibilities. Thank you both for a great discussion. Thank you. Thank you so much for the insights. See you soon on the next podcast.
Starting point is 00:53:09 Yes. Have a wonderful day, everyone. Thanks for listening. Thank you. Thank you.
