PurePerformance - How I became an SRE in FinTech and what this means with Diana Najda

Starting point is 00:00:00 It's time for Pure Performance! Get your stopwatches ready, it's time for Pure Performance with Andy Grabner and Brian Wilson. Hello everybody, welcome to another episode of Pure Performance. My name is Brian Wilson, and as always, I have my fantastic, wonderful, lovely co-host Andy Grabner, who loves to make fun of me while we do the intro. And boy, what an intro. We've had such a hard time bringing this episode to everybody today. But we do it because we love you. We genuinely love each and every single one of you. All three of you, we love you. Right, Andy? Hi, Andy. I've just been babbling and babbling and babbling, and I'm not going to stop babbling until you interrupt me. So you'd better interrupt me at some point, otherwise I'm not going to stop talking and I'm going to pass out. Maybe. Are you waiting

Starting point is 00:01:01 to see how long I can continue talking before? Maybe. But it's good that today you still have a positive attitude to the whole thing, because we literally tried for about 20 minutes to get this started. And I know you were making fun of me in the attempt where we had the video cameras turned off, so you didn't see if I made a funny face. Now you saw me making a funny face, and I did. Yeah. It wouldn't be the same without that. Yeah. But speaking of funny faces, no, that's not a good transition.

Starting point is 00:01:31 That's not a good transition. Speaking of great voices and great inspiration, I think that's what we should talk about. I would love to introduce our guest today. Diana, welcome to our show. I hope you are as equally amused about our little intro as we are. But yeah, welcome to the show. Diana, do me a favor, quickly introduce yourself to the audience because I think it's your first podcast with us. Yes, it is. And I feel like I don't know what I got myself into between the two of you. So my name is Diana Najda. I currently work at one of the

Starting point is 00:02:12 fintechs here in Canada. Primarily my focus is on the SRE team with the DevOps and API platform and focusing on all our monitoring, alerting, tooling, and integrating them with our actual development pipelines. So you have a job that I think half of the IT industry would love to have, at least in the job title. Everybody likes to have SRE, Site Reliability Engineering. DevOps is the big topic; Brian and I have had so many episodes in the last couple of years talking about it. And when I

Starting point is 00:02:45 go around, and I'm fortunately able to travel again and meet customers and meet the community, everybody's just... I think, you know, SRE, even though it's been out there for a while, is still very much a job title and task that a lot of people want to get into. So my question to you, Diana, is how did you end up being an SRE? I want to also know what does SRE actually really mean in your organization? I would like to also understand, do you differentiate between DevOps and SRE? I think you mentioned both kind of terms in your job description because people would be interested, I think, in hearing from individual organizations and from individual folks like you, what does SRE and DevOps really mean?

Starting point is 00:03:31 But maybe before we go into all these questions, start with the one: what did you do before? Because I know it's kind of a lengthy... Did you take notes on all those questions, Andy? I have a mental note and I will go through every one. Yeah. Yeah, so if we start way back when, I went to school for computer science, and after computer science I knew deeply, with a passion, that I never wanted to code a day in my life. There was something in university where I was like, I don't want to develop. However, I was still deeply interested in computers, in software. I've always loved logic and math. Like those were just some things

Starting point is 00:04:12 that I was really interested in. But I also wanted to have a career where you weren't sitting kind of in a box and just developing code. You were able to go out and interact with perhaps with human interactions and kind of understand the impact of software in the human space as well. Right. So I actually started my first job was at Dynatrace. So I was a Dynatrace consultant for almost six years. I primarily did

Starting point is 00:04:43 a lot of my consulting at financial institutions in Canada. And I had a little bit of experience with some of the ones in the U.S. as well. And after Dynatrace, I actually joined one of those financial institutions as an SRE. So I brought all my experience from Dynatrace into the SRE space. And what I've really, really enjoyed with this role is you're able to see things from end to end. So you see the code being developed. You know what it's going to be used for. And then you also see the impact on our end users as well. That, okay, we've built out this pipeline.

Starting point is 00:05:22 We're working towards making it resilient, and how does this impact our development teams, our operations teams, and all the way down to the actual clients? I'm one of the clients that would be using our tools as well, and seeing that whole end-to-end process. So having that touch, like one foot in the software world and one foot in understanding the impact in the real world, as I explain to my non-technical friends who always ask me, what do you do all day, you just sit at a desk? I'm like, no, there's a lot more to it. And that's what I have really enjoyed about SRE. So for me, what you're explaining is what I always thought a textbook SRE should actually be. Really spanning everything and really kind of connecting all of the different teams involved in, I think, defining the software, defining the requirements.

Starting point is 00:06:19 With requirements, I mean, not only functional requirements, but especially your scalability requirements, your resiliency requirements. In the SRE talk, we talk about SLIs and SLOs. And then really kind of making sure that the system that is built is observable from day one, from the inception of the first line of code. Are you in your role then? So are you mentoring the development teams in order to make the software observable as well? Are you helping them and mentoring them? Or how does this work from a development perspective? So what are your interactions with the development team?

Starting point is 00:06:53 That would be interesting for me. Okay, yeah. So first, I believe I'll answer one of the questions that you asked in your slew of questions in the first one. To me, there's DevOps and there's SRE work, right? Yeah. The difference between the two of them in my eyes is that DevOps is delivering code, delivering code quickly. It's more focused on speed: get the product out. That's what it is. Whereas SRE is making sure that the code that you have now deployed is resilient, it's reliable, it does what it needs to do from day one out into production. So one of the things that I work on is observability as code,

Starting point is 00:07:34 so also known as Monaco. So if anyone's a Formula One fan, so you think of Monaco Grand Prix. So if you think of DevOps as the team that has built out the car, right? They just need to make sure the car makes it around the track as fast as possible. That's it. It just needs to cross the finish line. But then your SRE teams are the ones that make sure that all the bells and whistles within the car are letting them know what they need to know as the car goes around the track.

Starting point is 00:08:00 Are the tires okay? Are the brakes okay? Is everything fine? If something goes wrong, does it send out an alert? Does it notify the team that needs to be notified? So that would be in my eyes, like hand in hand, how they work. And so when we're looking at my role specifically and how we make sure we bring that mindset to our dev teams and our operations team. It is working with the dev teams, giving them an understanding of, hey, these are the toolings that we've built out. And getting to understand that what they're building from a development team or an app team perspective,

Starting point is 00:08:39 this is how the operations team is going to interact with it. We want to make sure that everyone's lives are easier, that the operations team understands the importance of the code base itself. So as any developer would probably tell you, their app is number one in their eyes. It's business critical no matter what. But from an operations team's perspective, that's not necessarily what they're looking at.

Starting point is 00:09:07 They're getting all these alerts, they're looking at maybe hundreds a day, whatever it may be. And we really need to start more from the dev teams and getting them to identify what is the criticality of their applications. If it does go down, does the operations team need to bring it up right away? Or do they have some time?

Starting point is 00:09:23 How do we do that? So like bringing that understanding across the whole board and making sure that those conversations are also constantly happening and not as previously may have happened before SRE was introduced. It was more of like a silo basis. You push your code out, you don't think about it. It's fine. It breaks, then someone else is going to take a look at it later

Starting point is 00:09:44 when an alarm goes off. I really love your analogy with the car. I mean, I know we have used this internally, and I think others may have used it too, but you brought a different angle to it, and something sparked in my head. Let me hear your opinion on this. So if I am in a car, and let's take one of these modern electric cars, no brand name. But as a user, if I see this in production, then really what I'm interested in are metrics

Starting point is 00:10:18 like how fast do I get to my final destination? How many stops do I need? And will I get there without burning any tires, or something like the potential that a tire blows up? So these are kind of the metrics that are interesting for the end user. On the other side, the tire thing and the range of the car are also great metrics to feed back to the engineering team so that they can constantly improve the whole thing. But then there's many more metrics probably that are collected in the car that are more technical,

Starting point is 00:10:49 not necessarily that important for the end user, but that, again, help the technicians at the garage, for instance, if something is broken, to say, hey, now I know what the problem is, so I can optimize this. So I really like the analogy of this car with the telemetry that you put in, and then giving the different people (dev, testing, architects, DevOps, SRE, and the business) the different metrics that make sense for them. Right? Did I get this right? Yeah. And like as you mentioned, like

Starting point is 00:11:21 all the different teams involved that's like the other thing that you really have to focus on. It doesn't matter how good of a driver the driver is if the car is not good. And also doesn't matter how good of a car it is, if the driver can't make it around the track or if it doesn't meet the right metrics. Like all these teams have to work together cohesively in order to accomplish the goals that they're striving to accomplish. Yeah, one part I'd add to that, Andy, as well, to complete your picture, is when you have a really good setup, like you have in the computing world, an automated pipeline, you have the constant feedback loops, you have the bi-directional components, is with, at least with this unnamed brand car that we're speaking about, I don't know about with the Formula 1s, besides collecting the telemetry and then being able to bring it in to change the tires

Starting point is 00:12:09 with that other car, you can make tweaks to it while it's running, right? So if you think about your feature flags and your A/B testing and all that, if you have all that set up while it's all running in the system, you could say, all right, there's a hurricane coming in, let's extend the mileage range with the flick of a switch, right? And it can do that back-and-forth update. So I think, you know, just extending to that additional side of effecting changes while it's running in production, without it having to come into the pit or something like that, that's just the complete picture. I don't know how many of those are out there where people are doing full-on feature flags and fully automated pipelines and all that.

Starting point is 00:12:49 That's, you know, cutting edge stuff still, but we're seeing more of it. But that, I think, is the big piece. And I'm rambling again and I'm not going to stop. All right, go on. One topic, obviously, that we are kind of centering our discussion around is metrics. What metrics are good? And I think in the terminology of site reliability engineers, we always talk about SLIs and SLOs. I would like to ask you, because this is the biggest challenge that I have, when I introduce people to the concept of SLIs and SLOs,

Starting point is 00:13:21 and they say they understand it, but give me some recommendation on what are good SLIs and SLOs. How do I get started? Especially if I don't get these metrics, the top level business goals from the business owners of this app. Now, in your experience, how do you deal with this? How do you define good SLIs and SLOs? What's the process

Starting point is 00:13:53 Yeah, so if you've looked at Dynatrace or any monitoring platform, the amount of metrics that it provides you can be overwhelming at some point. So yeah, we've definitely seen that, where you go to create a dashboard and you press a little drop-down on the UI and that list keeps going and going and going. There's so much to look at. And if you go through each one, one by one, you're going to be there all day trying to figure it out. So we've started simple. Our basic availability metric is, hey, let's just measure the number of our successful calls over the total calls. And then, when we're looking at what is being marked as a failure, we're checking: is it actually a business failure? Is it something that we expect to be a failure? And tweaking our monitoring settings and whatever, just to make sure that

Starting point is 00:14:42 what is a failure is actually a failure. So then when we're looking at our SLOs, that it's exactly what we expect them to be. So that would be from like back-end service perspective. And then from our front-end service perspective, also starting from the basics, just starting really simple, is let's look at the Apdex. Like let me know how many of our user actions are being marked as satisfied over the total number of user actions. Going through and identifying that, yes, there are many actions that can happen within a flow,

Starting point is 00:15:15 but we've identified our more business critical applications and set SLOs on those as well. It'll be the same thing for our backend service. That would be more just from an overall availability perspective. And then once you get into the weeds of like the response time, like, Hey, our backend services, what should they be coming back with? So that those are discussions that we now have to bring in more from like a business side.
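
Editor's sketch: the two "start simple" SLIs described above (successful calls over total calls for a back-end service, and satisfied user actions over total user actions for a front end) are both plain good-over-total ratios compared against a target. The snippet below is only an illustration under stated assumptions: the names `SliResult` and `ratio_sli`, the counts, and the targets are hypothetical and are not taken from the episode or from any monitoring product's API.

```python
from dataclasses import dataclass


@dataclass
class SliResult:
    name: str
    value: float   # measured ratio in [0, 1]
    target: float  # SLO target in [0, 1]

    @property
    def met(self) -> bool:
        # The SLO is met when the measured ratio reaches the target.
        return self.value >= self.target


def ratio_sli(name: str, good: int, total: int, target: float) -> SliResult:
    """Compute a good-events-over-total-events SLI and pair it with its SLO target."""
    # Assumption: with no traffic in the window, nothing was violated.
    value = 1.0 if total == 0 else good / total
    return SliResult(name=name, value=value, target=target)


# Hypothetical counts for one evaluation window.
backend = ratio_sli("backend availability", good=99_870, total=100_000, target=0.995)
frontend = ratio_sli("satisfied user actions", good=9_410, total=10_000, target=0.95)

for sli in (backend, frontend):
    status = "met" if sli.met else "missed"
    print(f"{sli.name}: {sli.value:.2%} (target {sli.target:.2%}) -> {status}")
```

What counts as a "good" call matters more than the arithmetic, which is why the conversation returns to how failures are classified.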

Starting point is 00:15:42 And we have those discussions to figure out that, okay, we have this particular call from a front end perspective. We need it to complete within two, three seconds. So what does that mean from a backend? And some of these backend calls can be very, very complex. So it's just looking at our front-end applications and making sure that all our back ends are also responding accordingly so they stay within that timeout threshold. So it's a lot of conversations, with just understanding that this is not monolithic. No one works with monolithic applications; our applications are very, very complex. And if you do set an SLO on something very high level, you might never meet it if you don't also go and

Starting point is 00:16:31 look at your downstream dependencies as well. So it's having those conversations with all those teams as well, and looking at our SLOs throughout the entire flow, not just starting at the top level. And I think this also answers my question, but just to confirm, you see this in your role as an SRE, to connect and talk with the business? Do you actively? Yeah, okay. That's because this is something where I had, I was just on the West Coast and traveling around and talking with a lot of customers.

Starting point is 00:17:06 And there I was sitting in a room with SREs, yet they were telling me they don't know what kind of business objectives the business really has. And then I advocated to them, well, I think you are in the perfect position to bridge that gap between the you know, the technical side that actually, you know, gets that car and operates it to then really figuring out, okay, what are the metrics we're really interested in and what values do they need to be? So really, you know, taking this, bridging the gap, right? That's really what it is. Talking with the business and figuring it out and then translating it down to, okay, how can we monitor this?

Starting point is 00:17:44 And what does this mean from the front end to the back end to all the different services? Yeah, I think that goes in line with part of what Diana was saying earlier, right? That you want to be able to interact with people, you want to play more of an active role. And I actually came up with another way to fit that into the car model. I'm just full of these things. So let's say you have a team that's in last place, right? Maybe you get a better driver. But if you are the person in charge of all the telemetry and the electronics and all that component of the car, yeah, obviously you want to win, but that's not realistic. So as your job, what is it you want to do? Well, you need to talk to the business owners of the car and the team and say, all right, we're in last place now. Where do we want to be by the end of the season? All right. If we can go from last place to finish

Starting point is 00:18:35 around the halfway mark this season, that's a huge improvement. That's something attainable. Now you have that guidance, but you would have to be talking to that owner, the business team, as it were. So I think it makes perfect sense in that role. Like, how do you know what your goals are? How do you know what you're trying to accomplish? Right. We can all say, yeah, let's get to the, you know, the mythical Google, what's the response time now, whatever. Right. But that's not realistic in all scenarios. So I think it's great that you're advocating, Andy, for people to do that. And obviously, Diana, that you're doing this already is fantastic, and sharing that. Thank you. Got a different question. You mentioned earlier the success rate, right? Like you

Starting point is 00:19:17 mentioned you're measuring availability and you're looking at the success rate of requests that are coming in. But I think then you said something that I need to add to my talk track whenever I advocate to people, because you then said you're looking very closely into how errors are detected, because some of those errors might not be business critical, right? So the question is always: an HTTP 400 that indicates maybe a bad login because you entered a bad username or password, should this be counted as an error? I guess this is what you meant, right? And some other of these discussions, yeah.

Starting point is 00:19:56 Yeah, exactly. There's a lot of, for example, custom exceptions or 400 errors, where some of the 400 errors actually should be marked as failures. We do need to be alerted on them, and they do need to affect our SLOs. But on the other hand, there are some errors that don't necessarily need to be marked as failures and affect our SLOs.
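
Editor's sketch: one way to picture the distinction drawn here, where some 4xx responses and custom exceptions should count against an SLO and others (such as a login rejected for a wrong password) should not, is an explicit allowlist of expected client errors. Everything named below (`EXPECTED_CLIENT_ERRORS`, `counts_against_slo`, the error labels) is hypothetical and only illustrates the idea; it is not how the guest's team or any specific monitoring tool implements failure detection.

```python
from typing import List, Optional, Tuple

# Hypothetical allowlist: (status code, error label) pairs that are expected
# business outcomes rather than service failures.
EXPECTED_CLIENT_ERRORS = {
    (400, "invalid_credentials"),
    (404, "account_not_found"),
}


def counts_against_slo(status: int, error_label: Optional[str] = None) -> bool:
    """Return True when a response should be treated as a failure for the SLO."""
    if status >= 500:
        return True  # server-side errors always count
    if 400 <= status < 500:
        # 4xx responses count unless they are explicitly expected.
        return (status, error_label) not in EXPECTED_CLIENT_ERRORS
    return False  # 1xx to 3xx responses are treated as successes


def availability(responses: List[Tuple[int, Optional[str]]]) -> float:
    """Share of responses that do not count as failures (1.0 for an empty window)."""
    if not responses:
        return 1.0
    failures = sum(counts_against_slo(status, label) for status, label in responses)
    return 1 - failures / len(responses)


# Hypothetical sample: the bad login is ignored, the malformed payload and the 503 count.
sample = [
    (200, None),
    (200, None),
    (400, "invalid_credentials"),
    (400, "malformed_payload"),
    (503, None),
]
print(f"availability: {availability(sample):.0%}")
```

Keeping the rules in a small, reviewable structure like this is also what would make it practical for developers to maintain them through a pipeline, which is the direction discussed next.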

Starting point is 00:20:23 So it is a lot of starting in our dev environments. This is not something that we're looking at just in production. It's starting in dev when we have our developers looking at their monitoring in those environments and ensuring that they expect to see what they expect to see throughout their testing processes as well. So then when we do go into production and we have our SLOs set up there, that they are behaving accordingly

Starting point is 00:20:50 and gathering the metrics that we want them to be gathering rather than potential noise and alerting us on false positives. That's cool. That was actually just to answer my next question. Where do you start with this? So you start early. And are you in a stage now where the developers,

Starting point is 00:21:12 when they change their code or they add a new service, are doing this as a self-service? So can they already define, if they're introducing a new exception or they find a new exception, can they themselves define through configuration that, hey, this is an exception that is business critical, whereas the other one is not? How does this process work, or do you have to step in all the time? So for now, I am still stepping in to do that sort of configuration, but it is something that we are adding into our actual dev pipeline so that they can go in and just, as I may have mentioned before, who knows the code base, who knows the application better than the

Starting point is 00:21:52 developers themselves? So it makes sense for them to actually have that power to configure these changes. And so this is something that we are building into our actual pipeline so they can configure these exceptions, configure these rules, and then we have those changes pushed through automatically as well. And this just brought me to a new idea, a new feature right here in Dynatrace. What we should provide is, when a new build is deployed into a dev environment or a test environment, at the end of that test run we should spit out a report and say, hey, we've detected these new errors, like these are five new exceptions. And then you can go in and say, ignore, ignore, ignore, yes, business critical. I think that would be, because, I mean, we just want to make sure that it's easier, yeah, right, for everyone to say, okay, I want to finish the configuration of my system, so I also want to automatically get a report on what has changed. And I think that's

Starting point is 00:22:53 I need to take a note of this. Yeah, I wanted to ask, right, anytime you're asking a developer for more time, it's difficult, right? So obviously the second stage of this is building in some of that automation, doing that component, you know, identifying which ones and somehow promoting it to the pipeline. But you're starting this work, as you said, in the dev environment, which means them spending time with you to discuss these things and go back and forth. How do you approach and get, you know, do you have any advice for approaching the development team for getting them to buy into this, to commit some other time to doing this as opposed to being like,

Starting point is 00:23:30 look, I got so much coding I got to get done. Can you just leave me alone and do this in production? Or did you happen to have an organization that was just fantastically bought in or did you have to win people over and any tips there? Yeah. So I would say also part of my role is going out and advocating for the work that my team does. So anytime we have any new hires or new teams that have spun up or once a month, we run sessions that are not just focused on our monitoring work. It's focused on the platform as well. And going in and showing all these new capabilities that have been built out into our pipeline,

Starting point is 00:24:13 that have been built out into the tools that we use. Because just as any organization, there is a crazy amount of tooling that we use. And it is very difficult for any one person to keep on top of this tooling. So if you have someone like, for example, that what I'll do is I'll go into look at the new features that some of our tools, our vendors have provided, figure out what might be of interest to our teams, and then also go in. And then the same thing will happen on our platform teams that when they built out new features they'll like make sure that we'll go out and actually like show these features off show them how it

Starting point is 00:24:51 works, and maybe work with specific app teams to give them more app-specific training sessions, because they'll understand everything a lot better when it's within their own code base. Because, for example, if we're just doing the same example on the same application every time, sometimes you don't see that connection there. You're like, okay, well, this is just another tool. This is another, like, why would I use this? I don't see this benefiting me.

Starting point is 00:25:21 Especially when you don't have the time to actually go and dedicate to look at it. So breaking these sessions down just to like the specific app teams themselves, showing them how this actually makes their lives easier. Essentially, that's the point. Be like, hey, like all that time that you spend debugging or looking into issue or trying to pull these metrics using whatever it may be, look how much quicker you could do it here. Or this is like you have the power, like you could do this yourself. This is not like showing them easier how they can get this information, et cetera, like that.

Starting point is 00:26:01 Awesome. Thank you. I got a question for you. I want to go back to... Really, you got a question for me. I know. Another one. Another one, yeah. This is a tough one now because I've been receiving this a lot and I wonder how you answer it or if you have an answer to this. We've been talking about DevOps and SRE for many years now, right? And many organizations have started going down that path. They have invested in hiring new people or just changing job titles, right? Building new tools, buying new tools, moving to the cloud. And now I hear more and more, especially in economic turbulent times that we live in right

Starting point is 00:26:44 now, that management keeps asking, so, hey, what's the return on investment of all of this SRE stuff we're doing? What's the return on investment of moving to the cloud? What's the return on investment of DevOps? Was it worth the money? Don't let your bosses hear this part. No, no, no. But I'm really curious to hear,

Starting point is 00:27:10 do you have a way to measure the impact that applying SRE practices had on your organization? Do you know how to measure this and show this, that this is why we're doing this? Yeah, so this is kind of the "pictures or it didn't happen" example here. Yeah, so I feel like, no matter what position you're looking at, this is something that everyone wants to make sure is being measured in some way. So one of the ways that we've been looking at it is, one of our pain points would be incident resolution. Like, we have an incident: how quickly do we find root cause? How quickly do we identify and turn this around and remediate

Starting point is 00:27:52 the issue? And on the other hand, we want fewer of these incidents. So making sure that, of the incidents that we are getting on a day-to-day, week-by-week basis, the majority of those incidents and alerts are proper incidents and alerts. So to get there, we have to start in our dev environments again and into our production environments, making sure that, yeah, we're going in and identifying those failures. We're creating these SLOs to understand that, okay, if this incident is detected, is it affecting an SLO? If it's not, then maybe it's not as important; we can take a look at it later or suppress it, whatever

Starting point is 00:28:37 it may be. So that would be one way, I would say, that we are measuring: being able to determine, is our operations team sleeping as much during the night as they want to? And then before SRE came into play, how much sleep did they get before, and how much sleep are they getting after? And when they are getting woken up in the middle of the night, is it actually something that they need to be woken up for? This is a great metric, actually. I remember somebody said, mean time between wake-ups, something like this.

Starting point is 00:29:12 What's the mean time? Mean time between dreams. Mean time between dreams, yeah, exactly. No, but thanks for answering the question in a similar way to how I've answered it. I think there's a couple of industry metrics that have been bounced around quite a bit in the last years, like the DORA metrics, the DevOps Institute metrics, like the deployment frequency is one of the things, lead time, right? That's all about what you said earlier, DevOps, kind of the job of DevOps to really speed up the delivery part. But then it's about reducing MTTR and in

Starting point is 00:29:46 general just avoiding failure from the beginning. And I think this is exactly what you just said, faster MTTR, fewer incidents. I obviously know what you are working on because we've worked together over the last couple of months and I've seen some of the cool things you are doing with your observability platform, in this case with Dynatrace. I really took a lot of the things back to the engineering team where you were putting, I mean, you mentioned the SLOs, but the other thing that you're doing pretty well is you're adding additional metadata as part of your deployment of your services

Starting point is 00:30:18 to say, you know, which team does this belong to? Is this, you know, maybe also the criticality of this particular deployment, so that when a problem happens, the observability platform all of a sudden has more metadata to then really say, how critical is this? It's kind of like rating the problem that comes up in a different way

Starting point is 00:30:38 based on all the knowledge we have about it. That's really cool. So how long have you been, you said when you switched to the organization you work at right now, SRE was the title from the beginning? Yeah. So when I switched over, I was primarily the SRE working on SRE work. And then now, through some organizational changes, I do still have SRE

Starting point is 00:31:08 in my title, but I'm more focused on the monitoring, alerting, and the SRE work. So that would just be the difference. Yeah. Cool. Any advice that you can give people that are maybe hearing this now and say, SRE is really cool, but I'm not sure if I am, because some people say, well, I don't know if I have the skills for doing it? I mean, you in the beginning said you never wanted to code, but you're still interested in the impact of software on human beings. I mean, I always assume that an SRE has to do some type of coding, at least some scripting, right? Which I guess you still do.

Starting point is 00:31:50 Yes, I do. Now I do do that. And I do find a lot more interest in it. Whereas before, I just had no, I just lost the interest in it. I wasn't working on any applications that really sparked that value, that interest in actually going in and developing. Whereas now I do like what I do. So from, I guess, from an advice perspective, I know SRE, DevOps,

Starting point is 00:32:21 these are all buzzwords. And sometimes when buzzwords start buzzing, they might lose their value when they're always up there and these titles are being thrown around. But it is a really cool career to have in the sense that, like I said, you're able to work on not only making sure that the code bases themselves are more reliable, but like going out and making your developers lives a lot easier. Like you're giving the power more to the developers,

Starting point is 00:32:52 to the operations team and seeing all these teams interact and you're able, like it's constantly changing. Um, I had someone tell me the other day that in a waterfall methodology, you know what your day-to-day work looks like, like, you know that, okay, you do this task, you do this task, you do this task. And you're like, okay, now you can sit back and relax. But like with SRE work and our agile practices, it's like, it's constantly changing. There's always a new thing coming out. There's always a new way to integrate. And especially as our applications like grow in complexity and in size and how quickly we're actually deploying changes, it really makes everything a lot more interesting to work with.

Starting point is 00:33:38 I really, I had to, I always try to take notes because some of them also make it into the, into the show notes. And I wrote down the advice. DevOps and SRE might be buzzwords and they may lose the buzz the more you hear them. But it's really cool as you really make the lives of dev and ops easier with the way you help them improve the tools and get more insights. It's a really nice way of putting it. Sorry. Thank you. That was good.

Starting point is 00:34:10 Awesome. I was going to say, earlier you said you didn't want to be just like, I don't know if you said, trapped in a box developing or something all the time, something along those lines. But as you're describing, the benefit of what you're doing is you get to wear many hats. You're doing some development or some coding, but not full-time. You're doing interaction. Yeah, so it's really a multifaceted position. But when you joined and entered into where you're at now, was there a, it's a question in here, but was there already an SRE team who was able to bring you in and then say,

Starting point is 00:34:43 okay, here's what we're doing. Here's what our goals are. Or was it like, hey, we want SRE, you're it. And combined with those, did you have any sort of, or did you discover any good resources, maybe online communities, other good things that help you look at what SRE is, what other people are doing, and how you might be able to incorporate some of that into your current place to make it not just, hey, I have an SRE title, but I'm actually doing some cool things with it and finding cool ways to leverage things. Any advice you can give to our listeners from that point of view? Yeah. So I say when I initially joined, I was not brought into the, it's called the SRE Center of Excellence, for example.

Starting point is 00:35:27 I was just brought in more just for the platform team that supported this slew of applications and working to bring in these SRE practices at more of a smaller scale. So we are an org within an org. So starting out there, whereas going from more of the Center of Excellence would be going from a top-down approach, like just bringing it down that way. And we went the other way. And so I think that in the end, both of these positions serve their role as needed within such a large organization that I am part of. But the benefit, I feel, of my role is the fact that I was actually, if I was in a physical office,

Starting point is 00:36:13 I would be sitting actually beside our development teams and actually going out and being able to show them how we're integrating this and seeing the impact right then and there. So it's a lot more of a closer interaction in that sense. This brings me to one more question that I have. It seems like questions are popping up. Only one more.

Starting point is 00:36:34 Only one more. I see, obviously, you're sitting at home. It's a 10-part question. It's a 10-part question. I see you're sitting at home and you were just saying, if you would be in the physical office you would be sitting next to them uh have you pre-covid have you been in the office and yes and what's the plan after covid now are you staying more home are you going back what's the plan okay yeah so i'd say for the last couple months uh i've been going in with a few members of my team just once a week. And then

Starting point is 00:37:06 starting actually this month, we're going in now two days a week. So we're having that every, let's say, Wednesday, Thursday, our teams go in together. We have gone out and rescheduled some of our meetings that would usually happen on calls on other days to make sure that they're more in person. And I really see the benefit of that because you really didn't realize how much you've missed organic conversations and how beneficial organic conversations may be that you can just pass someone in the hall and have a quick chat about them or walk over to their computer and get that, see what they're working on, collaborate with them right there. It's a lot more difficult to kind of nurture these other conversations when you are working remotely. the week working at home and remote and being at home with my cat and then also then going into the

Starting point is 00:38:06 office the two days a week to actually interact with my colleagues and having those discussions as well it almost feels like we're doing this with the podcast here right we're just casually opening up a zoom session and invite the people you want to talk to. Casual today. Casual today. No, but thanks for that, for sharing this, because it's also been my observation while we got, I think, pretty good in working remote. I think there's no doubt about the efficiency of the work we can do from home. We don't have to be in an office where people monitor us if we're actually in the office, but the face-to-face interactions are just something that

Starting point is 00:38:50 is extremely valuable yeah it's different kinds of work you can get done in the different settings right and if you have those two it's good to be face to face to do the more socially based ones and then when you just got to get work done it's sometimes a lot easier to just be home without the distractions yeah exactly so it's i mean it's it's not not to go off on a tangent here but it's freaking awesome that there's this you know one of the one of the benefits of the horrors of COVID was that um you know corporate world's waking up to the idea that there can be this model but anyway good hey um yeah what's right i was gonna say anything else we wanted to discuss here same thing you were about to say i believe so why don't you take it you have such a better voice than i do no i don't know come on you have the voice for radio i have a face for radio that's the expression

Starting point is 00:39:37 no uh no i just wanted to say thank you first for two things. I want to say thank you for doing this podcast with us, right? Because I think it's great that we have people like you that are willing to share these stories, especially how life as an SRE really is and how you ended up there. I think that's great. But secondly, I also want to thank you for another thing that you have been doing for me over the last couple of months, which is playing an active role in the Dynatrace Cloud Automation Guild, which is a group of people that meet on a regular basis where it is about sharing from users to users on what they're doing. And you've done a session in the early days of the guild and we are preparing a second session and I just want to say publicly here

Starting point is 00:40:26 thank you because it's really, it's really great what you do. That's all I have. I have a lot of notes. No more questions? No, well, there will be many more questions, but I think considering the time we already took from you, I think it's, unless you want to say, spare some final words. No, it was great. Thanks for having me. I think this was an awesome discussion to have. Thank you so much.

Starting point is 00:40:55 And Andy, going along with the fact that I might have a face for radio, you've got a voice for silent films. I just wanted to get that one in. All right. Diana, thank you so, so very much. This has been fascinating. And thank you for sticking with us

Starting point is 00:41:12 as we figured out the technical difficulties in the beginning of this one. That hasn't happened, I don't know, for years. But anyway, thank you again so much. I hope you continue doing the awesome stuff you do and continue sharing with the community. We were just discussing,

Starting point is 00:41:28 I think on the last recording, how key that is. You know, successes, failures, but just the fact that you and everybody else is sharing has been part of what's

Starting point is 00:41:38 made this entire IT experience and the journey for everybody possible. I mean, imagine where Kubernetes would be if people weren't out there talking about it and sharing successes and failures.

Starting point is 00:41:47 So thank you for being part of that. Thank you for taking the time to be on our show. And yeah, I look forward to if there's an opportunity to have you on again, any new updates or any new breakthroughs, we'd love to have you back on. So thank you. Thanks for having me.

Starting point is 00:42:07 All right. Thanks, everybody. Bye-bye.
