PurePerformance - Decrypting software reliability into plain English with Ash Patel
Episode Date: July 1, 2024

"Because I don't want software to go down every single day in my next gig!" is what drives the motivation of Ash Patel, Reliability Advocate and host of the SREpath podcast, to talk about and educate IT professionals on the importance of building and operating reliable systems.

For 15 years, Ash was Director of Operations at a private health service organization. He experienced patients not getting the treatment they expected because of unreliable software he was responsible for. In our conversation, Ash talks about how he had to close his own knowledge gap on technology, but also solve the problem by having engineers understand the pain and the requirements of their end users. One way he educates more engineers is through his podcast, SREpath, where observability has recently become a hot topic. Tune in, hear memorable stories from his guests from CapitalOne, IKEA, and SquaredUp, and let's move towards a world where software is reliable by default.

Links as discussed today:
Ash on LinkedIn: https://www.linkedin.com/in/ash-patel-srepath/
SREpath Podcast: https://www.srepath.com/podcast/
Clearing Delusions in Observability: https://read.srepath.com/p/30-clearing-delusions-in-observability-2af
Boosting your observability data's usability: https://read.srepath.com/p/35-boosting-your-observability-datas-3f4
How to Enable Observability for Success: https://read.srepath.com/p/40-how-to-enable-observability-for
Transcript
It's time for Pure Performance!
Get your stopwatches ready, it's time for Pure Performance with Andy Grabner and Brian Wilson.
Welcome everyone to another episode of Pure Performance.
You can obviously tell by the voice that this is not Brian Wilson who typically does the intro.
Brian hopefully is fast asleep at the time of the recording.
He should be in bed by now. I think it's about midnight or maybe even after midnight probably.
But I am not here on my own. I actually have another great guest with me: Ash Patel. Ash, thank you so much for being on the show. I stumbled across your podcast a couple of weeks ago. Actually, it was interesting: one of my friends pointed it out, and then I listened, checked it out, and said, hey, this would actually be a cool guest on my podcast. Now I'll stop talking and you can start talking. First of all, thanks for being here. Can you quickly introduce yourself: who you are, what you do, and what is this podcast that I've been listening to?
First of all, Andy, it's a pleasure to be on your show. And unfortunately, I couldn't meet Brian, but one day I hope to do that in real life.
So I'm Ash Patel, like you mentioned, and I have a podcast as well.
It's focused more so on advocating reliability practices.
And we recently changed the name from just the SREpath Podcast (we're at srepath.com) to the Reliability Enablers podcast,
because that's what my aim is,
to help people who are trying to enable greater reliability,
particularly in the software side of things,
to have an impact within their organization.
That is an area that I feel is still quite lacking in a lot of situations. Big Tech has done a very good job at it, because they have robust practices and robust systems built around having engineers work on reliability. But more importantly, the other organizations that serve us in critical industries need to get better at it. And that's my aim: to share ideas so they can join the fun as well.
Ash, thank you for the introduction.
I'm just reading off of your LinkedIn page.
Folks, by the way: every link that I mention, whether it's LinkedIn or some of the podcast episodes that Ash sent over to me, the ones that also touch on observability, which is a big topic for us, you will find in the description of the podcast. But I want to just quickly read
your two sentences because I really like this. Once upon a time, I ran operations at a healthcare
business. Right now, I'm having an extended eat, pray, love moment and focusing on advocating
greater reliability in software because I don't want software to go down every single day in my next gig.
And I thought this was a really, you know, great thing to say,
because just as Brian and I and many of the listeners that we have,
you know, we've been trying for years to really make sure
that these critical systems don't go down every time, right?
And we are trying to do what we can, depending on which role we are in.
Brian and I have a big background in performance engineering and performance testing.
So we typically brought systems to a critical state, then tried to figure out where they were breaking, and then gave this feedback back to the engineers.
I think we've seen a shift towards really thinking about reliability engineering, like site reliability engineering or resilience engineering, where from the beginning you try to make sure that you're making the right decisions to keep systems reliable. In your world, you said you went from the SRE podcast to the reliability podcast, from a terminology perspective. What has changed for you since you have been in this eat-pray-love phase of your life, and what has changed over the last months and years?
Well, it's been about 18 months, and I can definitely say
it's been less stressful working with a few interesting organizations here and there to help them understand things a little better.
And I'm working on some interesting projects here in Australia right now.
And that's why Brian couldn't join us.
It's 2 a.m. in the Eastern time zone.
He's in the Eastern time zone, right?
He's in Denver, so he's mountain time.
Oh, okay.
But it's still, it's at least 1 a.m.
Yeah, yeah, yeah.
Yeah, it's got to be midnight or 1 a.m.
Yeah, something like that.
So yeah, that Eat, Pray, Love moment you mentioned just came about from the fact that when I was running operations, I had a whole bunch of other things in my technology portfolio besides dealing with our software. So to have to deal with it every day, people complaining, end users complaining directly to me every day...
We had a fairly flat structure in our organization.
So even a cashier or someone who was dealing with billing
could reach out to me and say,
we're having this problem.
And it was becoming annoying.
And I'm noticing that our organization was not the only one experiencing this.
So many different places that I've been talking with,
they're experiencing this right now.
From the software that they're using, whether it's external or internal; it's just something that we need to resolve.
Help me understand: you said people just came to you. What type of role did you have, that everybody knew, I've got to go to Ash in case something doesn't feel right?
Director of Operations. So essentially, yeah, it wasn't just technology oriented; there was also a people side of things. It's a smaller organization, so we wear multiple hats.
It's not just saying director of, you know... I started off as a sysadmin. I always thought I would just be a technical dude, and that's not how it pans out in the real world. Sometimes the role molds you into shape rather than the other way around. So there was a lot of people-side work in operations at that organization: working with clinicians, bringing them up to speed. It was a very multifaceted role.
I've learned a lot in the 15 years I spent there.
15 years, that's a long tenure for working for a company.
I mean, you've done the same, right?
I know, I know.
It's been 16 and a half.
I know I love it, right?
I love my job and I love the company where I am.
So that's right.
But it's still, I think it's still rare, right?
Compared to looking at other folks in our industry,
there's typically more change.
So that's why having 15 years in one organization
is quite an achievement.
I got a question for you.
So looking back: you mentioned cashiers or different people came to you.
Can you give us an overview from a software perspective?
What were the most common reasons why systems didn't behave as expected? Why did they crash? Why were they slow? Why were they simply not available or resilient? What were the top reasons?
So some of the reasons I can outline were that there was a disconnect
between what the engineers were developing in terms of our internal software
and the capabilities of our systems to handle it.
So we were doing a combination of on-prem as well as cloud.
Once we shifted to cloud,
that's actually when the problems worsened, funnily enough.
Yes.
Because the expertise that we had at that time,
and we're not talking a very long time ago,
we're talking only six, seven years ago,
was very little in terms of cloud computing.
When we tried to bring people with cloud computing expertise into our space, they didn't have the domain knowledge. That was a challenge we were having. And the people with the domain knowledge didn't have cloud computing expertise.
So what they would do is make software with the mindset of: I'm making things for this kind of rack, it's going to go onto this kind of rack. And I was trying to get them into the mindset of: you're actually working with VMs, the ops guys are doing this. We had a kind of you-build-it-you-run-it type model.
The biggest problem was delineating those shared and owned responsibilities.
Even though I'd make it clear,
somehow we would all get lost in translation.
And it got to the point where I had to codify what everybody was responsible for, to what
extent and how to do it.
And I think that's what a lot of organizations and a lot of teams need, so they're not stepping on each other's toes and saying: well, that's not my job, that should be that person's job.
So we were having a lot of issues with the VMs
just not being able to handle the load
because they were not properly configured
to the requirements of our workloads.
The code was very inefficient,
which I guess is a problem in a lot of environments
and in general, we did not do performance testing at all.
So folks, if you listen to this: not only Brian, but also folks like Mark Tomlinson, who actually inspired us to do this podcast about eight years ago; he's big in performance, and I met him through performance engineering work.
From my perspective, I've been in performance engineering for so long,
it's sad to hear that these things still happen, that people are not doing performance testing.
And I know it's not always easy, it's not always top of mind, because you're also under pressure to get your features out.
But I think that's also what you're trying to advocate for now, right? I hope so, at least. Ash, please say yes: if you're working with organizations, performance engineering should be top of mind.
That is interesting.
So you said VMs don't handle the load,
code is inefficient.
As you moved from on-premise to the cloud, I assume there were also parts already running in the cloud, communicating back to on-premise. That, I would assume, is a common architecture. Did you also experience any latency issues related to the fact that all of a sudden applications or services had to communicate from one environment into the cloud and back? Obviously latency is an issue, and throughput and cost, I would assume.
Was that also an issue for you?
Yes, it was a big issue, because we had a Dynamics CRM, a bit of a lower-end version of an SAP system. We weren't going to spend tens of millions on a system. So we spent, actually, not much less.
So these were boxes sitting on site
and they would communicate with our cloud-based systems.
And yes, there was latency.
There were latency issues.
And sometimes the latency would get bad
to the point where end users would complain
as to why am I watching this thing just going,
you know, that loading icon?
Why am I just watching this thing go around
and around in circles for five, 10 minutes?
It wasn't just us; I can't fully put it down to the internal people. There were a lot of vendor issues as well. I was also doing vendor management, so we'd be dealing with external vendors who were not providing adequate service and couldn't give answers. Our suppliers as well.
In the healthcare space,
you deal with a lot of suppliers
who also were going through the same issues and change
because they were just used to people sometimes calling in,
faxing in things.
This will just blow the minds of people in software engineering: to this day, in certain industries, people still fax orders in.
Yeah, it's crazy.
It is crazy. So, 15 years of experience in an organization, being responsible for operations, having to deal with the people that complain, obviously during that cloud migration. I think you said it nicely earlier: you were bringing people in that had some cloud experience but no domain experience, and the other way around. How did you solve this? What measures did you take to ease the problem?
Did you start to re-architect for the cloud?
Did you just try to really optimize the systems for the workloads?
What did you do to mitigate some of these issues?
So the first step was to actually look at re-architecture.
But of course, as you may know, that's not a simple step
with all the spaghetti mess of whatever you already have.
I'm not talking months.
It might be a multi-year project.
And for us, that's what it appeared to be.
So we decided to educate first.
And I think that's probably why I'm doing this now: trying to educate the engineers on how they can better understand what our needs are, and become more intimate with what the users are expecting. So that meant actually getting them, possibly for the first time in their lives, to actually talk with end users. Actually communicate with them, and not leave it to... well, we didn't have PMs, because we were building internal services. So it was a lot of requirements being written by analysts and sent to developers. It sounds very old school when I say it now, doesn't it?
No, but for me the interesting thing, and this is not the first time that I hear this: what you're explaining, talking with the end user, understanding the real problem that the end user wants the software to solve for them, is something that shouldn't need a cloud transformation project where all of a sudden you see things going bad. I mean, it should be a basic, common approach to software engineering. But I do know that in many cases it's typically not done, that people actually have the engineers sit down with their end users,
like really sit down if it's possible,
and then watch them, how they are dealing with the day-to-day tasks.
Because I think that then provides also some empathy, right?
I mean, you understand where they're struggling and why they're struggling.
And I don't know the healthcare business as well, obviously, as you do.
But if you then all of a sudden see what impact bad software has, not only on the end user, but probably also, I don't know, a patient or somebody that they are interacting with.
And if you feel this the first time as an engineer, I think you have a much better appreciation for really building better quality software.
Well, I can tell you a story that I would then tell some of these engineers, just so it really sunk in, so it really hit home with them. Well, I hoped it hit home with them, because it was a really emotional situation for me to experience. I was actually there, on site at one of the sites.
These are primary health services for people with chronic conditions, which the government contracts us to provide. So the general practitioners, the physicians, would say: all right, this person's health care can be optimized, we're having some issues, can your clinicians look into it and actually provide advice on how to improve their health condition? So it's quite an advanced practice, and obviously we need good technology to make all of that work effectively. There was a situation on site where a patient, and these are people who may have taken a long bus ride to get to where you are, or may have had other things going on in their day; people might think they're not busy, but they have things going on in their lives. And they got really upset, because the software was not loading. The software just kept crashing and going slow when it was loading, to the point where this patient just walked out and said, forget about this, we'll look at it some other time. And this is someone with a chronic health condition that needs our help, that needed our help. And we weren't able to provide them that service because of software not working effectively. We had the features there, but the performance wasn't there. The uptime wasn't there. If you look at things on a graph, everything might look okay. But when you look at that individual moment in time when the software was not working effectively, that's when you really feel it, if you're there.
Yeah, it's a very powerful story, as sad as it is. Hopefully, as you said, it impacted the next line of code that the developers were writing, or made them reconsider the importance of performance, of good architecture, of good best practices, and of how to build resilient systems. We have a big responsibility in the world with the software that we are creating. I remember I had the luxury of spending some time with Kelsey Hightower, a big name in the Kubernetes industry.
And he also told a story from when he was working for an organization in the US. When you don't have a high income there, you get, not food stamps, but the digital version of food stamps. And you basically go to the supermarket, already in a situation where, obviously, you're not in a good place in life if you're depending on this, and you try to pay: you swipe the card. And there were moments when the system was down. So people who are trying to get food for their families cannot pay. And then you have a line of people behind you, and they can see that you are on these food stamps, and it's not nice. And this is why we need to do whatever it takes to make sure that at least the stuff that we build and are responsible for works as expected.
Yeah, we've got to remember we're educated professionals, so we possibly don't consider this, but it's a case of dignity. In that food stamp situation, their dignity was compromised.
Yeah, because other people could see that even the food stamp is not working for them.
We don't want that to be the case.
Yeah, exactly.
So, Ash, this means your experience in your previous job
helped you to understand it's important to educate.
That made you create the podcast.
How long have you been doing this podcast now, did you say?
Just under 18 months, I'd say. And I've been writing on reliability topics since 2021, so a bit longer.
Yeah, so we should definitely make sure to get all these links out so folks can follow up with your stories, because education is key, in our industry and in every industry. If you think back over the last 18 months, were there some episodes, some guests, where you said: hey, this was a really cool and interesting moment, an interesting story? Any episodes that you would like to highlight for our listeners, so they can go back and say: hey, if you are in performance engineering and reliability, here are a handful of things that you enjoyed in the discussions with your guests?
Well, this year I've been focusing a lot on observability, because there are a lot of problems in that space to solve and ways we can improve on them. So I think those are the episodes that are probably the most memorable for me. There were a whole bunch of other people that I spoke with. There was someone who spoke about chaos engineering; he worked at BMO, which is a very large bank in Canada, and he handled their chaos engineering and resilience engineering. That was a very interesting topic. But I think observability is still top of mind. And to me in particular, it is a foundational capability of reliability work,
of any operational work.
I think you've called it XOps at one point.
Observability seems to be foundational
to a lot of XOps.
It can be things like AIOps,
SRE work, MLOps.
Everything needs to have that data in place.
I guess it's fitting that this podcast
is related to Dynatrace.
So I think we should focus on some of those topics
because that was an interesting area
and I've built a more rounded perspective on that particular area
than anything else yet.
Cool.
Yeah, obviously observability is something that Brian and I have been living and breathing over the last couple of years, and it's great that you're focusing on this topic. I have a question in mind, but first I want to go back to your previous life, when you were head of operations: did you have proper observability then?
No. No.
But you must have had some type of observability, right?
We did, we did. But when you say proper observability, I'm thinking about the entire work stream that I've built out and am sharing with people, and I'm thinking: whoa, we didn't have that. We had quite a bit in place. We knew when systems were down and when to respond to issues; we had fairly good coverage of the four golden signals, except for maybe latency.
Like I mentioned, that performance testing was not to the level that I would hope for.
But it was a massive transition.
You're going from people working with on-prem, to hybrid, to then everything all in the cloud. It's a big shift, all in the span of a couple of years. It sounds straightforward, but it's not. So yeah, that was definitely the case: we didn't have effective observability. So I do want to help whoever wants to have effective observability
to do it right.
And I've put out a reliability blueprint on the SREpath site.
Observability is one of the work streams there,
and you can see all the different facets that you need to be mastering
to be effective at observability.
So be sure to check that out.
Yeah, and as you said, I will definitely link to this. I just want to remind people: folks, if you listen to this, you may be in a car, on a bus, on a plane; wherever you're listening, make sure you check out the details with all the links. Sorry, I didn't want to interrupt you. Go on.
Oh, no problem. No problem at all. I was going to say that I've learned a lot from the people that I've spoken with on the podcast, and that's part of the reason I do the podcast: a lot of it is for my own learning as well. If I were just to passively listen, I probably wouldn't learn as much.
I'm one of those people who needs to actively communicate with people; that's the most effective way for me to learn, when someone's telling me something.
So there are a few people that I would like to highlight and specifically what they've done if we have time for that.
Yeah, definitely. Please go ahead.
I just want to highlight that the same is true for me and Brian. We love these podcasts because we learn so much from our guests; they all bring their perspective on various topics, and it's the best educational piece for us, just learning. So please, go ahead: who are the people that crossed your podcast as guests and that we should know about?
So there are three people who just really put it all together really well for me.
And it's at different stages of the observability lifecycle.
So the first person is David Cottle, who is an engineering manager at Capital One, a very large bank in the US.
And he's very frank about his views on observability.
He did a talk recently at Monitorama, and you should see the slide he put up.
I'll see if I can find it.
We can put that in the links as well.
So essentially his idea is that there are a lot of delusions around what observability can do and around what people think their problems are. To him, a lot of people have this delusion that they have some kind of scale problem, that they're at that level where they need the highest-end tooling and need to do all kinds of tricky things.
And the one thing that I learned from that is that you need proper alignment between what your problem actually is and the solution. A lot of people oversell their problem internally, and they end up going for shiny objects or failing in their projects because there's a disconnect between the problem and the solution. A too-fancy solution for too basic a problem is just going to make life hard for everybody. I'm sure you've seen that yourself.
Yeah, that's an interesting one. So
basically, could that also be interpreted like: we have a problem, we don't know exactly what it is, but in order to solve it, we just go the quote-unquote easy route by saying, we're buying this tool, and this tool will make it go away? Not exactly hiding the facts, but kind of avoiding actually building and architecting a proper solution. Just saying: hey, all these problems will go away if we buy shiny tool ABC, whatever that is, instead of actually fixing the real problem, which might be completely unrelated, right? Because observability may not get it there.
I don't want to talk down the need for observability; I think we both know that we need observability. But yeah, that's interesting. What I also often see is that people go to conferences, and I'm speaking at a lot of conferences, and we all get inspired by these speakers. I think most speakers, and that includes me as well, always tell a very nice story. Sometimes we overtell, we make it look even nicer: this tool can solve all of our problems. We sometimes hide the fact that a lot of other things were actually necessary as well, and that these tools
alone didn't solve it. But then if people go to conferences,
listen to podcasts like ours,
and then they say,
hey, Ash or Andy,
they talked about doing this and this,
and now we need to do this as well,
and then all our problems will go away,
so please give me the funding.
I understand why people may do this,
but obviously it will not solve
maybe the real problem that people have here.
Well, there was one thing that David said to me, and I want to summarize it because I've seen this even with the engineers I worked with. They get excited by the really cool problems, the interesting problems. Let's go extreme: the ones that look like they might require quantum computing, or developing your own AI. Well, I guess that's not as cool now, because everyone's doing AI, but years ago they wanted to almost build an AI, which was like: okay, that's more than what we need. Let's try to actually just solve the actual, and I know it's boring, the actual underlying problem that we're having. We can fix this in the next two or three sprints, so how about we do that? And then everyone just kind of looks dejected. But we have to accept that reality: we need to focus on solving specific problems rather than going for the shiny objects, or chasing windmills, essentially.
Are we just...
Actually, that's not the term, David.
Who's chasing ghosts in the system?
Could be.
You know what?
I'm not a native speaker, so I'm pretty sure there are certain proverbs out there where I don't know what they're called in English. But it sounds interesting.
But it sounds interesting.
The sad thing is I am a native English speaker.
I get stuck with idioms.
I'm like, yeah, I think that's the one.
I think a lot of people do. And we just kind of wing it, you know; we just try, and it's like, yeah, you get it. Right? Yeah, I got it.
Solve the boring problems and don't always chase the shiny objects. I think that's an interesting one. I mean, boring, I understand from a challenge perspective it might be boring, but impactful, right? Solve the impactful problems and don't just chase the shiny objects.
And it's an interesting trade-off as well, because obviously you want to keep your engineers happy,
so you want to give them also something that is exciting and new,
but you cannot just do this for the sake of not focusing
and solving the problems that actually advance the business,
because in the end, the business is the one that is paying the money
to actually have all these people employed.
Yeah, that was the issue that I was having,
because I was actually also responsible
partially for a balance sheet.
So I'm thinking, let's not waste money here, guys and gals.
So yeah.
Cool.
So, David. That's an interesting one I need to follow up on. I think there's a link; you sent me a link to his podcast episode. So folks, if you want to hear from David: Capital One, a very well-known financial entity in Canada, or in the US, at least in North America.
In the US, yeah.
I remember I met a couple of Capital One engineers pre-COVID, when they were talking about how they're doing continuous delivery, which is another big topic of mine. Capital One has been doing some great stuff. I would say they did a lot of things early on with the cloud and with new technology, even working in a highly regulated industry like finance. That's really cool.
Yeah, absolutely.
They do some amazing things.
We don't want to sound like an ad for them, but yeah, there are interesting people working there. What's that slogan: what's in your wallet? Wasn't that their slogan, if you look at their ads? I think that's their slogan: what's in your wallet? It's Capital One.
I'm not sure. I'm one of those guys who used to skip the ads, you know, record things, or just watch Netflix, or have ad blockers, so I haven't seen their ads in a long time.
Cool. So David was the first one. I think you had a couple more that you wanted to talk about?
That's right, that's right. So the second person I thought of, and I spoke with him very recently, about a month or two ago, is Tim Mahoney.
He works as part of the enabling team, the observability enablement team, at IKEA. Or, in North America and the English-speaking world, we'd say IKEA.
Yeah, so IKEA.
So he brought up some very interesting points. The first one, and I've rarely
heard people even talk about this, even engineering managers, maybe I'm not listening well enough,
I don't know. But he mentioned the concept of actually having an effective engineering baseline.
What is expected of all the teams and engineers?
How often do we talk about engineering baselines?
And it's a concept I've heard before, but I haven't heard people
say it to me often enough.
I think it's important to bring that to mind: these are the requirements for you to be an effective engineer.
That's interesting, because when you said engineering baseline, the first thing that came to mind for me was: how do we measure the productivity or the effectiveness of engineers? But I guess, as you kept talking, it's more like: what are the skills, and what should be the baseline for engineering practices? Not necessarily the output; maybe the output is something that comes with it if you are following these guidelines. But baseline for me immediately triggered engineering productivity, because that's a big topic I hear a lot right now when we talk about platform engineering. That's another topic that I speak about, where it's all about how we can make developers' lives easier, how we can make them more efficient by reducing the things that are not allowing them to contribute to what they should do, like building cool new shiny objects, maybe.
But sorry, go ahead. It's just that whenever I hear something, sometimes I just need to say something.
Oh, no problem, no problem. I was actually fascinated listening to that, because I think yours is, I would say, a more precise technical approach to looking at engineering baselines. But when I was talking with Tim about how this would pan out, I was interpreting it from more of a management perspective, because that's where I have been playing for most of my career.
So for me, just to say, okay, we're measuring things... Yes, okay, we're measuring things, but it's our job as managers not to just say: we're measuring you on this, we're measuring your productivity. How are we going to make sure that you reach that number, the number we're expecting from you? How are we going to make it happen? For me, when we were talking about the baselines, that's what it came across as. Maybe have a listen to that episode and tell me if I'm wrong, but that's what it sounded like.
There are a few things he mentioned like that. If I remember correctly, he was saying that observability is not just a checkbox. It's not just: yep, we're done, we've done this, this, and this. We've installed Dynatrace, we've installed this instrumentation, we've done OpenTelemetry, we are now done. There's that whole fallacy of the maturity model that we wanted to move past, because this is always a continuously moving object. The observability enablement team isn't just a project team. They're not just doing this and then going to be done in two years' time. They're constantly going to be updating these baselines.
I guess the best way to explain the baseline, from how I interpreted it, would be: the numbers that you're seeing, like DORA metrics, are the lagging indicators in terms of KPIs. But the leading indicators are going to be things like: how many of our teams are actually implementing all the things we need to do in observability? Are they doing this and this? Are they using the right tooling? Are they following that process we brought in last week? How many teams are actually doing that? That is going to directly contribute to how we see our lagging indicators, the DORA metrics. That's how I would say it.
I like that.
This is, I think, the first time I've heard DORA metrics described as lagging indicators. You're completely right, because if you're doing things right up front, that will benefit the DORA metrics. You cannot just expect the DORA metrics to magically get better without investing up front. If we come back to a sports analogy: we expect a sprinter to run the 100 meters in a certain time. And obviously, once they make it, that's great. But how do you get there? It's through training, training, training, and giving them the right advice, and maybe different styles of running, or whatever it is,
right? And I think that's the same thing here. So the leading indicators are how we are enabling
our engineers to get their job done, how we make sure that they're following the right practices,
that they're educated the right way, that they know how to use all these tools, that they know how to do certain
things, and this will then, in the end, if everybody's doing the right thing, impact metrics like DORA. Which, funny enough, on the DORA side: I think we both know what DORA means, for us at least, right? It's a DevOps metric. But DORA in Europe right now is the Digital Operational Resilience Act. So folks, don't confuse them; it's the same acronym.
Yeah, I wish they had actually looked up what DORA means before they created that act in the European Union.
Hey, well, we'll figure it out.
Exactly. Cool. So we had David...
Yeah, so actually, just to double down on that sports analogy, I really liked it.
So imagine Usain Bolt at 10 years old, or whatever age he started at, being told by his coach: okay, run hard, I'm going to measure you, and every time you run, I expect you to do even better than you did previously. Just that, just that piece of advice: I'm measuring you, do your magic, because you've got the inbuilt talent, you've had the training to run, you're a natural runner, I'm going to make sure you run a 10-second 100 meters. Nothing else. I don't know how that would work.
Yeah, that's right. I mean, that's why I think it's a great analogy. Thanks for that. So, the DORA metrics are the lagging indicator.
And we need to invest upfront in the baseline of our engineers,
which means we need to help them do the right things.
And we need to train them.
We need to give them feedback.
We need to have teams like Tim's, right? An observability enablement team, which is not a project-based thing, but continuously mentoring, continuously working with engineers to leverage the power of observability, to get better at their job and find problems earlier and fix them faster. And this translates into better DORA metrics, yeah? Cool.
Exactly. There was one other thing Tim said, just before we move on to the next person.
The next person is very intriguing as well in how he talks about observability; he actually talks about an area that we don't really think about too often. So we're going to get to Richard's ideas in a second. But Tim from IKEA mentioned the Dunning-Kruger effect, and I hadn't heard of it at all before he spoke about it. I'm sure engineers might know about it. Essentially, it's people overestimating their ability to do something, thinking they're better at something than they actually are. Unfortunately, I've fallen for that previously,
where I'm thinking yeah I'm good at this, confidence is good but
then you don't actually do as well.
I think we need to be a bit more humble about how good we are at observability. Teams, especially product teams that may be given that responsibility, might not actually be good at observability, and they just need to be okay with that. And then they can follow the engineering baseline, follow the guidance, follow the framework, to actually get better and better, just like how they did at coding. Simple as that.
Yeah. The only thing that I can say to this: maybe this is also why, in agile planning, you play planning poker, where it's not an individual
but the whole team basically then needs to put in the number and say, you know, how long
do we need for this?
And hopefully, unless the team overall is just overestimating themselves, every member, the more people you have in the team, the more accurate an estimate you get. But yeah, I can see this, and I think we've all fallen into this, right? We always thought we're much better: yeah, sure, Kubernetes, I can easily do this. Until you actually start with it, and then you say: oh, just watching these YouTube videos didn't solve it. Yeah.
What happened to that cluster?
Where did it go?
Exactly.
It took another break.
Oh, no, it broke.
Okay, yeah.
All right.
So Richard, huh?
Yeah.
Richard Benwell.
Yeah.
So Richard Benwell is a person who spoke about observability from a different angle. He has been in the monitoring space for 20-plus years, and he's the CEO of SquaredUp. Now, they are a vendor, but we tried to keep that conversation very much focused on the problem rather than talking about anything they do. I'm not too fussed about the ins and outs of what each piece of software does, or how it's going to solve the problem. We have to understand the problem first, before we even start looking at solutions. And there's a whole bunch of solutions you can pick for this. His area is boosting your observability data's usability. Right?
So, just scanning through my notes, I found that his insights were quite specifically on the fact that it's not all rainbows and unicorns once you get the data. There's a lot of focus in observability on the technical problems of collecting and storing the data. But then the usability aspect kind of falls into the too-hard basket, because, well, really, we've done our job, right? It's up to the people who are going to look at the data now to figure it out. But we've got to remember that humans are still the ones who are solving problems. And if they're going to be the ones solving the problem, they need to make effective use of that observability data. And the best way to do that is to make it easy to make sense of.
Yeah, it's funny that you mentioned Richard, because I met him, I'm not sure exactly when, it must have been two or three years ago, at a conference, and he showed me the stuff they built with SquaredUp. I think they're doing a lot of cool dashboarding, if I remember right. That's what they do. So, really making the data tell a story.
And then, as you said... I'm also taking a lot of notes here. The usability aspect of observability: that is a really, really interesting way to phrase it. Because we can capture all the data in the world, but what if nobody's looking at it, or nobody knows what this data actually tells them? I mean, we've been trying to solve this problem with different approaches, right? Automatically detecting patterns, automatically highlighting important data. But in the end, the question is: who is going to look at this data? Because in every organization you have different people that need to look into this observability data, with different backgrounds and different experiences. And therefore, you need to provide something that is flexible enough to adjust to the individual requirements of every organization and use case.
Exactly. The funny thing you mentioned was that he did some really cool dashboards. The one thing that I learned from him was that the key is to make all the data you're visualizing actually meaningful. If you're just putting up pretty graphs and dashboards, you're not helping anybody. What you really should be asking yourself, and this is something I've learned from all the conversations I've had with people in that space, in the visualization space, is: will this dashboard actually tell me where the issues are at a glance, when I'm not even thinking about looking at the screen?
People think about creating these dashboards
when they've got nothing else going on.
They're focusing on making that dashboard.
But you've got other things on your mind.
You're thinking about what's happening at home.
You're thinking about the incident
that you're looking at right now.
You're thinking about your boss's messages.
You're thinking about your colleagues' messages.
You're looking at emails.
You're trying to patch everything together. So you need something that gives you an effective picture at a glance. That's the key takeaway I took from that.
Yeah, and to add to this: somebody builds a dashboard because they have an understanding of the system. But the problem is, if they're the only ones that understand what they put on the dashboard, what if they're sick, what if they move on and somebody else needs to take it over? Dashboards need to be as intuitive as any mobile app that I install, and I need to know how to use them, as you said. And what we have been doing is also a little bit of measuring which dashboards are even used. Because in the end, you might be building dashboard after dashboard and maintaining them, but nobody ever uses them, because there might be people that build dashboards, but there are others that consume them. So, simple things like: let's measure which dashboards are used and which are not. You can also think about a thumbs up, thumbs down: is this dashboard useful? Does it tell you anything? Maybe you just rotate on a sprint basis, pick a couple of dashboards, show them to people and ask: is this meaningful for you? What does this dashboard tell you? And with this you also learn which dashboards are actually not effective, because nobody understands what's on there.
Yeah. One tip I would add, and I always used to train my staff on this: don't ask closed questions. Don't ask a yes-or-no type question; always ask for input. So: is this dashboard useful for you? People will just say yes. That's a problem we used to have. Yes, it's useful, and then later on: hey, I can't figure out how to use this dashboard. It used to happen far too often. So you want to ask a question like: what does this dashboard tell you?
Yeah. It's almost like an exam.
Yeah, scary as that might sound, to put people in that situation.
Keep it casual: hey, what do you think this dashboard is showing you right now? Can you just give me an interpretation? I'm trying to figure out if people are able to understand it well.
Hey, Ash, thank you so much for these three episodes: David, Tim, Richard. The links are in the description of the podcast. It's really great. I mean, I also love podcasting, and so do you, so keep going, keep doing this. We hope that some of our listeners will tune into some of your episodes, because it clearly sounds like some great content. I also know you told me earlier that, thanks to the podcast, people are now being made aware of what you do, who you are, and that you're passionate about this. And you're actually actively being brought in and contacted by organizations that need some help on this. I just want to confirm that.
So in case people have a need for an expert, I just want to make sure they can reach out to you.
Yeah. I'm enjoying the beautiful weather in Australia for what might be a while; it'll probably take a bigger issue for me to want to go anywhere else at this point. But yeah, I'm always happy to have a conversation with people who just need a bit of guidance on what's happening in their reliability journey.
And thank you, Andy, for having me on and love what you do here.
Thank you so much.
And Brian, sorry that you could not take part in this conversation today. Hopefully you had a chance to listen to it, though. This should probably air in about two weeks; we always record with a little bit of backlog, but I think in two weeks this will air. So that means in early July people will be able to listen to this. And hopefully then you will see some traction on your podcast, and hopefully some people will reach out. And while you are in Australia, the world is a small place, the world is connected. So folks, don't shy away even if you are somewhere completely remote from Australia. Ping Ash. A great connection to have.
Exactly, exactly.
Yeah, any final words, Ash, before we close this episode?
No, I think that's it, that's all, I think, Andy. Yeah, I appreciate it. I also am a salsa dancer, by the way.
Really? On1, On2, Cuban? What do you do?
Cuban.
Cuban, nice.
Yeah. It's rare, it's quite rare, but it's the thing that gets me going.
Yeah, cool. How long have you been dancing?
10 years.
10 years, nice.
Yeah, it's awesome.
And you met your wife dancing?
I did, yeah,
in Boston, on the dance floor.
That's amazing.
Yeah, exactly. She's Colombian, so she obviously knows her stuff. But yeah, it seems I made an impression when I met her, and after the second time dancing, she accepted my request to ask her out.
And yeah, seven years later, seven years married.
That's worked out pretty well.
Yeah, that's amazing.
Like, has she tried to get you to learn the Colombian way?
It's a little bit too fast for me, to be honest with you. And also, she learned a different type of salsa than the one we dance when we go to dance clubs here. But now we enjoy it very much.
Oh, nice. So what style do you do?
I also started Cuban in the very beginning, then did some rueda. But then I switched over to On1. I just like it a lot. I mean, I'll dance whatever, I don't care in the end, but On1 is my favorite.
Yeah, I think I'm actually going to take lessons for On1, just to be more versatile, rather than sitting out when all the followers only know On1, you know?
Yeah, definitely. I'll be calling you next time I am in Australia. I remember on my first couple of trips I always went dancing in Sydney, but obviously things have changed with the pandemic: which clubs are still out there and which ones are new. So yeah, it's good to know, good to know to have a salsero around the corner when I make it down to Australia.
[In Spanish:] A salsa friend, a salsero friend. You speak Spanish too?
Yes, yes, I speak Spanish, but sometimes it's very difficult, because nobody speaks Spanish here in Australia.
Yes, I understand. And you speak it, right?
Yes, a little, because my wife is Colombian. But I'm still just learning the language. And you know what's funny? We're still recording, and this is still going to be on the show. So let's see who is listening until the very end.
Folks, if you listen until the very end, and you hear us talk about salsa and speak a little bit of Spanish, make a comment on LinkedIn or wherever you find this podcast. It would be fun to see who else is dancing salsa out there.
That'd be awesome.
That'd be amazing.
All right. Hey, with this I need to say goodbye. Thank you so much.
Sure.
I'll stop the recording, but until next time: it was an honor, it was a pleasure having you, for sure.
Appreciate it, Andy. Cheers.
Cheers. Bye-bye.