Screaming in the Cloud - Episode 7: The Exact Opposite of a Job Creator
Episode Date: April 25, 2018

Monitoring in the entire technical world is terrible and continues to be a giant, confusing mess. How do you monitor? Are you monitoring things the wrong way? Why not hire a monitoring consultant! Today, we're talking to monitoring consultant Mike Julian, who is the editor of the Monitoring Weekly newsletter and author of O'Reilly's Practical Monitoring. He is the voice of monitoring.

Some of the highlights of the show include:
- Observability comes from control theory; monitoring is for what we can anticipate
- The industry's lack of interest in and focus on monitoring
- When there's an outage, why doesn't monitoring catch it? Unforeseen things
- The cost and failure of running tools and systems that are obtuse to monitor
- Outsourcing monitoring instead of devoting time, energy, and personnel to it
- Outsourcing infrastructure means you give up some control; how you monitor and manage systems changes on the cloud
- CloudWatch: where metrics go to die
- Distributed tracing: tracing calls as they move through a system
- Serverless functions: difficulties experienced and techniques to use
- Warm vs. cold start: if a container isn't already up and running, it has to set up database connections
- Monitoring can't fix a bad architecture; it can't fix anything; improve the application architecture
- Visibility of outages and how pain is perceived; different services have different availability levels

Links: Mike Julian, Monitoring Weekly, Copy Construct on Twitter, Baron Schwartz on Twitter, Charity Majors on Twitter, Redis, Kubernetes, Nagios, Datadog, New Relic, Sumo Logic, Prometheus, Honeycomb, Honeycomb Blog, CloudWatch, Zipkin, X-Ray, Lambda, DynamoDB, Pinboard, Slack, DigitalOcean
Transcript
Hello, and welcome to Screaming in the Cloud, with your host, cloud economist Corey Quinn.
This weekly show features conversations with people doing interesting work in the world
of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles
for which Corey refuses to apologize.
This is Screaming in the Cloud.
This week's episode of Screaming in the Cloud is generously sponsored
by DigitalOcean. I would argue that every cloud platform out there biases for different things.
Some bias for having every feature you could possibly want offered as a managed service at
varying degrees of maturity. Others bias for, hey, we heard there's some money to be made in the cloud space. Can you give us some of it?
DigitalOcean biases for neither. To me, they optimize for simplicity. I polled some friends of mine who are avid DigitalOcean supporters about why they're using it for various things,
and they all said more or less the same thing. Other offerings have a bunch of shenanigans,
root access and IP addresses.
DigitalOcean makes it all simple.
In 60 seconds, you have root access to a Linux box with an IP.
That's a direct quote, albeit with profanity about other providers taken out.
DigitalOcean also offers fixed price offerings. You always know what you're going to wind up paying this month,
so you don't wind up having a minor heart issue when the bill comes in.
Their services are also understandable without spending three months going to cloud school.
You don't have to worry about going very deep to understand what you're doing.
It's click button or make an API call and you receive a cloud resource.
They also include very understandable monitoring and alerting.
And lastly, they're not
exactly what I would call small time. Over 150,000 businesses are using them today. So go ahead and
give them a try. Visit do.co slash screaming, and they'll give you a free $100 credit to try it out.
That's do.co slash screaming. Thanks again to DigitalOcean for their support of Screaming in the Cloud.
Welcome to Screaming in the Cloud.
I'm Corey Quinn.
Joining me today is Mike Julian, who's the editor of the Monitoring Weekly newsletter.
He's the author of O'Reilly's Practical Monitoring and is a strong, fierce, independent monitoring consultant.
Welcome to the show, Mike.
Hey, Corey. Thanks for having me.
Oh, always a pleasure to talk with you.
So you've done a lot of things over the past year.
You've been working on a monitoring newsletter, you've written a book,
and you've been telling those of us who bring you into engagements that we're monitoring things wrong.
What brought you to a place of being effectively the voice of monitoring for 2018?
I don't know that I'd go that far. However, I started up Monitoring Weekly, and now everyone
kind of knows me as the guy who runs Monitoring Weekly, I guess. So that was cool. But starting
that was more because there was nothing out there for this, like there is with DevOps Weekly and SRE Weekly and Cron Weekly.
And, you know, like, why are all these newsletters named Weekly?
That's kind of weird now that I think about it.
So I started this, and it's been good.
I have a lot of followers now.
And then the book launched.
So my book, Practical Monitoring, released last December after two years in the works.
And it was a lot of work. So really,
it's more that all these things have just kind of come together. But I've been working in monitoring forever; 2006 is when I really started getting into it. So the past year has been a culmination of many years of work, and now I'm finally telling people about it.
Wonderful. So before we dive too far into that, let's get something out of the way
and irritate at least half of anyone listening to this. Is it monitoring or is it observability?
And whatever you answer, by the way, you're going to give rise to a thousand well actuallys. You know, why not both? There's been some really great discussion between a bunch of people online.
Charity Majors comes to mind. Copy Construct on Twitter and Baron Schwartz have all had
fantastic takes on this. Observability really comes from control theory. So hit Wikipedia and look it up.
But the idea is to go with Charity's take on it, which I think is probably the most concise.
Monitoring is for the things that we can anticipate, or at least can reasonably anticipate.
If I know that, let's say Redis.
When I'm monitoring Redis, I really care about key evictions.
So I'm just going to set some alerts up on that, and I'm going to have a dashboard already ready to go so I can pay attention to that sort of thing.
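To make that concrete, here's a minimal sketch of that kind of anticipatable check, assuming redis-py and a placeholder notify() hook standing in for whatever alerting system you actually use:

```python
# Minimal sketch: poll Redis for the evicted_keys counter and alert when the
# eviction rate climbs. The threshold and notify() hook are illustrative.
import time
import redis

def notify(message: str) -> None:
    # Placeholder: wire this up to PagerDuty, Slack, email, etc.
    print(f"ALERT: {message}")

def watch_evictions(host: str = "localhost", threshold_per_minute: int = 100) -> None:
    client = redis.Redis(host=host)
    last = client.info("stats")["evicted_keys"]
    while True:
        time.sleep(60)
        current = client.info("stats")["evicted_keys"]
        if current - last > threshold_per_minute:
            notify(f"Redis evicted {current - last} keys in the last minute")
        last = current
```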
But once you start getting really complex infrastructures, especially in microservice architectures, the things that you can anticipate, there's not a lot of them. So at this point, now you have to have an application that is actually observable, a way for you to really dig into the application and the infrastructure to understand actually what's going on with it and to ask questions of it that you can't anticipate.
So observability versus monitoring, it's not an either or to me.
It's not a versus. It's a we
need both. So yeah, let's see if I, did I just upset 100% of people? We'll find out.
Exactly. The only consensus is that you're wrong. Wonderful. I love it when we can bring people
together like that. But let's back up a second. In the abstract, we take a look at the entire
technical world as it stands today,
and I think we can reasonably say that monitoring is terrible in general. As soon as I ask a
question like, how do I monitor? I can't get the rest of the sentence out before I have a pile of
vendors who are jumping all over me trying to sell me something. And it's overwhelming. It's
just not something that I am equipped to or want
to deal with. So I shrug, I give up, and I hire monitoring consultants like you. So why is that
the way that things have evolved? Every other area of technology has more or less gotten better over
the past decade, but it feels like monitoring is still a confusing mess. You know, I was wondering that myself.
It's really weird to me that we have ops engineers who will spend entire days thinking hard and deep about deployment methodologies or config management or Kubernetes. Take your pick, and these people will spend just days and weeks thinking about that
one problem. And yet, as an industry, we kind of gloss over monitoring, and we think, oh, God,
it's that thing. I don't want to do that. And we just, like, you know, spend as little time as
possible on it, which is really weird, considering that, at some point, we're all going to get
woken up in the middle of the night. It's just kind of
an inevitability.
Better monitoring can help with
that. So why
not focus more on it?
Why are all these vendors
pushing everything? It's because
as an industry, we've
pushed monitoring out
of our own sphere of influence into someone else's because historically we didn't want to deal with it.
Our tool sucks.
We didn't build better ones.
So we said implicitly, hey, vendors, come build this for us.
And as vendors are wont to do, they ran with it.
So now you end up with hundreds of different tools all doing roughly the same thing.
And now things are just more confusing than when we started.
When I first got to a point of giving up on my old approach of using things like Nagios or whatnot, or bolting them together myself, and reaching out to have someone else start building these things for me, it came from a place of not wanting to be responsible for a game in which there was no way to win. Anytime there's an outage, there's always the question
of, well, why didn't the monitoring catch that? And in the event that monitoring does catch an
issue before it becomes a big issue, great. You fix it and you feel good, but it certainly doesn't
get the visibility of three hours of downtime and the entire C-suite taking up residence behind you in your
open-plan office while you get a stress test.
Yeah. That whole question, that whole
scenario of, hey, we just had an outage, why didn't monitoring
catch it? It's kind of a counterfactual.
Maybe the absurdity of the question should be called out: why did we have an outage to begin with?
Why didn't anyone anticipate that unforeseen event?
Well, wait a minute.
It was unforeseen for a reason.
There are a lot of things in systems and complex stuff that we just don't foresee. Computers go sideways in the middle of the night because they just do. So yeah, monitoring is never going to be perfect. The applications and the infrastructure always change, and monitoring always has to evolve with them. There's going to be a lot of things that we just can't anticipate.
A question that I've always wondered about, too: if I take my understanding of monitoring and I start rolling things out, okay, so I build a web app. Now, it's 2018, so I'm probably not going to roll Nagios out for a modern architecture. Nothing against them, it's just not how we roll anymore,
but I'll bolt in Datadog, I'll probably put New Relic in
for application
performance monitoring. I'll bring in something like Sumo Logic to wind up doing
logging work. I'll have some form of tracing involved. Maybe I pay someone else. Maybe I do
it myself. Maybe I run Prometheus for time series. Maybe I can pass that buck. I'll bring Honeycomb
in to look at high cardinality events. And very quickly, my personal
positioning is that I fixed the horrifying AWS bill, but I just built a monitoring system that
costs more to run than the application it's supposed to care about if I continue down that
path. At what point is this almost a failure of the tools and systems that we're running, in that they're this obtuse to monitor?
Yeah, I mean, I agree with you.
If you look at it, monitoring tools tend to be fairly specialized.
Once you start thinking about all the different specializations
that you need to cover, you end up with, you know,
eight to a dozen different tools
that all do something very specific.
They're really good at that one thing, but now the bills for monitoring are obscene.
That's a ton of stuff.
And all that data going in is not cheap.
That said, it's usually more expensive to run it yourself anyways. So maybe it's a good
idea to outsource it all. And there's the, I just upset another half of your audience. So
now we're at 150% of people upset. Outsource everything, folks. It's great.
Past a certain point, it makes sense. If it's not your core competency, why devote time, energy, and personnel to it?
No, absolutely. I've talked to companies here in San Francisco where, when I ask them, so how many people do you have building out all these monitoring tools that you've custom built, they're like, oh, well, I've got this many people. And I'm like, you're spending $10 million a year on salaries alone to run a monitoring infrastructure.
Why don't you go spend half that
on paying someone else to do it for you?
And sometimes the response is,
huh, I never actually thought of that.
You're the exact opposite of what a job creator is.
Exactly.
So this stuff is expensive to do,
but at the same time, can you really afford not
to? If you're running Twitter for Pets, Mr. Corey Quinn, you're probably not doing a whole lot of
revenue. I imagine that makes all of a few cents a year. So it probably doesn't make sense to spend several thousand a year to monitor it.
That said, if you're a company
making several hundred million a year,
then yeah, why don't you spend
a couple million to monitor it?
And just stop paying attention to the bill.
It's fine.
At some level, this becomes risk mitigation.
To that end, as we move in a direction where everything is cloudier than it used to be, monitoring seems to have gotten orders of magnitude worse, in that things that historically we could peg to a relatively high degree of certainty have now become non-deterministic. Latency between two given instances can now be all over the map.
It seems that by moving into AWS or Azure or GCP or any of the other large-scale players,
you're getting an awful lot in terms of business capability for that migration,
but monitoring has somehow managed to get even worse than it used to be.
Is that just me being terrible at my job?
I mean, we can't take that off the table.
Very fair.
So yes and no. Things have gotten different. Once companies start moving infrastructure to
Azure and Amazon and GCP and all these other cloudy environments,
the way we have to think about infrastructure is different.
When I'm running on-prem and I have a data center
a couple floors below me,
I can generally expect that latency between any given server
is going to be sub-millisecond.
And that latency is probably going to be pretty static.
If it starts to rise, I probably have a network problem
and I can go fix it right then. Or at least I will
know where it's at. But when you outsource infrastructure to someone else
like, say, Amazon, you give up some control.
In exchange, you get a lot of stuff, so it's probably
net good. But that means that how you
monitor your systems and how you manage them changes. And I think this is what
a lot of companies don't really get. They overlook
it, that moving to Amazon is not a forklift operation, it's a
re-architecture. And what that means is
how I monitor changes from looking at
latency and saying, oh, well, like one millisecond is fine, and I can generally expect it to be one millisecond, and I care about this individual server.
Now I actually care about the service.
The service could be any number of different servers behind it or any number of different resources.
So rather than monitoring at individual resource level, I should start monitoring at the
service level. And that gives me a much better leading indicator of what's going on with the
service I'm providing to customers. And yeah, maybe a core idea underneath that is rising
latency between instances. But how do I know that that latency actually impacted anyone?
So you have to completely change how you think about monitoring
and how you think about management
once you do move to the cloud infrastructure.
Yes, monitoring is more complex than it was,
but we gained a lot in exchange for that.
It's not that it's worse, it's just that it's different.
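As a rough illustration of that shift, here's a sketch of a service-level check built from per-request records rather than per-server numbers; the sample data and thresholds are made up for the example:

```python
# Sketch: monitor the service, not the server. Derive an error rate and a
# 95th-percentile latency from requests served by any instance behind the
# service. Sample data and thresholds are illustrative.
from statistics import quantiles

requests = [
    # (status_code, latency_ms), aggregated across every instance
    (200, 42), (200, 51), (500, 1200), (200, 38), (200, 47),
    (200, 55), (200, 40), (200, 61), (200, 44), (200, 49),
]

error_rate = sum(1 for status, _ in requests if status >= 500) / len(requests)
p95_latency = quantiles([ms for _, ms in requests], n=20)[-1]  # 95th percentile

if error_rate > 0.01 or p95_latency > 500:
    print(f"Service SLI breached: error_rate={error_rate:.1%}, p95={p95_latency:.0f}ms")
```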
And that makes a fair bit of sense.
Taking a step back from a service level perspective,
the tools that, and to be fair, I'm going to pick on Amazon here
just because that is the cloudy environment in which I have the most experience.
If Azure or GCP somehow has a revolutionary, amazing counterpoint to this,
please call it out.
But today, for example, I can shove metrics from
instances and services and applications into CloudWatch relatively easily, which is generally,
from my experience, where metrics go to die. The dashboards are not intuitive. It's not easy to get
a terrific viewpoint into things. And while I get the sense that it's incredibly powerful,
I've never yet seen an environment where CloudWatch metrics were set up
in such a way as to give actionable and meaningful insight into the environment.
It's quite possibly one of the worst UIs I've ever seen.
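For what it's worth, getting metrics into CloudWatch really is the easy part; a custom metric is a single PutMetricData call away. A minimal sketch, with an illustrative namespace and metric name:

```python
# Sketch: publishing a custom metric to CloudWatch with boto3.
# The namespace, metric name, and value are illustrative.
import boto3

cloudwatch = boto3.client("cloudwatch")
cloudwatch.put_metric_data(
    Namespace="MyApp",
    MetricData=[{
        "MetricName": "SignupLatency",
        "Value": 123.4,
        "Unit": "Milliseconds",
    }],
)
```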
It's gotten so much better. What it looked like
even two years ago was
unusable.
Nowadays, at least, you can actually use it and understand what you're trying to do, but you're right.
It's still a little weird.
Metrics do tend to go there to die.
A lot of people don't realize that there is a ton of stuff there
that they didn't ask to be put there.
It just is. It comes for free.
I mean, free
for some value of free.
Yeah, that's the problem with CloudWatch metrics.
They cost, ha ha ha, we'll figure that part out
later. Right.
And then if you want to pull them out
into, say, some other third-party
tool, those
API calls cost money.
So, you know, have fun.
Your bill for, say, Datadog,
depending on how often Datadog's hitting that API,
could actually change.
And a lot of these tools will.
They throttle all the stuff on the back end
so you don't actually see this.
But, you know, if you have a lot of metrics being pulled out,
then you may have to pay more for those API calls
in order to get the data quicker and have it more up-to-date.
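One way to keep those polling costs in check, for what it's worth, is to batch requests with CloudWatch's GetMetricData API, which returns many series per call. A sketch, with illustrative metric names and instance ID:

```python
# Sketch: pull several CloudWatch metrics in one GetMetricData call instead
# of one API call per metric. Metric names and the instance ID are illustrative.
from datetime import datetime, timedelta
import boto3

cloudwatch = boto3.client("cloudwatch")

queries = [
    {
        "Id": f"m{i}",
        "MetricStat": {
            "Metric": {
                "Namespace": "AWS/EC2",
                "MetricName": name,
                "Dimensions": [{"Name": "InstanceId", "Value": "i-0123456789abcdef0"}],
            },
            "Period": 300,
            "Stat": "Average",
        },
    }
    for i, name in enumerate(["CPUUtilization", "NetworkIn", "NetworkOut"])
]

response = cloudwatch.get_metric_data(
    MetricDataQueries=queries,
    StartTime=datetime.utcnow() - timedelta(hours=1),
    EndTime=datetime.utcnow(),
)
for result in response["MetricDataResults"]:
    print(result["Label"], list(zip(result["Timestamps"], result["Values"])))
```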
Oh yeah, I had a warning come through once from a monitoring provider,
who I won't name, that complained that,
oh, they're throttling API calls, go ahead and request a limit increase to this,
or otherwise we're going to have delayed output for you in our time series database.
Okay, so I made the request, and bless their hearts, AWS support came back and said,
well, we can do that, but it's going to cost you over $30,000 a month if we do. Are you sure?
At which point, yeah, I put up with delayed graphs. That seemed
to be the better answer for most of the problems I was looking at. So something else that Amazon has been getting into lately is the idea of distributed tracing, with serverless and container workloads: the idea of tracing calls as they move throughout the system.
There are open source offerings in this space, like Zipkin,
but Amazon has gotten behind what they term X-Ray.
Have you had any deep dives into that yet?
You know, I have not run into a single person
that's actually using X-Ray.
I have, but I'm not sure that it counts
because they were Amazon employees.
I'm pretty sure they're the only ones using it,
and I can't imagine that they are doing so willingly.
Is that because tracing itself is immature or is that because
X-Ray as a tracing implementation has roads to go yet?
I get the sense, looking at the product, that Amazon released that service mostly to say they had something.
So everyone I know using tracing tends to fall into two camps. One is they're using tracing and they've spent, not dozens of people-hours, but dozens upon dozens, hundreds and thousands of hours, making tracing work for them.
And then there's another section of people who have some sort of tracing tool,
and they say, yeah, we have it, but we don't actually find much value in it.
Which makes me think, like, what exactly is the purpose here?
Because all of the people that are talking about tracing are giving the same demos, and the demos look awesome.
But I haven't really found anyone that's actually using this stuff in production successfully.
Right. For this podcast, and for my newsletter, Last Week in AWS, which is written in the same vein as a lot of the stuff you did for Monitoring Weekly, just a few weeks behind that.
Thanks for the tips on that, by the way.
My pleasure.
A lot of this is done by a series of Lambda functions that I've stitched together.
My God, what have you built?
Oh, it's a Frankenstein architecture.
One of these days, I'll take people
through it in a blog post or a podcast, and everyone can look at my secret shame. But as I
build these Lambda functions, it always asks me, would you like to enable tracing on this function?
And my response is, ha ha ha, no. And then I move on. But the way that these functions are built,
each one serves a specific purpose. I have one that validates my links. I have another one that copies all of my content from my Pinboard account and shoves it into an archive, and at some point something says, hey, your archive hasn't updated itself in six months, you want to take a look at that? And then, oh, crap, I figure out what the problem is. So how do I think about the availability of specific Lambda functions,
or any type of serverless function, in that context?
I haven't found a good answer yet.
No.
The fun thing about monitoring Lambda and really monitoring serverless
is that the answer is seemingly, good luck, sucker.
So sorry to have to say it, but, you know, good luck.
That said, there are some techniques that you can use that aren't great, but they work pretty well. There was a blog post, which we'll throw in the show notes, where someone from SNCC describes how they log, at minimum, the start of requests and the end of the requests. And it's not tracing, although it really sounds like that. It is actually just straight logs. And
this would actually apply really well to a Lambda function,
or a series of Lambda functions.
So that way you can say every time a request comes in
and a request leaves,
you could make a call out to another logging system
that says this Lambda was started, this Lambda ended,
and just keep passing those messages along
so you could reconstruct the entire path
of your series of Lambda functions.
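A minimal sketch of that idea might look like the following: each function logs a structured "started" and "finished" line carrying a correlation ID passed along in the event, so the chain can be reconstructed from logs later. The field names are illustrative, not from the post in question:

```python
# Sketch: log the start and end of each Lambda invocation with a correlation
# ID passed through the event, so a chain of functions can be reconstructed
# from plain logs. Field names are illustrative.
import json
import time
import uuid

def handler(event, context):
    correlation_id = event.get("correlation_id") or str(uuid.uuid4())
    started = time.time()
    print(json.dumps({"event": "started", "function": context.function_name,
                      "correlation_id": correlation_id}))
    try:
        # ... the function's actual work goes here ...
        return {"correlation_id": correlation_id}
    finally:
        print(json.dumps({"event": "finished", "function": context.function_name,
                          "correlation_id": correlation_id,
                          "duration_ms": round((time.time() - started) * 1000)}))
```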
For better or worse,
those start, stop, and invocation duration metrics
show up in CloudWatch by default,
CloudWatch logs specifically.
Yeah.
Oh, heavens.
Now I've looped myself,
and here we are back stuck in a conversation.
Yeah.
And it seems to me that a number of different Lambda use cases run into this. The idea of a warm start versus a cold start: if the container isn't already up and running, having been triggered recently, it has to set up database connections that might otherwise already exist. So you wind up with a widely variable start time for invocation of a function, depending on whether or not it's been invoked recently, and a 95th percentile, in some cases, of function behavior that has wild outliers in either direction as a direct result. That just seems to be adding another log to the fire of monitoring problems.
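The usual mitigation, for what it's worth, is to set up expensive resources outside the handler so that warm invocations reuse them. A quick sketch, with a made-up DynamoDB table name:

```python
# Sketch: create expensive clients/connections once per container, during the
# cold start, so warm invocations reuse them. The table name is illustrative.
import boto3

# Runs once when the container is first created (the cold start).
dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("example-table")

def handler(event, context):
    # Warm invocations skip the setup above and go straight to work.
    response = table.get_item(Key={"id": event["id"]})
    return response.get("Item", {})
```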
Yeah, I can see that. And this is where I come down to one of the things that I was really
adamant in my book, and that I tell all my clients is that
monitoring can't fix a bad architecture. If your application sucks, then adding more monitoring
to it isn't going to suddenly make it better. If you keep having things break and you say,
well, my solution is let's add more monitoring to it, it's still going to keep breaking. You
haven't fixed anything.
Monitoring can't actually fix anything.
In situations like that, instead of looking at more monitoring,
you should be looking at improving the application architecture.
Yes, monitoring may help you,
but it's not going to solve that sort of problem.
So I guess we've sort of been dancing around this a little bit
in some of our previous conversations, but I've noticed that whenever something breaks and I'm seeing behavior that I can't quite explain in an AWS context, the first thing I do is I pull up Twitter and take a look, not for my own crappy application,
because frankly, no one besides me generally tends to care about that. But rather, is this
something that's going on globally? The Amazon status dashboard is invariably going to be saying,
green, everything's fine, I'm gaslighting you, it's your terrible code, our platform is terrific. But at three in the morning particularly, DevOps Twitter comes to life during one of these moments, and the solidarity is incredibly touching as everyone scrambles to figure out what the heck just happened. Is that the best we've got today for global awareness of these platforms that are increasingly becoming, I guess, a monolith?
Yeah.
So that's a tough problem because Amazon is never going to be completely forthright with us on the state of their infrastructure.
They're never going to tell us exactly when S3 goes down or doesn't go down or that, oh, hey, we actually host a status page on S3
and we can't access it.
They're not going to tell us these things
until someone says, hey, maybe they're lying to us.
So once you start relying on all this stuff,
it gets really difficult
because they're not going to tell you all this,
which means you're left looking at your
own monitoring, which may or may not rely on their monitoring, which is another problem,
to understand the state of their infrastructure. So I know that there are some companies out there
that they rely so heavily on Amazon and have such a level of traffic that they can see when Amazon
services start going sideways before Amazon knows,
or at least before Amazon tells anyone. So I guess really what it comes down to is trust but verify
and hope that your system isn't going to be hugely impacted by Amazon doing wonky stuff.
We'd like to hope, anyway. The other side of it, too, and there is a safety in this: when Amazon takes an outage, which, credit where due, is becoming far less frequent than it used to be, that's the day the internet is broken. It's not your site that gets called out on the front page of the New York Times; it's that the internet is just in a terrible state. And suddenly that blame sort of passes over and lands at the feet of the cloud provider in question. Is that a viable business strategy?
This really comes down to what's the level of risk that you're willing to accept?
It's one of those things. It's once in a blue moon that those sorts of events happen.
Are you willing to accept that?
Amazon has their SLA that they provide
to customers. For the most part, they keep it.
But they tend to have a habit of breaking the SLA all at once
with a four-hour outage.
Whereas when we think about those SLAs and when we reason about what's the expected downtime that I might have,
we're thinking about it in terms of some sort of regularity.
Well, we expect them to maybe have an outage of 15 minutes every month.
Well, that's not actually true.
They're much better at running infrastructure than we are.
So their outages tend to be really big outages and last for hours at a time.
So can you really afford to be down for, say, four to six hours once every two years?
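As a rough sketch of the arithmetic, using an illustrative 99.95% figure rather than any specific AWS SLA:

```python
# Sketch: what an SLA percentage allows in raw downtime. Nothing in the math
# prevents all of that budget landing in one multi-hour outage.
def allowed_downtime_minutes(sla_percent: float, period_hours: float) -> float:
    return period_hours * 60 * (1 - sla_percent / 100)

print(allowed_downtime_minutes(99.95, 730))       # one month: ~21.9 minutes
print(allowed_downtime_minutes(99.95, 2 * 8760))  # two years: ~525 minutes, ~8.8 hours
```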
And credit where due, not only have the outages gotten
better, but there's an incredible amount of engineering and intelligence that goes into
all of these providers' platforms. There's also been an improvement in the messaging coming out
of them. "We experienced an issue across less than 1% of our load balancer fleet" is reassuring to the general internet.
But if you're in that less than 1% of the load balancer fleet and none of your systems are responding, it feels like they're trivializing your pain.
And to their credit, I don't see the messaging taking that tone anymore.
I would really love to know exactly how many load balancers 1% is.
I get the sneaking suspicion that not only will they never tell us, but that that number is almost
completely irrelevant within days or weeks after they'd give one. Oh, I'm sure. It's dynamic.
Right. And even the interesting thing about all that is Amazon has such a large fleet and so many customers doing just tons and tons of traffic and lots of resource utilization that 1% is an impossibly large number, I'm sure.
1% of, say, the load balancer fleet is probably larger than the entire infrastructure that most of us are used to thinking about.
Oh, absolutely.
Yeah, if you're within that 1%, well, that's still a pretty big number.
There's also the question of visibility of outages.
I freely admit that I've bought monitoring tools in the past that if they took a two-week outage,
I don't think anyone at the company I was at would have noticed or cared
because that was
how long we would go in some cases between looking at certain dashboards. Whereas other companies,
a great example of this is Slack, given the way that they tend to interact and be used across so
many different places, it's almost like everyone has a monitoring client open with focus on their
desktop or mobile device at all times. And if they more or less drop a single packet, it feels like the world is ending.
So there's a definite perception bias as far as how outages and pain are perceived.
Yeah, absolutely.
I expect different services to have different availability levels depending on what that
service is.
I have the system that I use to send out emails for Monitoring Weekly.
I only log into it every week, like once a week.
So if it's down for six days out of the week, I don't really care.
But if Slack is down for, say, five seconds, well, I'm freaking out.
Have you ever had that moment where, oh, Slack is down? That's unfortunate. I should tell someone and you pull up Slack to do it.
Yes. Yes, I have. It is terrible. It is awful. And I feel bad.
Yes. At least I'm not alone in that particular neurotic compulsion. Thank you.
Yeah. After a while, I just, I put the computer down and I just stare at a wall because I'm not sure what to do with myself anymore.
So in conclusion, to sum up what we've just spent half an hour talking about, monitoring is painful.
It is likely to remain so for the foreseeable future.
But the path forward, as best I can tell, is to come to a realistic understanding and assessment of what it's going
to take to get proper visibility into your application for its risk tolerance. Would
that be a fair summation? Yeah. Monitoring has improved dramatically over the past five years,
even. And it improves every day. There's so many incredibly smart people thinking hard about this
problem. They're actively improving things.
So it's getting better every single day. So I'm looking forward to what it's going to look like in another five years. It's strange. I was about to argue that point because I remember back when
I first started at a university many years ago, our monitoring system when I got there was that the
help desk would let us know if there was a problem.
It seems horrifying, but no one ever decided not to enroll in a university for another semester because the website was broken, at least not 10 years ago. And then I wound up rolling out Nagios,
and it felt like it was a better monitoring story than anything I've had in recent memory.
But the other side of the coin, from a fairness
perspective, is today I have hundreds of instances, or thousands, depending upon which
environment I'm talking about, in autoscaling groups, in containers, and whatnot. There,
I had three mail servers, four web servers, and a database. That was it. Nagios was pretty good at keeping an eye on systems that never changed in
any meaningful way. If Nagios went off, there was an actual problem. At one point, it was a 120-degree
server room. At other times, it was that something had gotten unplugged. The set of problems that
were likely to occur were much smaller then and less nuanced than they are
today. So it isn't that I think that monitoring has gotten worse. I think you're right. It has
gotten provably better. It's just that our environments have gotten so much more complex
and have found new and interesting ways to catch fire. Yep, I agree. Well, thank you so much for
joining me today, Mike. Yeah, thank you for having me. Is there anything that you've got going on that you want to mention or tell our listeners about?
Yeah, so I was talking with you before we joined on this that I'm working on a new project.
By the time this is published, it will be live.
I'm working on a new project to help engineering teams hire great people.
You can find out all about it at mikejulian.com,
which will also be in the show notes.
Perfect. Thank you so much for your time,
and I'm sure we'll speak in future days.
Absolutely. Looking forward to it.
My name is Corey Quinn, and this is Screaming in the Cloud.
This has been this week's episode of Screaming in the Cloud.
You can also find more Corey at screaminginthecloud.com or wherever fine snark is sold.