Screaming in the Cloud - The Need for Reliability with Lex Neva
Episode Date: March 28, 2023

Lex Neva, Staff Site Reliability Engineer at Honeycomb and Curator of SRE Weekly, joins Corey on Screaming in the Cloud to discuss reliability and the life of a newsletter curator. Lex shares some interesting insights on how he keeps his hobbies and side projects separate, as well as the intrusion that open-source projects can have on your time. Lex and Corey also discuss the phenomenon of newsletter curators being much more demanding of themselves than their audience typically is. Lex also shares his views on how far reliability has come, as well as how far we have to go, and the critical implications reliability has on our day-to-day lives.

About Lex

Lex Neva is interested in all things related to running large, massively multiuser online services. He has years of SRE, Systems Engineering, tinkering, and troubleshooting experience and perhaps loves incident response more than he ought to. He’s previously worked for Linden Lab, DeviantArt, Heroku, and Fastly, and currently works as an SRE at Honeycomb while also curating the SRE Weekly newsletter on the side.

Lex lives in Massachusetts with his family including 3 adorable children, 3 ridiculous cats, and assorted other awesome humans and animals. In his copious spare time he likes to garden, play tournament poker, tinker with machine embroidery, and mess around with Arduinos.

Links Referenced:
SRE Weekly: https://sreweekly.com/
Honeycomb: https://www.honeycomb.io/
Transcript
Hello, and welcome to Screaming in the Cloud, with your host, Chief Cloud Economist at the
Duckbill Group, Corey Quinn.
This weekly show features conversations with people doing interesting work in the world
of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles
for which Corey refuses to apologize.
This is Screaming in the Cloud.
This episode is sponsored in part by our friends at Chronosphere.
Tired of observability costs going up every year without getting additional value?
Or being locked into a vendor due to proprietary data collection, querying, and visualization?
Modern-day containerized environments
require a new kind of observability technology
that accounts for the massive increase in scale
and attendant cost of data.
With Chronosphere,
choose where and how your data is routed and stored,
query it easily,
and get better context and control.
100% open-source compatibility
means that no matter what your setup is,
they can help. Learn how Chronosphere provides complete and real-time insight to ECS, EKS,
and your microservices, wherever they may be, at snark.cloud/chronosphere.
That's snark.cloud/chronosphere.

Welcome to Screaming in the Cloud. I'm Corey Quinn. Once upon a time, I decided to start
writing an email newsletter. And, well, many things happened afterwards, some of them quite quickly.
But before that, I was reading a number of email newsletters in the space. One that I'd been
reading for a year at the time was called SRE Weekly. Still comes out. I still wind up reading it most weeks.
And it's written by Lex Neva,
who is not only my guest today,
but also a staff site reliability engineer at Honeycomb.
Lex, it is so good to finally talk to you
other than reading emails that we send to the entire world
past each other like ships in the night.
Yeah, I feel like we should have had
some kind of
meeting before now. But yeah, it's really good to finally meet you.
It was one of the inspirations that I had. And to be clear, when I signed up for your newsletter
originally, I was there for issue 15, which is many, many years ago. I was also running a small
scale SRE team at the time. I found this useful as a part of doing my job
and keeping abreast of what was going on in the ecosystem.
And I found myself, once I went independent,
wishing that your newsletter and a few others
had a whole bunch more AWS content.
Well, why doesn't it?
And the answer is because you are, you know,
a reasonable person who understands
that mental health is important
and boundaries exist for a reason.
No one sensible is going to care that much about one cloud provider all the time.
If only we were all that wise.
Right. Well, first of all, I love your newsletter and also the content that you write.
I mean, I would be nowhere without content to link to.
And I'm glad you took on the AWS thing because much like how I haven't
written Security Weekly, I also didn't write any kind of AWS Weekly because there's just too much.
So thanks for falling on that sword.
I fell on another one about two years ago and started the
Thursdays, which are Last Week in AWS Security. But I took a different bent on it because there
are a whole bunch of security newsletters that litter the landscape. And most of them are very good, except for the ones that seem to be entirely too vendor captured.
But the problem is that they lacked both a significant cloud focus as well as an understanding that there's a universe of people out here who care about security or at least should, but don't have the word security baked into their job title.
So it was very insular, using acronyms they assume that everyone knows.
Or it's totally vendor-captured,
and it's trying to do the whole fear, uncertainty, and doubt thing,
and that's why you should buy this widget.
Will it solve problems?
Well, it'll solve our revenue problems at our company that sells the widgets,
but other than that, not really.
And it just became such an almost incestuous ecosystem.
I wanted something different.
Yeah, and the snark is also very useful
in order to show us that you're not in their pocket.
So yeah, nice work.
Well, I'll let you in on a secret now that we are,
what, I'm something like 300 and change issues in,
which means I've been doing this for far too long.
The snark is a byproduct of
what I needed to do to write it myself, because let's face it, this stuff is incredibly boring.
I needed to keep myself interested as I started down that path. And how can I continually keep
it fresh and funny and interesting, but not go too far? That's a fun game. Whereas copying and
pasting some announcement was never fun. Yeah, that's not,
I hear you. I'm trying to make it interesting. One regret that I've had, and I'm curious if
you've ever encountered this yourself, because most people don't get to see any of this. They
see the finished product that lands in their inbox every Monday. And in my case, Monday,
I forget the exact day that yours comes out. I collect them and read through them all at once. But I find
that I have often had cause to look back and regret the implicit commitment in Last Week in AWS as a
name, because it would be nice to skip a week here and there just because either I don't particularly
feel like it or, wow, there was not a lot of news worth talking about that came out last week.
But it feels like I've forced
myself onto a very particular treadmill schedule. Yeah. Yeah. It comes with like calling it SRE
Weekly. I just followed suit for some of the other weeklies, but yeah, that can be hard. And I do give
myself permission to take a week off here and there, but you know, I'll let you in on a secret.
What I do is I try to target eight to
10 articles a week. And if I have more than that, I save some of them. And then when it comes time
to put out an issue, I'll go look at what's in that ready queue and swap some of those in and
swap some of the current ones out just so I keep things fresh. And then if I need a week off,
I'll just fill it from that queue if it's got enough in it. So that lets me take vacations and whatnot. Without that, I think I would have had a lot harder of a time
sticking with this or there just would have been more gaps. So yeah. You're fortunate that you have
what appears to be a single category of content when you construct your newsletter, whereas I
have three that are distinct. AWS releases and announcements and news and things to make fun of
for the past week.
The things from the larger community,
folks who do not work there,
but are talking about interesting approaches
or news that is germane.
And then ideally a tip or a tool of the week.
And I found at least lately
that I've been able to build out the tools portion of it
significantly far in advance
because a tool that makes working with AWS easier this week is
probably still going to be fairly helpful a month from now. Yeah, that's fair. Definitely.
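The "ready queue" Lex describes a moment earlier (target eight to ten articles, save the overflow, and fill a week off from the backlog) can be sketched roughly as follows. This is only an illustrative sketch, not Lex's actual tooling: the function name `build_issue`, the constant `TARGET_PER_ISSUE`, and the oldest-first fill order are all assumptions, and it leaves out the manual swapping-in of saved links that he mentions.

```python
from collections import deque

# Assumption: use the upper end of the "eight to ten articles" target.
TARGET_PER_ISSUE = 10


def build_issue(new_articles, ready_queue, target=TARGET_PER_ISSUE):
    """Assemble one issue, saving overflow and topping up from the backlog.

    new_articles: links collected this week (empty on a week off).
    ready_queue: deque of links saved from earlier weeks.
    """
    issue = list(new_articles[:target])
    # Anything beyond the target is saved for a later week.
    ready_queue.extend(new_articles[target:])
    # A thin or empty week is filled from the backlog, oldest first.
    while len(issue) < target and ready_queue:
        issue.append(ready_queue.popleft())
    return issue
```

With a busy week followed by a week off, the second issue is built entirely from links saved during the first, which is the "lets me take vacations" effect described above.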
But putting some of the news out late has been something of a challenge. I've also learned by
getting it wrong that I'm holding myself to a tighter expectation of turnaround time than any
part of the audience is. The Thursday news is all written the week before, almost a full week beforehand, and no one complains about that. I have put out the newsletter a couple of times,
an hour or two after its usual 7:30 Pacific time slot that it goes out in. Not a single person has
complained. In one case, I moved it by a day to accommodate an announcement, but didn't explain
why. Not a single person emailed in. So, okay, that's good to know.
Yeah, I've definitely gotten to like Monday morning,
like a couple of times, not much, not many times,
but a couple of times I've gotten to Monday morning
and be like, oh, hey, I didn't do that thing yesterday.
And then I just release it in the morning
and I've never had a complaint.
I've canceled last minute because life interfered.
The most I've ever had was somebody emailing me and be like, you know, hope you feel better soon.
Like when I had COVID and stuff like that.
So, yeah, sometimes maybe we do hold ourselves to a little bit of a higher standard than is necessary.
I mean, there was a point where I had major eye surgery and I had to take a month off of everything and took a month off the newsletter.
And, yeah, I didn't lose any subscribers.
I didn't have any complaints. So people, I think, appreciate it when it's there.
And, you know, if it's not, they just wait until it comes out.
I think that there is an additional challenge that I started feeling as soon as I started
picking up sponsors for it, because it's, well, at this point, I have a contractual obligation
to put things out. And again, life happens, but you also don't want to have to reach out and do apology tours every third week or whatnot.
And I think that's in part due to the fact that I have multiple sponsors per issue,
and that becomes a bit of a juggling dance logistically on this end.
Yeah, when I started, I really didn't think I necessarily wanted to have sponsors because,
you know, it's like I have a job.
This is just for fun.
It got to the point where it's like, you know, I'll probably stop this if there's not some kind of monetary advantage.
And having a sponsor has been really helpful.
But I have been really careful.
Like, I have always had only a single sponsor because I don't want that many people to apologize to.
And that meant I took in maybe less money than I could have, but that's okay.
And I also was very clear, even from the start, having in the contract that I may miss a week without notice.
And yes, they're paying in advance, but it's not for a specific range of time.
It's for a specific number of issues. Whenever those come out, that definitely helped
to reduce the stress a little bit. And I think without that, you know, having that much over
my head would make it hard to do this. You know, it has to stay fun, right?
That's part of the things that kept me from honestly getting into tech for the first part
of my twenties. It was the fear that I would be taking a hobby, something that I loved and turning it into something that I hated. Yeah, there is that. It's almost 20 years
now. And I'm still wondering whether I actually succeeded or not in avoiding hating this.
Well, okay. But I mean, are you, you know, are you depressed? So there's this other thing,
there's this thing that people like to say, which is like, you should only do a job that you really
love. And I used to think that, and I don't actually think that anymore. I think that it is important to have a job that you can
do and not hate day to day, but there's no shame in not being passionate about your work. And I
don't think that we should require passion from anyone when we're hiring. And I think to do so
is even like privilege. So, you know, I think that it's totally fine
to just do something because it pays the bills.
Oh, absolutely.
I find it annoying as hell when I'm talking to folks
who are looking to hire for roles and,
well, include a link to your GitHub profile
as a mandatory field.
It's, well, great.
What about people who work in places
where they're not working on open source projects
as a result, and they can't really disclose what they're doing? And the expectation that, oh, well,
outside of work, you should be doing public stuff too. I used to do a lot of public open source
style work on GitHub, but I got yelled at all the time for random unrelated reasons. And it's,
I don't want to put something out there that I have to support and people start to ask me questions about. It feels like impromptu unasked for code review.
No thanks. So my code, my GitHub profile looks fairly barren.
You mean like yelling at you like, oh, you're not contributing enough or, you know,
we need this, this free thing you're doing like immediately or that kind of thing.
Worse than that, the worst example I've ever had for this was when I was giving a talk called Terrible Ideas in Git. And because I wanted to give some hilariously contrived demos that took a fair bit of work to set up, I got them ready to go inside of a Docker container. And because I didn't trust that my laptop would always work, I might have to borrow someone else's, I pushed that image called Terrible Ideas up to Docker Hub. And I wound up with people asking questions about it.
Like, is this vulnerable to Shellshock?
And it's, you do realize that this is intentionally designed to be awful.
It is only for giving a very specific version of a very specific talk.
It's in public just because I didn't bother to make it private.
What are you doing? Please tell me you're not running this in production at a bank. No comment. Right. I don't
want that responsibility of people yelling at me for things I didn't do on purpose. I want to get
yelled at for the things I did intentionally. Exactly. It's funny that sometimes people
expect more out of you when you're giving them something free versus when they're paying you
for it. It's an interesting quirk of psychology that I'm sure that professionals could tell me
all about. Maybe there's been research on it. I don't know. But yeah, that can be difficult.
Oh, absolutely. I used to work at a web hosting company and the customers spending thousands a
month with us were uniformly great. But there was always the lowest tier customer of the cheapest
thing that we offered that seemed to expect that that entitled them to 80 hours a month of support from engineering problems and whatnot.
And it was not profitable to service some of those folks.
I've also found that there's a real transitive barrier that begins as soon as you find a way to charge someone a dollar for something.
There's a bit of a litmus test of can you transfer a dollar from your bank account to mine?
And suddenly, the entire tenor of the conversations
with people who have crossed that boundary change.
I have toyed on some level with the idea
of launching a version of this newsletter
or wondering if I retcon the whole thing.
Do I charge people to subscribe to this?
And the answer I keep coming away with is not at all,
because it started, in many respects, as marketing for AWS bill consulting, and I want the audience as vast
as possible. Artificially limiting its distribution via a pay-for model just seemed a little on the
strange side. Yeah, and then you're beholden to very many people, and there's that
disproportionality. So years ago, before I even started in my career,
and I guess, you know, things that were SRE before SRE was cool. I worked for a living in
Second Life. Are you familiar with Second Life? Oh, yes. I'm very familiar with that. Linden Lab.
Yeah. So I worked for Linden Lab years later, but before I worked for them, I sort of
spent a lot of my time living in Second Life, and I had a product that I sold for two or three
dollars, and actually it's still in there. You could still buy it. It's interesting. I don't
know if it's because the purchase price was 800 Linden dollars, which equates to like two dollars
and sixteen cents or something like that. The original cryptocurrency. Right, exactly, except
there's no crypto involved.
But people seemed to have a disproportionate amount
of how much of my time they expected for support.
I'm going to support them a little,
but you have to recognize at some point,
I actually can't come give you a tutorial
on using this product
because you're one of 500 customers for this month.
And you've given me $2 and I don't have 10 hours to give you.
You know, like, sorry.
Yeah, so that can be really tough.
And on some level, you need to find a way to either charge more
or charge for support on top of it,
or ideally, and I wish more open-source projects would take this approach,
huh, we've had 500 people asking us the exact same question.
Should we improve our docs?
No, of course not.
They're the ones who are wrong.
It's the children who are getting it wrong.
I don't find that approach to be particularly useful,
but it bothers me to no end
when I keep running into the same problem
onboarding with something new and I ask about it.
And oh yeah, everyone runs into that problem.
Here's how you get around it.
This would have been useful to mention in the documentation. I try not to ask questions without reading the manual first. Well, so there's a,
wow, a couple of different directions I could go with this. First of all, there's a really
interesting thing that happened with the core-js project that I recommend people check out.
Another thing, though, I think the direction I'll go at the moment, we can bookmark that other one,
but I have an open source project on the side
that I kind of did for my own fun, which is a program for creating designs that can be processed
by computer-controlled embroidery machines. So this is sewing machines that can plot stitches
in the XY plane based on a program that you give it. And there really wasn't much in the way of
open source software available that could help you create these designs. And so I just sort of hacked something together and started hacking
with Python for my own fun and then put it out there and open sourced it. And it's kind of taken
off, kind of like gotten a life of its own. But of course, I've got a newsletter, I've got three
kids, I've got a family and a day job. And I definitely hear you on the like, you know, yeah, we should put this FAQ in the docs,
but there can be so little time to even do that.
And I'm finding that there's like,
you know, people talk about work-life balance.
There's like work slash life slash open source balance
that you really, you know,
you have to like balance all three of them.
And in a lot of weeks,
I don't have any time to spend on the project.
But you know what? It still kicks along. And people just kind of, they use my terrible little project as best
they can, even though it has a ton of rough edges. I'm sorry, everyone. I'm so sorry. I know it has,
the UI is terrible. But yeah, it's interesting how these things sometimes take on a life of their own
and you can feel dragged along by your own open source work, you know?
It always bothers me.
I think this might tie back to the core-js issue you talked about a second ago,
where there are people who are building and supporting open source tools or libraries
that they originally constructed to scratch an itch,
and now they are core dependencies
of basically half the internet,
and these people are still wondering on some level,
how do I put food on the table this month?
It's wild to me.
If there were justice in the world,
you'd start to think these people would wind up with
"never have to work again if they don't want to" positions,
but in many cases, it's exactly the opposite.
Well, that's the really interesting
thing. So first of all, I'm hugely privileged to have any time to get to work on open source.
There's plenty of people that don't. And yeah, so requiring people to have a GitHub link to show
their open source contributions is inherently unfair and biased and discriminatory. That aside,
people have asked all along, like, Lex, this is decent software. You could sell this. You could charge money for this thing, and you could probably make a decent living at this. And I categorically
refuse to accept money for that project because I don't want to have to support it on a commercial
level like that. If I take your money, then you have an expectation
that especially if I charge what one would expect. So this software, part of the reason I
decided to write my own is because it starts at $200-some-odd for the competitors that are
commercial and goes up into the $5,000-$10,000 for a software package. Mine is free. If I started
charging money, then yeah, I'm going to have to build a
support department. We're going to have to have a knowledge base. I'm going to have to incorporate.
I don't want to do that for something I'm doing for fun, you know? So yeah, I'm going to keep it
free and terrible. It becomes something you love turns into something you hate without even noticing
that it happens, or at least something that you start to resent. Yeah, I don't think I would
necessarily hate machine embroidery because I love it.
It's an amazingly fun little quirky hobby.
But I think it would definitely take away some of the magic for me, where there's no
stress at all.
I can spend months noodling on an algorithm and getting it right, whereas if we start
having to have deliverables, it changes it entirely.
It's odd. It seems on some level
too, that the open source world that I got started with has evolved in a whole bunch of different
ways. Whereas it used to be: write a quick fix for something, and it would get merged in, in many
cases, by the time you got back from lunch. And these days it seems like it takes multiple weeks,
especially with a corporate-controlled open-source project.
And there's so much back and forth,
and even getting the boilerplate,
like the CLA,
the Contributor License Agreement, aside,
and winding up and getting other people
to sign off on it.
Then there's back and forth,
in some cases for weeks,
about, well, the right kind of test coverage
and how to look at this
in the right holistic framework.
And I appreciate that there is validity
and value to these things, but is that where the bulk of the effort should be going?
When there's a pull request ready to go that solves a breaking customer problem,
but the test coverage isn't right, so we're going to delay it for two or three releases.
What are you doing there? Someone lost the plot somewhere. And I'm sure there are
reasons it makes sense given the framework people are operating within. I just find it
maddening from the side of having to deal with this as a human. Yeah, I hear you. And it sometimes
can go even beyond test coverage to something like code style. It's like, oh, that's not really
in the style of this project, or I would have written it this way. And one thing I've had to
really work on on this project is to make it as inviting to
developers as possible.
I have to sometimes look at things and be like, yeah, I might do that a different way.
But does that actually matter?
Like, do I have a reason for that that really matters?
Or is it just my style?
And maybe because it's a group project, I should just be like, no, that's good as it
is.
So you've had an interesting career, and clearly you have opinions about SRE as a result.
When I started seeing that you were the author of SRE Weekly years ago, I just assumed something that I don't believe is true.
Is it possible that you have been contributing to the community around SRE, but somehow have never worked at Google?
I have never worked at Google. I have never worked at Netflix. I have never worked at any of those
big companies. The biggest company I've worked for is Salesforce, although I worked for Heroku,
who had been bought by Salesforce a couple years prior. And so it was kind of like working for a
startup inside a big company. And here's the other thing. I created that newsletter
two months after starting my first job where I had it, like the first job in which I was titled SRE.
So that's possibly contentious right there. You know, I hadn't thought of it this way,
but you're right. I did almost the exact same thing. I was no expert in AWS when I started
these things.
It came out of an effort that I needed to do of keeping touch with everything that came out that had potential economic impact, which it turns out are most things when you understand that architecture and cost are the same thing when it comes to cloud.
But it was more or less gathering what smart people were saying.
And somehow there's been this osmotic effect where people start to view me as the wise old sage of the mountain when it comes to AWS. And no, no, no, I'm just old and
grumpy. They look alike. Don't mistake it for wisdom. But people will now seek me out to get
my opinion on things. And I have no idea what the answer looks like for most of this stuff.
But that's the old SRE model or sysadmin model that I've followed, which is when you don't know
the answer, well, how do you get to a place where you can find the answer? How do you troubleshoot this?
Click the button. It doesn't work. Well, time to start taking the button apart to figure out why.
Yeah, definitely. I hear you on people. So first of all, thanks to everyone who writes the articles
that I include, I would be nothing without, I mean, literally, I could not have a newsletter without content creators.
I also kind of started the newsletter as an exploration of this new career title.
I mean, I've been doing things that basically fit along with SRE for a long time.
But also, I think my view of SRE might be not really the same as a lot of folks, or like the passed-down-from-the-Google-book model.
I don't, I'm going to be a little heretical here.
I don't necessarily 100% believe in the SLI, SLO, SLA error budget model.
I don't think that that necessarily fits everyone.
I'm not sure it even suits the bigger companies as well as they think it does.
I think that there's a certain point to which you can't actually predict failure and just slowing
down on your deploys to cause there to be fewer incidents so that you can go back to passing your
error budget, to passing your SLO. I'm not sure that actually makes sense or is realistic and
works in the real world.
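For readers unfamiliar with the model Lex is questioning here, the error budget arithmetic as it is commonly described can be sketched as below. This is a minimal illustration under stated assumptions, not a claim about how Honeycomb, Google, or anyone else implements it: the function names, the request-based SLI, and the numbers in the usage note are all hypothetical.

```python
def error_budget(slo_target, total_requests, failed_requests):
    """Return (allowed_failures, remaining_budget) for one SLO window.

    slo_target: fraction of requests that must succeed, e.g. 0.999.
    """
    # The budget is the share of requests the SLO permits to fail.
    allowed = round((1 - slo_target) * total_requests)
    remaining = allowed - failed_requests
    return allowed, remaining


def should_slow_deploys(remaining_budget):
    """The conventional policy: ease off on deploys once the budget is spent."""
    return remaining_budget <= 0
```

For example, a 99.9% target over 1,000,000 requests allows 1,000 failures; after 400 failures, 600 remain and the conventional policy would not yet slow deploys. Lex's point is that incidents may not be predictable enough for this feedback loop to work as cleanly as the arithmetic suggests.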
I've been left with the distinct impression that it's something of a framework for how to think about a lot of those things. And for folks at a certain point of their development along whatever maturity model or maturity curve you want to talk about,
it becomes extraordinarily useful.
And at some point, it feels like the path that a given company is on will deviate from that.
And on some level, if you don't wind up addressing it,
it turns into what it seems like Agile did,
where you wind up with the cult of Agile around it,
and the entire purpose of it is to perpetuate the cult of Agile.
And I don't know that I'm necessarily willing to go so far as to say
that's where SLOs are headed right now,
but I'm starting to get the same sort of feeling around the early days of the formalization of frameworks like that and the
ex cathedra proclamation that this is right for everyone. So I'm starting to wonder whether
there's a reckoning in that sense coming down the road. I'm fortunate that I don't run anything
that's production facing. So for me, it's, I don't have to care about these things, mostly.
Yeah, I mean, we're in 2023.
Things have come so much further than when I was a kid.
I have a little computer in my pocket.
Yeah, you know, hey, math teacher.
Turns out, yeah, we do carry calculators around with us wherever we go.
We've built all these huge, complicated systems online
and built our entire society around them.
We're still in our infancy.
We still don't know what we're doing.
We're still feeling out what SRE even is,
if it even makes sense.
And I think there's, yeah,
there's going to be more evolution.
I mean, there's been the like,
what is DevOps and people coining the term DevOps
and then it getting, you know,
almost immediately subsumed or turned into whatever other people want. Same thing for observability. I think same
thing for SRE. So honestly, I'm feeling it out as I go. And I think we all are. And I don't think
anyone really knows what we're doing. And I think that the moment we feel like we do is probably
where we're in trouble because this is all just so new. You look
where we were even 40, 30, even 20 years ago, we've come really far. For me, one of the things
that concerns slash scares me has been that once someone learns something and it becomes rote,
it sort of crystallizes in amber within their worldview.
And they don't go back and figure out, okay, is this still the right approach?
Or has the thing that I know changed?
And I see this on a constant basis just because I'm working with AWS so often.
And there are restrictions and things you cannot do and constraints that the cloud provider imposes on you.
Until one day, that thing that was
impossible is now possible and supported. But people don't keep up with that, so they still
operate under the model of what used to be. I still remember a year or so after they raised the
global per-resource tag limit to 50, I was seeing references to only 10 tags being allowed per
resource in the AWS console, because not even internal service teams are allowed to talk to each other over there, apparently.
And if they can't keep it straight internally,
what hope do the rest of us have?
It's the same problem of,
once you get this knowledge solidified,
it's hard to keep current and adapt
to things that are progressing,
especially in tech,
where things are advancing so rapidly and so quickly.
Yeah, I gather things are a little feudalistic over inside AWS, although I've never worked
there, so I don't know.
But it's also just so big.
I mean, there's just like, do you even know all of this?
Like, I challenge you to go through the list of services.
I bet you're going to find one you don't know about, you know, that the AWS services, maybe
that's a challenge I would lose.
But it's so hard to keep track of all this stuff
with how fast it's changing
that I don't blame people for not getting that.
I would agree.
We've long since passed the point
where I can talk incredibly convincingly
about AWS services that do not exist
and not get called out on it by AWS employees, because
who would just go and make something up like that? That would be psychotic. No one in their
right mind would do it. Hi, I'm Corey. We haven't met yet, but you're going to remember this,
whether I want you to or not, because I make an impression on people. Oops.
Yeah, Mr. AWS snark, you're exactly who I would expect to do that. And then there was
Hunter, what's his name? The guy who made the, these are the many services of AWS song.
That was pretty great too.
Oh yeah, Forrest Brazeal.
He was great.
I loved having him in the AWS community.
And then he took a job,
head of content over at Google Cloud.
It's, well, suddenly he can't very well
make fun of AWS anymore,
not without it taking a very different tone.
So I feel like that's our collective loss.
Yeah, definitely. But yeah, I feel like we've done amazing things as a society,
but the problem is that we're still at the level of we don't know how to program the VCR
as far as trying to run reliable services. It's really hard to build a complex system that,
by its nature of being useful for
customers, it must increase in complexity. Trying to run that reliably is hugely difficult, and
trying to do so profitably is almost impossible. And then I look at how hard that is, and then I look at people trying to make self-driving cars,
and I think that I will never set foot in one of those things until I see us getting good at
running reliable services. Because if we can't do this with all of these people involved,
how do I expect that a little car is going to be, that they're going to be able to produce a car that can drive and understand the complexities of navigating around
and all the hazards that are involved to keep me safe? It's wild to me. The more I learn about the
internet, the more surprised I am that it even works at all. It's like, well, at least you're
only using it for ridiculous things like cat pictures, right? Oh, no, no. We do emergency
services and banking and insurance on top of that, too. Oh, good. I'm sure that won't end horribly one day.
Right? Yeah. I mean, you look at how much of a concerted effort towards safety they've had to
put in in the aviation industry to go from where they were in the 70s and 80s to where we are now, where it's so incredibly safe.
We haven't made that kind of full industry push toward reliability and safety. And it's going to have to happen soon as more and more of the services we're building are exactly, as you say,
life-critical. Yeah, the idea of having this stuff be life-critical means you have to take a very
different approach to it than you do when you're running, I don't know, Twitter for pets, though I
probably need a new fake reference startup now that Twitter for reality is becoming more bizarre
than anything I can make up. But the idea that, well, our ad network needs to have the same rigor
and discipline applied to it as the life support system, maybe that's the wrong framing.
Or maybe it's not. I keep finding instances of situations, maybe not necessarily ad networks,
although I wouldn't put it past them, but situations where a system that we're dealing with becomes life-critical when we had no idea that it could possibly do so. For example,
a couple companies back, there was this billing situation where a
vendor of ours accidentally billed our customers incorrectly and wiped bank accounts. And real
people were unable to make their mortgage payments and unable to, like, their bank accounts were
empty so they couldn't buy food. Like, starting to become life critical and it all came down to a single, this could have
been any outage at any company.
That's going to happen more and more, I think.
I really want to thank you for taking time to speak with me.
If people want to learn more, where's the best place for them to find you?
SREweekly.com.
You can subscribe there.
Thank you so much for having me on.
It's been a real treat.
It really has.
You'll have to come back
and we'll find other topics to talk about,
I'm sure, in the very near future.
Thank you so much for your time.
I appreciate it.
Thanks.
Lex Neva,
staff site reliability engineer at Honeycomb
and author of SREweekly.com.
I'm cloud economist Corey Quinn,
and this is Screaming in the Cloud.
If you've enjoyed this episode, please leave a five-star review on your podcast platform of
choice. Whereas if you've hated this episode, please leave a five-star review on your podcast
platform of choice, along with an angry, insulting comment that inevitably Lex or I will start to
gather and then send out in the form of a weekly email newsletter.
If your AWS bill keeps rising and your blood pressure is doing the same,
then you need the Duckbill Group.
We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS.
We tailor recommendations to your business, and we get to the point.
Visit duckbillgroup.com to get started.