Software Misadventures - Charity Majors - On database outages, journey as a co-founder, thriving under pressure and growing as an engineer - #7
Episode Date: March 20, 2021
Charity Majors (https://twitter.com/mipsytipsy) is the co-founder and CTO of Honeycomb.io. Before this she worked at Facebook, Parse and Linden Lab on infrastructure and developer tools, and always seemed to wind up running the databases. She is the co-author of the book Database Reliability Engineering and also has an amazing blog at charity.wtf. We love the content on her blog and have learned a lot from it. We had a lot of fun speaking with Charity in this lively conversation! We learned about her journey from being an engineer to co-founding Honeycomb, what it was like being on call when she was only 17, and staying calm during production incidents. We talked about various production outages throughout the episode, and our favorite involved driving to a datacenter to flip a DB switch. Charity also shares what it takes to build an awesome engineering culture, the engineer/manager pendulum, and the qualities she looks for when hiring senior engineers.
Transcript
For me, like a senior engineer, first and foremost, and this, you can call this my back-end bias, which I think is partly fair, but not entirely.
It's rooted in production, right?
For their instincts, you know, I want to be able to trust their instincts, which means that their little data corpus needs to be trained on reality, which means production, right?
Like, I don't feel anyone can call themselves a senior engineer if they don't know what happens to their code after they hit merge.
Welcome to the Software Misadventures podcast, where we sit down with
software and DevOps experts to hear their stories from the trenches about how software breaks in
production. We are your hosts, Ronak, Austin, and Guang.
We've seen firsthand how stressful it is when something breaks in production,
but it's the best opportunity to learn about a system more deeply.
When most of us started in this field, we didn't really know what to expect,
and wish there were more resources on how veteran engineers overcame the daunting task
of debugging complex systems. In these conversations, we discuss
the principles and practical tips to build resilient software, as well as advice to grow
as technical leaders. Hey everyone, this is Ronak here. Our guest in this episode is Charity Majors.
Charity is the co-founder and CTO of Honeycomb.io. Guang and I had a lot of fun speaking with her.
We learned about her journey from being an
engineer to co-founding Honeycomb. We talk about what it was like being on call when she was only
17 and how to stay calm during production incidents. We discuss various production
outages throughout the episode and our favorite one involved having to drive to a data center to
flip a DB switch. Charity also shares what it takes to build an awesome engineering culture,
the engineer/manager pendulum, and the qualities Charity looks for when hiring senior engineers.
There was much real talk in this episode, and we really appreciate Charity for sharing
her honest thoughts with us.
Please enjoy this super educational and highly entertaining conversation with Charity Majors.
All right, Charity, super excited to talk to you today. Welcome to the show.
Thank you.
So one of the things we wanted to start with is we read somewhere you went on call when you were 17 or you started going on call at the age of 17.
So I have two questions there. What was this first job and how did you get it? And what was it like to be on call at that age?
It was fine. I mean, when you're young, you literally don't know any better. They can do anything to you. You're like, yeah, this is just how the world works. Okay. No, I was a sysadmin for my university.
I don't know if they still do this, but back in those days, they just gave students root.
And I think my very first job was running the math stat department computers.
And from there, I moved up to running all of the university's computers.
And then later, I ran the CS department computers.
And in between there, I worked for a local web development firm running their computers, all before I ever owned a computer of my own.
Oh, wow. And I think you were studying music, and then you also studied electrical engineering, so you were...
Oh dude, I also studied ancient Greek and Latin and, you know, literature. I was diagnosed last year with ADHD, so, you know, it's a belated acknowledgement of my life story.
So when you were on this job managing these computers, did you pick things up along the way? Or, first of all, did they interview you for the job? I'm curious.
Kind of. The honest truth is, a dear friend of mine had been the math stat department's computer sysadmin first. He was graduating, they asked him who he recommended, and he told them that if they hired me, he would back me up, and anytime I didn't know something, he would help me out. So I owe a huge debt of gratitude to this friend who just kind of randomly saw me struggling and was like, she could use a hand.
That is very sweet.
So when you were on this first job, as you mentioned, you're managing all these computers and you had root and every other thing.
And I knew nothing. Let's be clear. I knew jack shit.
Do you remember something that happened that you wish hadn't happened?
Goodness.
Yes.
I don't know, man.
Like every day is a new horror when you work with computers.
It's, you know, you never know what you're doing, you know? And I mean, I think what most of these stories have in common is, you know, so yeah, like
the hard drive story.
Like, I had never used hardware.
I had never owned a computer, right?
From my perspective, they were things that I SSH'd into from the terminal in my dorm basement,
right?
And being faced with hardware is fucking terrifying.
It's terrifying.
And then, you know, there's the time when, you know,
I'm working at the little web development shop
and, you know, their custom software goes down.
It's called like InfoArch or something.
And like the old sysadmin had gone
and nobody knew where it was deployed.
And, you know, all of these stories end with just like
people just like looking at me, expecting me to help.
And me being at least as lost as them and just kind of going, well, somebody's got to do it.
And, you know, and I guess the thing that I learned from all of them is just like nobody knows what the hell they're doing.
And a great key to success in life is just to be willing
to be the person that they can rely on to figure it out.
Oh yeah, that is well said for sure. So, moving on to the writing part: I came across your blog and there are some really good articles. At least, I've learned a lot from them. One of the things that I also found interesting was the domain of your personal website. It's charity.wtf. I'm curious...
Is it the best TLD ever?
It is. It certainly is.
How does everyone not have this?
So I think the first time I saw that TLD was on your website and I was like,
oh, that's really interesting. I'm sure the content is also super interesting,
which it is.
I'm curious, what prompted that TLD?
Was it in response to something?
I saw it and I knew it had to be mine.
Makes sense.
So you're the co-founder and CTO of Honeycomb.
And I read somewhere that you always like
to be more on the side of let's do stuff.
You mentioned somewhere that you've always been the person where, if someone has an idea, it's "I'll make it work for you and we'll run this thing."
Yeah, I've never been an ideas person. I've always been a person who, if it's worth doing, I'm an implementer, right? That's where my heart is. I love optimizing. I love performance tuning. I like figuring it out. I don't even like writing software. I like understanding it, you know, I like making it better. Which is why, you know, I've never been one of those kids who's like, I'm gonna start a company someday. I really kind of loathe the whole founder industrial complex.
I'm just like, oh my God, you started something.
You must be so much better.
Honestly, whenever I see a founder with a C-level title now,
I internally roll my eyes and go,
you didn't deserve that shit either.
You just gave it to yourself.
Nobody thought you deserved that but you.
Sorry, where were we?
No, no, that's okay. What I was trying to ask is, and this is something that I've wondered: what does the role of a CTO look like? Like, what does it typically...
Whatever the hell you want it to be. Literally. There isn't, like, I mean, this is not true for every C-level role.
This is more true for the CTO role than I think any other C-level role that I've ever seen.
Because, you know, I will say that there is kind of a broad differentiation between CTOs who grow up with the company, who, you know, founded it, versus the ones who are hired in. The ones who are hired in, I think, have a bit more of a template. And I think it's a reflection of the fact that every company is a technology company these days, but what that looks like and what the needs of the company are for the person in that role are so different. For some companies, you know, it's the person who writes all the hardest code and all this stuff. And I think that's losing favor, because it kind of inevitably has to become more of an organizational and visionary and people role, a little bit higher up the stack and everything. But still, it can be, you know, the person who is intently involved in what's the next generation of this company's needs, and figures out all the things that will actually make it work, and files patents and everything.
Or it can be a largely ceremonial role like mine, where I'm just giving talks and doing bullshit.
I can't even remember the last time I SSH'd into a machine, which makes me feel very sad. And it's almost a marketing role there, right, where it's about education. And honestly, the reason that my role as CTO is so outward-focused, I'm gonna brag a little, is because we have our shit together. There's literally nothing for me to do in my engineering org, because they're so good and they're so tight. And, you know, there's been a process, like everything. But I think that your role as a leader is always to do whatever needs to be done. Your team's job is to execute on the incredibly full plate that they have in front of them.
And your job is to be looking ahead at what's next.
Like, how do we get more customers?
And for us, for Honeycomb, so much of our success is going to be tied to can the world get better at writing software?
Because most teams can't really make effective use of Honeycomb fully, you know, because they aren't doing continuous delivery. So I'm totally focused right now on trying to help everyone else get, like, a decade farther along in their journey to writing better software. Because... sorry, we were talking about CTOs and I'm just off and running.
You know, I don't think there's any one way to do it. But that's not the same thing as saying that there's no wrong way to do it. There are many wrong ways to do it, probably way more wrong ways to do it than right ways to do it. It's just that there's no general answer.
I am curious what that progression looked like, right? Because I imagine in the early days, both you and your co-founder, y'all were both technical, pretty head down, just building it out.
I was CEO for the first three and a half years.
Interesting.
Christine and I just swapped places about a year, year and a half ago.
I see. And what has that been like?
Oh, way better. Oh my God. I really won that trade. CEO is the worst fucking job in the world. I hated every second of it. I think I cried every day for almost two years.
What made it so tough? Is it just so many things that you never knew how to do and had to learn on the job, or...?
The answer is many things, but I think two really stand out for me personally. First of all, I found it really, really, really emotionally challenging to give up, or sacrifice, some of my cherished identity as a technical person, a person in a lead technical role.
And,
and to watch my team,
like being tasked with solving those problems and have it be my job to go deal
with lawyers and rent and,
and hiring sales people,
you know,
and it just really fucked with me.
I had nightmares about being unemployable, about how nobody would ever hire me again to do anything but PM work, about how there'd be this gap in the resume, and like, I would fall behind the rest of the world. And none of this stuff was rational. But it was all, like, bubbling in my subconscious,
and it made me very unhappy. And secondly, well, let me do three things.
Secondly, you know, it is this just ultimate stress of if the company fails, it's your fault.
You can't share that burden with anyone.
And no one really understands it.
And you're kind of getting the shit kicked out of you all the time because all the problems come to you.
Like, you're never able to spend time on things that are working well. It is your job to spend 100% of your time on the most fucked up shit, and you can really lose perspective about overall health. And, like, I think it's starting to look like a success story, but I will remind you, for the first three and a half to four years we were, like, skin-of-our-teeth surviving,
like just beating the odds every year.
And every year I was just like,
we haven't failed yet.
Like this will be the year.
I'm just, I know this is the year when we fail.
Like I just believe that from the day.
And most other people did too, I think.
And so that was on me.
Like that was my fault.
I had dragged so many of these
beautiful people who I love so much off to do this failed crazy thing with me. And I was
responsible. They could have been making like three times as much money. They could have all
these things. And instead they were leading me, they were following me and I was leading them off
this cliff. And, all right, the third thing that was difficult was that I had to be traveling 50% of the time, and so I was never doing a good job. And honestly, the other thing (five things, I realize), but this might have been the biggest thing, was that I'm just not temperamentally suited for that role. And this has been kind of a painful journey of self-discovery and self-knowledge. I don't know if you're familiar with the four tendencies, which is about what motivates you deeply. There are external expectations, what everyone else expects of you, and there are internal expectations, the goals you set for yourself, and a lot of what defines you personally is how you respond to those motivations. And, you know, Christine is the upholder type, where if someone else has a goal for her, she will fucking hit it and feel great about herself. If she has a goal for herself, she will fucking check that thing off and feel great about herself. She is, like, the ultimate checklist-maker, structured person, loves all this stuff. I am the literal opposite of that. I am the type that rejects external expectations of me, wants to do exactly the opposite of them, and also, psychologically trickier, rejects and resists my own expectations of myself. As soon as I set a goal for myself, it's the last thing I'm going to do. And I think it's very challenging for my personality type to be in the CEO role, because I think companies, like small children, depend and thrive on structure and predictability and, you know, showing up at the same time at the same place every day. And I was just doing constant, deep psychological warfare with myself for three and a half years to try and fit into that box.
Interesting. And was there, like, an aha moment where you were like, shit, you know, it's time to try something else?
No, I got so deep in it that, you know, the choice was made for me. I mean, it was the right choice, because I was so unhappy. But I get so fucking stubborn. The harder it's feeling, the more I dig in. I do not know how to quit, and I do not say that to brag. Ultimately, it's not a good thing if you don't know how to quit. But you know, it worked out the way it worked out. Things have stabilized since then. They're better. We wouldn't have survived if I hadn't been CEO for the first three and a half years. And I don't think we would have survived if we hadn't done the switch when we did.
Yeah, thank you for sharing that and being so open about it. I don't think it's easy to talk about this.
It's a pathology of my personality, I think. No, it's not, actually. I was raised not talking about, not sharing feelings and all this stuff. And part of my self-work as an adult has been overcoming that.
And some would say that I have overcompensated.
So here we are.
Well, there's so much to unpack in what you shared.
And I want to come back to some of those things.
But one thing which you touched on just now
is that you identify that you are very stubborn. I'm curious, in which situations has
it served you really well? You mentioned that you wish you weren't that stubborn in some cases, but
in what cases you're like, yes, this characteristic of mine helped so much?
Oh, I'm ultimately unstoppable if I want to do something.
Well, that's pretty awesome.
I have to caution, because I will do it.
Damn the consequences and the side effects and everything.
It's not necessarily a great personality.
But, you know, I'm a woman in tech.
And, you know, and it has kept me here.
And, in fact, the aspect of my personality that thrives,
that is fueled almost solely by fuck you.
That's not been a bad thing for me.
Like, the more people tell me to give up on tech, the more stubbornly I am here.
Right. Like I feed off that shit.
It's great. So, yeah, tell me I'm going to fail. Do it.
Well, at this point, Honeycomb is succeeding.
So congratulations on the recent Series B round that you just did.
Thank you.
I'm curious, how involved are you in fundraising as a CTO?
As involved as I want to be.
Okay.
That's the amazing thing about being not CEO.
You get to pick and choose what the fuck you want to do.
That sounds like freedom.
It really is.
So, changing the subject a little bit: I was reading one of your interviews, and I want to read a small part of an answer you gave and follow that up with a question. One part of what you said was: "I'm really good at firefighting and staying cool in the middle of a crisis. I never panic or freak out and make bad decisions under extreme pressure."
That is an incredible skill.
It doesn't come naturally to a lot of people.
Well, not to me, for sure.
So I'm curious, how did you develop that skill?
So honestly, last year I was diagnosed with ADHD.
I know, the world's least surprising diagnosis.
It was to me. I had never considered the fact that I might have an attention disorder. But apparently one of the side effects is that your brain is just, like, buzzing all around, but when you give it adrenaline or crystal meth or Adderall, it slows down and it can focus.
So I can't really take any credit at all for it.
It's just kind of a byproduct of my psychology.
But yeah, like the moments that I remember feeling like the most alive are the ones when
the site's down.
If I don't fix it, the company will go under and there's no one else who can do it.
And I just go to my happy place. And I'm just like, cool. And it's just like, I can focus. You can't stop me. It's wonderful. And I love those moments, and I know that's terrible, but they're wonderful.
It's amazing that the entire team can rely on you to be so calm and stable in those moments. So for people who are...
It'd be great if they could rely on me to be calm and stable in all the other moments.
Yes.
What I was going to ask is,
for folks who are kind of dipping their toes in the operational journey
and who have been on call but haven't seen many outages yet
and who are still learning
to maintain their composure in these kinds of moments where there is too much pressure, how do you recommend they develop that skill?
Oh, that's such a great question. Because first of all, although I do think this is to some extent biologically determined, I also think that it is entirely a learnable skill. Like, I'm sure that I've improved at it, you know, just by doing it so many times. And I've seen other people improve just dramatically. I've seen, you know, interns come in... my friend Jeff Wad, who you work with, although you've never met, has these great stories about being an intern and just, like, freezing.
Yes, it is a learnable skill, and I think it's a great skill to learn, because when your adrenaline is pumping, it generally makes people make way worse decisions, and the last thing you want to do in the middle of a crisis is have two crises, or make it worse, right? And so, yeah, I think that training yourself to not react is the first thing, right? Adrenaline spikes, and our little lizard brain is just like, ah, I must jump, I must react, I must do something. And it's training yourself to stop, take a breath, take two. The chances of you making the situation worse by not doing something for five seconds are very small, and the chances that you make it way, way worse by reacting are way higher, right? So just take a few seconds and, like, deep breaths until you feel your pulse kind of return. And also, take control of the environment around you.
If people, if the reason you're freaking out is because everyone around you is freaking out,
like, we're kind of herd animals.
It's going to be incredibly hard to keep your head if people are just like, ah, ah, ah,
or just, like, over your shoulder.
Like, if people are, like, if people are causing other people to become tense,
just, like, you know, take control of the situation. You know, speak in a slow, calm voice, you know,
let's, let's sit down, let's take a breath.
Let's think about what's going on.
Can we have a moment of silence, two or three deep breaths, you know,
and, and, and then, you know, regulate your voice.
We respond to feedback loops, right?
And so when you're in the middle of a feedback loop that is winding people up,
your job is to consciously take control of the situation
and start a feedback loop that winds people back down, right?
Gets them back to their normal self.
So anything you do to kind of slow, dampen, modulate,
is really going to have an impact well beyond you.
And here's the thing, it works even if you fake it.
It works 1000% as well if you're faking it
and your heart is beating the entire time.
Well, humans are great at tricking themselves.
And this is such great advice.
I've actually seen a lot of seasoned engineers who, when there is an outage and everyone is kind of like, oh, this might be the... everyone is just trying to help and share hypotheses.
It might be the end of days! What if we can't get the data back? You know, and they're just like, okay. And you hear them settle into this tone of voice, it's like they're at a kindergarten, right? Just like, okay, let's take a look at this. Yes, I think... no, this is an interesting question. What did you see? You know, and it's just, like, very slow. And yeah, I mean, imagine, like, if you cosplay or whatever, slip into the role of a fireman, right, who just arrived at the scene and is like, okay, what's going on here? You know, is today a good day to die? Today is a great day to die.
Yeah. I remember when I had my first on-call, I was still shadowing someone, and the person I was shadowing was a seasoned engineer, and he had lots of experience dealing with incidents. When things broke, it was Saturday, 6 a.m., and I freaked out, jumped out of bed saying, what the hell just happened? And when we got on Slack, we got on a call, and this person is like, okay, let's look at these logs, let's do X, Y, and Z. And I was like, how can you be so calm right now, when I'm just freaking out thinking, it's not working, it's not working, we need to get this back up? But over time I've realized that just, you know, taking that breath or two is an incredibly helpful thing.
Yeah.
Talking about crises: we love discussing war stories and production outages on the show, and I want to lead with one. I was listening to one of the podcasts that you were on, and you mentioned that at some point in your career you had to drive to a colo and flip a DB switch. What happened?
Oh, this is routine. This is just, like, you know, another Saturday night. I did this for years. You know, I worked at, like, a remote hosting company, but this was in the days before we had, like, remote hands software. During the day, there was a guy sitting in the colo who would do this,
but if it happened after hours, well, that was on us to, like, call a cab, go to the colo, and do this.
Or, you know, when I was at Linden Lab for four years, we had our own colo space with racks of machines, and something would go down. We scheduled a monthly trip to the colo, so we didn't go down there every time there was a hardware problem. But we had some single points of failure, like the primary MySQL server, for which, yeah, if it went down, day or night, we just had to go down there and flip the switch.
I will never again touch a server. And sometimes, when I think the world's just going to shit, I just think about that and go, you know, it's not all bad.
Yeah, there were days when you had to drive to a colo. It's much better with the cloud these days. Someone else has to do that.
Someone else does, yeah.
So, talking about MySQL and some of the outages: you, or Honeycomb as a company, have been really open about sharing incident reports.
I read, I think you wrote the first major outage incident report on your blog, which
was amazing.
Yeah.
When I read that, I was like, oh, that is so cool that you're willing to share and talk
about this.
My entire career, I've chafed at what I've been instructed to say publicly about our outages. I just remember having to be vetted by the CTO and the CEO, and they'd go over it and just, like, wordsmith: oh, we can't say this, we can't mention the vendor's name, and all this shit. And so when we had Honeycomb, we were like, aha, finally I get to fuck this up my own way. So I have never proofread a single post-mortem. I trust my team. They do stellar work. And I'm always like, more detail! I have learned so much from, like, AWS's post-mortems. Fucking phenomenal. And the thing is, I think everyone who's editing or doing the micromanaging thing is doing their own team a disservice, because nothing builds confidence with other engineers (assuming you're not doing dumb shit on the regular) like just being transparent and telling them exactly what went wrong.
Oh yeah, yeah.
It builds trust with people so much.
How has it helped you in hiring? I know I'm digressing a little bit, but.
No, it's been tremendously helpful for us in hiring.
You know, it's hard for me to tease that out, and I'm going to brag a bit here too: we have never had a problem with hiring.
And this is at a moment in the industry when, like, our VCs ask us every time we have a board meeting, are you guys able to hire and keep up?
We're always like, yeah, not a problem, man.
We've never had any recruiters.
The only role we engaged a recruiter for, I think, was our sales VP. But I think that people are drawn to Honeycomb
because we try to practice a lot of humility.
And we would rather, there are a lot of things
that we're just angry about in the tech industry.
And so we don't do them.
We do them differently.
And then we talk about them
and we don't claim to be perfect. We are not perfect. And I think that sometimes people come to Honeycomb and they're disappointed because it's not perfect. But I can guarantee that we will be transparent and we will fail differently.
Well, that's amazing. So, coming back to some of the crises again... well, I should stop saying crisis.
The shit storms, the trash fires, the dumpster heaps.
Oh, yes. All those are much better than just the word crisis.
Anyways, are there any other outages or war stories that you could share with us?
Yes, absolutely. Would you like me to open my mental file marked MongoDB for a while?
Oh, yes, please, go for it.
So when Parse started out, it was, you know, the early days of MongoDB. There was just one lock per replica set. And we were a massively multi-tenant system, right? Like, we had 60,000 mobile apps after, you know, a couple of years, and over a million by the time I left. One lock that they're all, you know, contending for. And the sharding stuff didn't work for us, because sharding works really well when you have big data sets you can stripe across shards. What we had was lots and lots of little ones.
And so the hotspot problems were manifest.
That said, Parse would never have existed as a company
if it wasn't for MongoDB.
So I will give them some credit.
They were definitely on to some stuff.
And side note, product marketing, I believe,
is the reason that MongoDB is still alive as a company.
The marketing, the community building,
it bought them time for the technology to grow up to fulfill its promise.
I still have the T-shirt that says MongoDB is web scale.
So let's not forget their catastrophic first decade or so.
So, like, let's see. Well, the kinds of outages that we had: you know, someone's app would hit the iTunes top five and it would take us down immediately, and then we'd have to go figure out what was wrong. God, it's almost hard to come up with... like, there are the ones where we hit the replication bug... oh, sorry.
That's okay.
Oh God, past traumas. Oh, here's a story about MySQL. I think maybe the MongoDB stuff is still a little too fresh to process.
So at Linden Lab, we used MySQL for all of our user data.
And we tried upgrading from 4.1 to 5.0.
And we'd been running on 5.0 for all the secondaries for an entire year.
And we had done all of the benchmarking and the sysbench and all this stuff.
All the benchmarks showed that definitely MySQL 5 was going to be way faster than 4.1.
And so we flipped the switch.
And the entire world went down.
Oh, wow.
What happened?
And, you know, we kind of got it back up.
It was limping along, you know.
And we finally realized, for whatever reason, it was not faster for our workload. And because you couldn't do a backwards migration on the data, we had to roll the world back for a day or two and bring it up on the old 4.1 primary and go back on all the secondaries. It was, oh my God, it was so painful.
And so my job after this catastrophe
was to figure out why and make it safe.
And I spent almost a year. I wrote some software to do capture and replay, so I could capture 24 hours' worth of traffic and then replay it at various speeds using a bunch of clients and all this stuff.
And then, okay, I validated that for our workload it was actually 1.4 times slower, not 20% faster.
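The capture-and-replay idea described here can be sketched roughly as follows. This is only an illustration, not the tool that was built at Linden Lab: the `CapturedQuery` type, the `replay_offsets` and `slowdown_factor` helpers, and the sample numbers are all hypothetical. It shows how captured timestamps might be compressed or stretched by a speed factor, and how a figure like "1.4 times slower" could be computed from two replay runs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CapturedQuery:
    """One query from a (hypothetical) traffic capture."""
    captured_at: float  # seconds since the capture started
    sql: str


def replay_offsets(queries, speed):
    """Return (offset_seconds, sql) pairs for replaying at `speed` x real time.

    speed=1.0 keeps the original pacing, speed=2.0 replays twice as fast.
    """
    if speed <= 0:
        raise ValueError("speed must be positive")
    if not queries:
        return []
    ordered = sorted(queries, key=lambda q: q.captured_at)
    start = ordered[0].captured_at
    return [((q.captured_at - start) / speed, q.sql) for q in ordered]


def slowdown_factor(baseline_seconds, candidate_seconds):
    """How many times slower the candidate run was than the baseline run."""
    if baseline_seconds <= 0:
        raise ValueError("baseline_seconds must be positive")
    return candidate_seconds / baseline_seconds


# Hypothetical usage: three captured queries replayed at double speed.
capture = [
    CapturedQuery(10.0, "SELECT 1"),
    CapturedQuery(12.0, "SELECT 2"),
    CapturedQuery(16.0, "SELECT 3"),
]
schedule = replay_offsets(capture, speed=2.0)
```

A real replay harness would also need many concurrent clients and a way to time each query against both server versions; the sketch covers only the scheduling arithmetic.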
And then, this is around the time that, you know, Mark Callaghan and the MySQL team at Google published the InnoDB patches that let you bind to more than one core and a bunch of that stuff. So, like, I upgraded the code in MySQL and I tweaked those things, and so I got it up to about parity. And I did some other stuff with the disk and whatnot, and finally got it to a point where, you know, I was like, yes, this... oh, we fixed queries, too. The query planner had some bugs, and we actually had some queries that were underperforming that we rewrote.
And a year later I was like, yes, I guarantee this will work. It will be faster. And so we flipped, and it did. Only a year's worth of work. And of course, the reward is nothing happened.
And I'm like, yes!
And everybody else is like, what?
And then, of course, the part of the story that puts the cherry on it is that six months later, we got SSDs.
And all of my work was totally pointless if we had just gone and done the SSDs.
So, ops work.
And what you mentioned about the migrations and nothing happened, it's amazing, right?
Like, migrations that don't make any noise are one of the best migrations.
But then they get talked about so rarely because, well, no one knows a migration happened.
Yeah.
Well, you know, the thing is, the difference now... and this was like a decade ago, right? At the time I wrote this big long blog post and it was great. I wish I could find it. It's been destroyed. It's a Wayback Machine, but there was no community then. You know, there wasn't really the Twitter community.
And I feel like we're getting better at this. I feel like we're getting better at sharing more widely the story of the things that went well, so we can learn from each other there, and reusing some of that work. Because all that year of work that I put into MySQL also, you know, it all went poof, because there was no... you know, I tried to get someone interested in taking over the tools. I couldn't maintain it. I moved jobs shortly after that. But yeah, I feel like this is it.
This is obviously a trainable skill, because we have all learned to be, like, ecstatic when it's quiet, right? We're like, yeah, this is amazing.
And so, like so many things, it's just a question of being conscious and aware of it, and then choosing what you celebrate and choosing what you value. And your body's internal reward systems, like dopamine and all this stuff, they catch up as long as you do that. So I kind of love that we're kind of the contrary engineers. We celebrate when everyone else is quiet and we're quiet when everyone else is celebrating. I like that about us.
Like the red diffs, not the green ones.
Exactly. Red diffs are the best diffs.
So I was reading one of the incident reports published on the Honeycomb blog. And in one of the paragraphs, it was mentioned that Honeycomb does, like, burn-rate-based alerts, or kind of SLO burns.
So for our listeners who might not be completely aware of burn rate, can you briefly describe what a burn rate is?
Yeah, it's that thing that makes you only get paged in the middle of the night if it's really important.
It's that thing that makes you able to do most of your engineering work during the day instead of at 2 a.m.
So I think SLOs, burn-based alerts, are kind of the next... we're seeing a lot of interest. It is one of those things where you kind of do have to be this tall to ride this ride.
If you have, like, an org where everything's just a mess, it's not going to do any good. It's not where you should spend your time. But if things are working pretty well, but you just have a large and noisy system like many of us do, it's a really impactful way to invest engineering effort and get back a lot lower frustration and, you know, higher value for the time invested.
So what it is, is just the concept of: you don't page about symptoms like, oh, this just fired, you know, oh, this CPU... like, fuck a CPU alert. No one should ever, ever have to think about a CPU alert ever again.
Turn them off.
They're worse than useless.
They're literally worse than useless.
They're burning people out and not doing any good.
But there's also the next generation: individual nodes that go down in a system behind a load balancer.
Should we have to give a shit about that at 2 a.m.?
There are two categories of things that can go wrong.
There's the things where it's perfectly fine for it to wait till morning. And then there's the things where either users are currently being impacted or they will be soon, right? Users are currently being impacted: let's stick with that, right? You shouldn't actually page anyone about "users are going to be impacted soon." You should page when they're impacted now. But when your systems are well architected enough that you're kind of in this zone, it's just moving to a state where nothing pages anyone out of hours unless you're in a state where you're burning your budget, right?
You've got your SLO for your company, which is: we think it's pretty much fine if 99.9% of the time people are getting an OK, right? Well, if that's 99.8, I don't think I want to know about that until morning, right? But if it's 60%, yeah, I'd like to know about that now. And if it's, you know, 98, well, I don't know, that's up to you. How long can you go with an elevated error rate before someone should investigate it? But it's so much more... you know, it's a way of stepping back from the front lines of the firefighting and being like, all right, we've grown up a little bit. We're now dealing with thresholds, not up or down. We're now dealing with gray areas, right?
Yeah.
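The 99.9% example above can be turned into arithmetic with a small sketch. This is one common way to define a burn rate (observed error rate divided by the error rate the SLO allows), not necessarily how Honeycomb implements it. The function names and the paging threshold of 10 are assumptions made for illustration; where to draw that line is exactly the "that's up to you" part of the answer.

```python
def burn_rate(total_requests, failed_requests, slo_target):
    """Observed error rate divided by the error rate the SLO allows.

    A value of 1.0 means the error budget is being spent exactly as fast
    as the SLO permits; higher values mean it is being spent faster.
    """
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be between 0 and 1, exclusive")
    allowed_error_rate = 1.0 - slo_target
    observed_error_rate = failed_requests / total_requests
    return observed_error_rate / allowed_error_rate


def should_page(rate, threshold=10.0):
    """Page only when the budget burns faster than a chosen threshold.

    The default threshold is a hypothetical policy choice, not a standard.
    """
    return rate >= threshold


SLO_TARGET = 0.999  # "99.9% of the time people are getting an OK"

# 99.8% success: budget burns about 2x too fast, which can wait until morning.
mild = burn_rate(1000, 2, SLO_TARGET)

# 60% success: budget burns about 400x too fast, so wake someone up now.
severe = burn_rate(1000, 400, SLO_TARGET)
```

Production burn-rate alerts usually evaluate this over one or more time windows rather than a single sample, so a brief spike does not page anyone.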
Makes sense. It reminds me of one of the things that you said: nines don't matter if users are not happy.
Yeah.
There was so much truth in that. What prompted this, by the way, this particular statement?
It's just a random thing that came out of my mouth one day and I wrote it down, and I put it on my slide at, um, which conference was that? The one that's in, like, a red state. Sorry. Um, strangely, strangely... I'm sorry. Yeah. And, you know, it struck a chord with someone.
This is one of the proudest moments of my life: that year at SRECon, every single presenter had that quote on their slide.
Oh, nice.
It was like a full bingo card. Like, every single one of them had that quote on their slide. I was just like, I don't think anything is ever going to top this career moment for me.
Oh, yeah. I mean, I certainly think it should be a banner in front of or on every office.
I made a bunch of, like, stickers with rainbows and hearts and unicorns that say that. If you guys send me your addresses, I will send you a sticker, like, love pack.
We would totally love that. No, that goes to the hosts, not to everyone listening to this podcast. Yes, we feel special.
One quick bit on the error budgets, actually. So you mentioned that with the burn rate you're seeing how soon you are going to violate the error budget. On bad days, one would violate the error budget. So in those cases, what are some of the practices you've seen work well when you're trying to predict the next cycle and say, okay, these are some of the things we'll do so that we don't violate it again in the next cycle?
I mean, this is just your reliability work, right? It's like, how can we perform better in a slightly degraded state, right? If you're going down every time, well, that's obviously where you start, right? But then if you're staying up, but you're losing 30% of all requests, how can you not do that, right? How can you do a failover more rapidly? Or how can you offload those failing queries, or retry, or something, right? That's just the nuts and bolts of what we do.
Resiliency is not about, you know, having fewer errors. It's really not. It's about being able to absorb more error-impacting events and recover from them more gracefully and with less human intervention.
Yeah. Well, in the distributed systems world something is almost always breaking. It's just that sometimes you get layers and layers of fuck-this upon more layers of it. It's just like, this is the world that we live in and this is the ground upon which we stand, and there are many holes in it, and it's fine, because together we're strong.
Changing gears a little bit.
I think a question that every engineer asks themselves at some point is, do I stay, you know, IC or do I try management?
That's actually how I first came across your blog a few years ago.
You described this engineer/manager sort of pendulum.
Can you tell us more about that?
Yeah, that's the most popular blog post that I've ever written.
I like it.
I love that people are still coming up to me years later and going, oh, that really... and it makes me really happy, because I think we're really ripe for sort of a reimagining of the relationship between engineering and management. Yeah, I firmly believe that, you know, the best engineering leaders that I've ever worked with or known or learned from are all people who've spent some time in management, but don't, like, go there and stay, right? They go back and forth a few times over the course of their career.
And I started thinking about it and was like, you know, the best managers that I've ever worked with, the best line managers in the world, I think, are never really more than three, four, five years removed from doing real hands-on work themselves.
There's an intimacy, a familiarity with the subject matter that you just can't fake, you know? And I think that as much as some people love to proclaim that it's great when they have a non-technical manager who just, like, gives them the keys to the kingdom and never criticizes or has an opinion about their technical work, to me, that's not a great thing.
It's not a thing to brag about. It's a great thing to have engineering managers who, you know, don't own the decisions, but who have good taste and good judgment and can help groom and suggest and help you grow.
And, you know, I think that if you want to go into management, you can do one of two things. You should either go in and decide to climb the ranks, or you should go into management and then go back and forth a couple of times.
Because otherwise you lose your edge and you're not as effective for the people that you support and you serve.
And likewise, on the other side, the best senior engineering leaders are the ones with the most empathy, the best sense of how to motivate people and inspire them.
The best sense of how to take this massive project and break it down into subprojects that somehow, for everyone on the team, give them something that challenges them but doesn't overwhelm them. You know, something that will push their boundaries, but not so much that they stall.
Like, that is an art form and a human form just as much as it is a technical thing.
And all of those engineering leaders, even the ICs, they've spent some time in management.
Right? They spent some time where it was their job to do nothing but think about the human interaction parts and how to connect, you know, our engineering work to the business side. And I also think it builds so much empathy among engineers who have spent a little bit of time as managers. It demystifies it, right? When you haven't done it, it's kind of like you're waiting to be tapped for the promotion or the opportunity to move up, to be better than your peers and all this shit.
After you've done it once, you're like, oh, this is just a different pile of shit.
You don't actually have that much power.
It's just different.
And the glow just vanishes from it.
And you have so much more empathy for your own manager, who's stuck, like, pressed between little shits like you below and their own manager. They're just trying to do the best they can, and you're just like, oh yeah, you poor guy. So I'm just a big believer in, you know, try it if you have...
It's also nothing but a collection of skills and practices. It's been broken down and done through many different configurations, right?
And so even if you don't try being a manager with a title, I think you should learn some of those managerial skills.
Go to your manager and ask, what can you delegate to me of your managerial skills?
Can I run meetings for a quarter?
Something like meeting running, I shit you not, is like one of the most underappreciated, undervalued skills.
So much time is wasted for people just sitting in meetings that are not run by somebody who knows what the fuck they're doing.
And it frustrates everyone and nobody really understands why.
But it's just like those grains of sand that kind of gum up the works, you know, in the course of your workday.
So yeah, learn some of those skills.
Yeah. What I really took away from that, I think, is the empathy aspect that you talked about. I think you really can tell, as an engineer, when the manager is empathetic, like they've gone through it and they understand how these things work, versus when they're just kind of indifferent, where they don't know what the hell is going on, so they're just relying on, you know, whatever you come up with.
They should be. And that's, like, a huge difference. I believe that managers... they should be on call if they can. If they can't, they should regularly pinch-hit, including overnights and weekends, or else they don't deserve to call themselves managers. Because if you're asking someone else to wake up on behalf of the company, you'd better be the first to jump off that cliff.
I'm gonna need to quote you on that at some point. Very nice. So, I also really like this series on your blog called Questionable Advice, where it's like an advice column, where you show the emails you've gotten asking for advice and you write about what you would actually tell them. So first of all, what makes a good question? You must get a ton of emails, and I really want to know how I can make my email one day stand out and get featured on your blog.
It's just something that makes my mind just sort of, you know, start spinning.
And I usually, like, write back to them first. I respond to almost all questions that I get.
So seriously, if you want to ask me a question, just DM me and I will respond to pretty much all of them.
It's mostly the ones where I was just like, oh, that's kind of a good answer or something.
Or if it's just a really common one, I'm like, ah, I can put this on my blog and then fewer people will ask me this. But yeah, I encourage all questions. I respond to pretty much everybody. But honestly, another thing that does it is when the person is clearly in some angst or some, you know, crisis. This is why I also have a link on my calendar for people to sign up if they want to just have a phone call and talk about their career trajectory or something. I have benefited so much from so many people in this industry just giving me their time and their advice, and I've been so fortunate. This is something that's been kind of weird for the past couple of years: someone asks me a question, shit starts coming out of my mouth, and then I stop and I go, oh shit, I didn't know I knew that. Or, oh, that kind of sounded good.
And now I'm like, oh, more people should benefit from all this fucking knowledge. And so, you know, to the extent that I can, yeah, you can sign up, I'll have a call with you, and I will just talk about what... because people don't usually reach out unless they're at some kind of turning point in their career, or they're not happy and they aren't sure what to do next. And I'm happy to share at least what I know.
Must be nice. Mine usually comes out the other way.
Oh, that happens too. I just immediately ignore it and forget about it. Selective hearing.
Yeah, I like it. So a recent post in this series is titled "The Trap of the Premature Senior."
I was like, oh my gosh, that's me at my first job.
Can you tell us more about the situation?
Yeah, totally.
I was talking to this kid,
and his first job,
he'd been there for two or three years,
and he was at the top of the fucking mountain.
He was the most senior person there.
He knew everything about how the system worked.
He gets pulled into all the high-level planning meetings, and he's able to do less and less IC work of his own, but, you know, it's very validating to be needed and to be wanted. And he's like, I feel like this might not be the best thing for me. I feel like maybe I should switch jobs, but then how can I be sure to get the same comp? How can I be sure to get the same stature? I don't want to just end up at the bottom of the totem pole, you know, doing errand-boy, low-level stuff when I've gotten used to having this very high level of influence and, you know, say in what I do. And my advice to him was, like, get the fuck out of there. It is too soon for you to feel that way, honey. Like, you haven't earned it yet, and it's not going to be good for you. You'll stunt your growth, right? Like, the first five, six, seven years of your career, at least, you need to be optimizing for... you know, your career is the single biggest, greatest... it is a multi-million dollar appreciating asset.
And you should manage it for the long run, right?
If you get hooked on those feelings of being in charge, knowing everything, having a high salary and everything in the first couple of years, it's not going to be good for you in a couple of years.
So don't get used to it. Get out of there.
I think, yeah, what really struck me was the familiarity...
We've all been there, man. It's so good. It feels so good.
So one thing that you touched on is that having worked at different places and on different sorts of problems gives you perspective.
And that's sort of what makes you more senior, which I think is absolutely spot on.
What other aspects or skill sets do you expect from a senior engineer, say, if you're hiring for one?
You know, and I want to be clear that in that article, I was like, quit your job.
And I'm not saying that everybody has to quit their job after two or three years.
There are ways to maintain that growth
within a single company by moving teams,
you know, by experiencing different things.
But the point is just to be mindful of it, right?
And to optimize for that.
Yeah, for me, a senior engineer, first and foremost, and you can call this my back-end bias, which I think is partly fair, but not entirely:
It's rooted in production, right?
It's for their instincts.
You know, I want to be able to trust their instincts, which means that their little data corpus needs to be trained on reality, which means production, right? Like, I don't see how anyone can call themselves a senior engineer if they don't know what happens to their code after they hit merge, right?
Like, you need to know how it gets out there to users, and be able to, you know, look at it in production and say, what's happening? Is it doing what I expected it to do? Does anything else look weird? And do some, you know, at least basic debugging. I think that's table stakes for anyone.
I think there's some exposure to data modeling, too. And this is all about, like, what do you know beyond the code itself?
Right? Because there's, like, the data structures and algorithms, but that's just step one. You also need some sense of what data is backing this and what the code that you're shipping is going to do to it, and where those, you know, edges are.
I think that for sure is also table stakes.
That's mostly it.
You know, I think that beyond that, personally, I like to see that someone has been at at least two jobs. And they sometimes call this T-shaped, you know, that you've shown that you can and have gone really deep in one area and that you have a broad understanding.
I like that.
It's certainly benefited me.
I think that for my T, I went really deep on, you know, MySQL debugging.
And then later on I was able to very quickly and efficiently reproduce that on MongoDB and Cassandra, because I had that depth of context and stuff.
What about the soft skill side? Like what?
Soft skill side? Yeah. I mean, you can't be an asshole. You know, people can't dread having to talk to you, because that creates... it's like having a service that regularly turns requests away. You wouldn't accept that in your system. You can't have people that regularly just warp and distort the flow of communication by causing people to avoid them, right? I do think that there are many archetypes for senior engineers.
Some people will get very dogmatic about, you must do more mentoring than writing code as a senior engineer.
I really don't think so.
I do think that, all right, here's a "you must be this tall": you must be able to explain what you're doing. You must be able to explain it to people who aren't experts in it. You should be able to talk through your code. Here's the interesting thing, above and beyond technical skills, right? If somebody can clearly and gracefully talk us through their solution and why they chose it, and what the trade-offs were and what else they might have tried, and came up with the wrong conclusion, I would rather hire that person than a person who came to the right conclusion and couldn't really explain why or how they got there, 100% of the time.
And that's actually, we've found, excluded a lot of people who write great code.
But we won't compromise on that, because it's such a part of our team's culture to be able to explain.
And I think that leads to a greater understanding.
And it certainly leads to greater shared understanding for the team.
So I think that's a bare minimum for senior engineers, just to be able to clearly explain yourself and to be willing and eager and friendly to help those around you.
But then some ICs get more and more powerful, and they just grow at being masters.
They write tons of code very fast. Or there's an archetype you find mostly at large companies where they may only write five lines of code in the quarter, but they saved the company $10 million and no one else could have found them and written them. So there's that archetype. Then there's the archetype that's much less about writing product software, product features, and much more about understanding and, you know, the master of the migration and the re-architecture, to get more performance out of the system on the second or third rewrite. And then there's the type that is very much about coaching and mentoring and about spreading those skills and being the glue in the team.
May not write much code at all, but can bring up all of the engineers around them.
And I think with all of those archetypes, there's no shame in being any of them.
Be who you are, who you want to be, but just be sure to find a place that needs that type from you.
That's well said. And I want to go back to the first post of your advice column, where someone asked you, after being a manager, can I be happy as a cog? But I want to change it up a little bit, and this is something that we touched on a little bit earlier, and ask instead: after being a technical co-founder or CTO, can you be happy as a cog?
I can't fucking wait to be a cog. When I started this company, my serious intention was to go in a corner and write Go code for two or three years, because I was so burned out. And I was starting to feel so wobbly in my technical skills. I was just like, I just want to put my headphones on and write software for a couple of years. I got to do that for three months.
So my hopes and dreams are on hold.
I want to be a cog.
You know, some of the easiest people to manage are people who have been managers.
Some of the most wonderful people to have on your team are people who have been co-founders, because they're just so chill and so happy to not be them anymore.
They're just like, dude, whatever you need from me. And then they clock out at 5 p.m. and go home, you know, and they're just happy as a clam with their wife and their daughters. And I'm just like, I hate you. No, I think that it's just like being a manager, right? Once you've been there, the shine is off. These last five years have been the hardest of my life.
And I sacrificed my marriage.
And it's just like, I'm going to be so stoked to be somebody's cog someday.
You have no idea.
So the flip side that I was also interested in knowing is this: I feel like a lot of the friends that I talk to who are engineers who want to start a company kind of have that same fear that you mentioned a little bit in the beginning. It's like, okay, say I end up doing more of a, you know, maybe CEO role as a founder to help out with the business side, where I need to learn how to do sales. Then it's going to be really tough, like, three, four years in, if maybe it doesn't work out and I come back. How hireable am I after all that? To me, having gone through that, a very quick try-my-luck at starting a startup, it definitely gave me a lot of perspective in terms of what matters and how to think about problems at a high level such that it benefits the business. But I wanted to get your take: do you have any advice for people who are kind of in that position?
First of all, if you're a dude, you don't really have to worry about it. I really think this is mostly a pathology that affects women, because they're seen as so dubiously technical to start with. I've never seen a dude get... mostly people are just like, oh wow, you tried to do a startup, right? And that's so cool, right? I don't think you have to worry about it. I've never seen men get dinged for it. You're just like, oh yeah, you have a little bit, you know, you can brush up on your skills, but I don't think you need to worry about it. You know, starting a company is hard. There's not a lot of glamour to it. I don't recommend it to anyone. It's really a stupid thing to do.
But, you know, people get the bug and they want to try it.
They have an idea that they believe in.
Only do it as long as you believe in what you're doing.
Right.
Um, but if you believe in what you're doing, then you have to be willing to do whatever
it takes.
And yeah, that means sales.
That means marketing.
And that means not being snobby about this shit.
Right.
Like, I honestly think that one of the things that we did right at Honeycomb was be very vocal about how much we respected business people and the functions that they bring.
You know, if an engineer expresses any snobbery about sales or marketing, it's an instant out in our interview process.
We don't hire people who don't look at their business counterparts as peers.
And this is incredibly rare.
And I think that it just rolls off of us in fumes when this is how we feel. So I think there's some retraining that you need to do if you're starting a company. Retrain yourself to really see the value and the necessity and the glory in these things that we have been kind of taught to shit on our entire careers. And be part of the narrative that's pushing back against, you know, sales being lesser.
Yeah, that's really well said. Okay, Charity, we're done with all the easy questions. Now I have a very hard question for you. Okay, so we know that you're a very experienced podcaster. You have your own show, the observability podcast. We'll link it.
I thought you were gonna say, we know you're very experienced at whiskey.
We haven't figured out the shipping yet, so you have to worry about that.
Gotcha, gotcha.
But Ronak and Austin and I, we just started.
We're complete noobs.
So from an experienced veteran podcaster to us toddler podcasters, podcaster to podcaster, how would you rate our podcast on a scale of one to ten? There's no wrong answers.
This has been one of my two or three favorite podcasts of all time, so you guys are on a roll.
That was, you know, to our listeners, that was not paid for. Completely spontaneous.
And yeah, I will say you have the edge on pretty much any other topic, because I'm such a sucker for horror stories and production. So I showed up already pretty like, yeah, this is gonna be a good one.
Awesome. I was hoping you'd say, like, oh, I'll give it a six, so then I can do the whole line about, oh, my mom, you know, kind of gave it a slightly better score. But we'll save that joke for later. Before we wrap up, is there anything else you'd like to share with our listeners?
Um, we didn't talk about observability at all, which is really refreshing. But, you know, if people have production problems, and who doesn't, you should check out Honeycomb. If you have large-systems problems, you probably need observability. So, honeycomb.io.
Oh, and we have a really generous free tier, which is cool.
You can run actual real production workloads on it and not pay for anything.
Nice.
That's pretty cool.
And, topical: SLOs.
We have the only implementation of SLOs that I have yet seen in the wild that does it correctly, like, according to the Google SRE book. Which is cool, because it means you can go from very high level, you know, the number burning down, directly to which events are failing and what is different about them from the baseline events. And just at a glance, you'd go, oh, all these errors are because of this one node and this one replica set that's doing blah, blah, blah.
It's just super dope, because you can go really quickly from high level to low level and get back to bed.
That sounds really awesome.
One thing which I would just say is we purposefully skipped the observability topic because we know you have a podcast on it.
You've written about this a lot, so we wanted to keep this a little interesting for you.
Yeah, this is great, man.
I love it.
I'm so glad you did.
It's refreshing.
Thanks.
Awesome.
Thanks so much, Charity.
Really appreciate your time.
Yeah, thank you so much.
Yeah, thank you.
It's really great.
Hey, thank you so much for listening to the show.
You can subscribe wherever you get your podcasts and learn more about us at softwaremisadventures.com. You can also write to us at hello@softwaremisadventures.com. We would love to hear from you. Until next time, take care.