Screaming in the Cloud - Reliability Starts in Cultural Change with Amy Tobey
Episode Date: May 11, 2022
About Amy
Amy Tobey has worked in tech for more than 20 years at companies of every size, working with everything from kernel code to user interfaces. These days she spends her time building an innovative Site Reliability Engineering program at Equinix, where she is a principal engineer. When she's not working, she can be found with her nose in a book, watching anime with her son, making noise with electronics, or doing yoga poses in the sun.
Links Referenced:
Equinix Metal: https://metal.equinix.com
Personal Twitter: https://twitter.com/MissAmyTobey
Personal Blog: https://tobert.github.io/
Transcript
Hello, and welcome to Screaming in the Cloud, with your host, Chief Cloud Economist at the
Duckbill Group, Corey Quinn.
This weekly show features conversations with people doing interesting work in the world
of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles
for which Corey refuses to apologize.
This is Screaming in the Cloud.
This episode is sponsored in part by our friends at Vultr, spelled V-U-L-T-R,
because they're all about helping save money, including on things like, you know, vowels.
So what they do is they are a cloud provider that provides surprisingly high
performance cloud compute at a price that, well, sure, they claim it is better than AWS's pricing.
And when they say that, they mean that it's less money. Sure, I don't dispute that. But what I find
interesting is that it's predictable. They tell you in advance on a monthly basis what it's going
to cost. They have a bunch of advanced networking features. They have 19 global locations and scale things elastically, not to be confused with openly, which is apparently elastic and open. They can mean the
same thing sometimes. They have had over a million users. Deployments take less than 60 seconds across
12 pre-selected operating systems,
or if you're one of those nutters like me,
you can bring your own ISO
and install basically any operating system you want.
Starting with pricing as low as $2.50 a month
for Vultr Cloud Compute,
they have plans for developers and businesses of all sizes,
except maybe Amazon,
who stubbornly insists on having something of the scale on their
own. Try Vultr today for free by visiting vultr.com slash screaming, and you'll receive
$100 in credit. That's v-u-l-t-r dot com slash screaming.
Finding skilled DevOps engineers is a pain in the neck, and if you need to deploy a secure and compliant application to AWS without such things, forget about it.
But that's where DuploCloud can help.
Their comprehensive no-code-slash-low-code software platform guarantees a secure and compliant infrastructure in as little as two weeks while automating the full DevSecOps lifecycle.
Get started with DevOps as a service from DuploCloud, and your cloud configurations will be done
right the first time. Tell them I sent you, and your first two months are free. To learn more,
visit snark.cloud slash duplocloud. That's snark.cloud slash d-u-p-l-o-c-l-o-u-d.
Welcome to Screaming in the Cloud. I'm Corey Quinn. Every once in a while, I catch up with
someone that it feels like I've known for ages, and I realize somehow I have never been able to
line up getting them on this show as a guest. Today is just one of
those days. And my guest is Amy Tobey, who has been someone I've been talking to for ages, even
in the before times, if you can remember such a thing. Today, she's a senior principal engineer
at Equinix. Amy, thank you for finally giving in to my endless wheedling.
Thanks for having me. You mentioned the before times. I remember it was right before the
pandemic. We had beers in San Francisco, wasn't it? There was Ian there and a couple other people.
It was a really great time. I think I remember beer. Yeah. And then the world ended.
Oh my God, yes. It's still March of 2020, right? As far as I know. I haven't checked in a couple years.
So you do an awful lot, and it's always a difficult question to ask someone.
So can you just encapsulate your entire existence in a paragraph?
It's awful.
So I'd like to give a bit more structure to it.
Let's start with the introduction.
You are a senior principal engineer.
We know it's high level because of all the adjectives that get put in there. And none of those adjectives are associate or beginner or junior or all the other diminutives that companies like to play games with to justify paying people less.
And you're at Equinix, which is a company that is a bit unlike most of the, shall we say, traditional cloud providers.
What do you do over there, both as a company and as a person?
So as a company, Equinix, what most people know about is that we have a whole bunch of data centers all over the world.
I think we have the most of any company.
And what we do is we lease out space in that data center,
and then we have a number of other products that people don't know as well,
which is one is Equinix Metal, which is what I specifically work on,
where we rent you bare metal servers.
None of that fancy stuff
that you get in the other clouds on top of it.
There's things you can get that are partner things
that you can add on like storage
and other things like that.
But we just deliver you bare metal servers
with really great networking.
So what I work on is the reliability
of that whole system.
All of the things that go into provisioning the servers, making them come up, making sure that they get delivered to the server, make sure the API works right, all of that stuff.
So you're on the Equinix cloud side of the world more so than you are on the building data centers by the sweat of your brow, as they say.
Correct, yeah. Software side.
Excellent. Yeah, I spent some time in data centers in the early part of my career before cloud ate that. That was sort of contemporaneous with the discovery that I'm the hardware destruction bunny and I should take great pains to keep my aura away from anything expensive and important, like, you know, the SAN.
Right, yeah.
So, yeah, companies moving out of data centers and me getting out was a great thing.
But the thing about SANs, though, is like, it might not be you. They're just kind of cursed from the start, right? They just always were kind of fussy and easy to break.
Oh yeah. I used to think, and I kid you not, that I
had a limited upside to my career in tech because I sometimes got sloppy and I was fairly slow at
crimping Ethernet cables. That is very similar to growing up in third grade, when it became apparent
that I was going to have problems in my career because my handwriting was sloppy.
Yeah, it turns out the future doesn't look like we predicted it would.
Oh gosh, are we going to talk about like neurological development now?
That's the thing I struggle with too, right? Is I started typing as soon as they would let,
in fact, before they would let me. I remember in high school,
I had teachers who would grade me down
for typing a paper out.
They wanted me to handwrite it,
and I would go, cool, go ahead and take a grade off,
because if I handwrite it,
you're going to take two grades off my handwriting.
So I'm cool with this deal.
Yeah, it was pretty easy early on.
I don't know when the actual shift was,
but it became more and more apparent
that more and more things are moving towards a world where you could type. And I was five when I started working
on that stuff. And that really wound up changing a lot of aspects of how I started seeing things.
One thing I think that you're probably fairly well known for is incidents. I want to be clear
when I say that. You are not the root cause, as in,
"So why are things broken?" "It's Amy again. What have you gotten into this time?"
It does happen, but not all the time.
Exactly. It's a learning experience.
Right.
You've also been deeply involved with SREcon and a lot of aspects of what I will term,
and please don't yell at me for this, SRE culture, which is sometimes a challenging thing
to wind up describing or putting a definition around. The one that I've always been somewhat
partial to is SRE is DevOps, except you worked at Google for a while. I don't know how necessarily
accurate that is, but it does rile people up.
Yeah, it does. Dave Stanke actually did a really
great talk at SREcon San Francisco
just a couple weeks ago about the DORA report. And the new DORA report, they split SRE out into
its own function and kind of is pushing against that old model, which actually comes from Liz
Fong-Jones. I think it's from her or older, about like class SRE implements DevOps, which is kind
of this idea that like SREs make DevOps happen. Things have evolved
since then. Things have evolved since Google released those books. The world has figured out
what works and what doesn't a little bit. And so it's not that we're implementing DevOps so much.
In fact, it's that ops stuff that kind of holds us back from the really high-impact work that SREs,
I think, should be doing, work that isn't just fixing the problems, the symptoms
down at the bottom layer, like what we did as sysadmins 20 years ago. You know, a lot
of people are SREs that came out of the sysadmin world and still think in that mode where it's like,
well, I set up the systems and when things break, I go and I fix them. And why do the developers
keep writing crappy code? Why am I always getting up in the middle of the night because
this thing crashed? And it turns out that the work we need to do to make things more reliable,
there's a ceiling to how far the platform can take us, right? Like we can have the best platform in
the world with redundancy and, you know, nine-way replicated data storage and all this crazy stuff.
And still, if we put crappy software on top, it's going to be unreliable. So how do we make less crappy software?
And for most of my career, people would be like, well, you should test it.
So we started doing that, and we still have crappy software.
So what's going on here?
We still have incidents.
So we write more tests, and we still have incidents.
We had a QA group.
We still have incidents.
We send the developers to training, and we still have incidents.
So what is the thing we need to do to make things more reliable? And it turns out that most of it is culture work. My perspective on this
stems from being a grumpy old sysadmin. And at some point I started calling myself a systems
engineer or DevOps or production engineer or SRE. It was all from my point of view, the same job.
But you know, if you call yourself a sysadmin, you're just asking for a 40% pay cut off the top. But I still tended to view the world through that lens. I tended to be
very good at Linux systems internals, for example, understanding system calls and the rest.
But increasingly, as the DevOps wave or SRE wave or Googlization of the internet wound up being
more and more of a thing. I found myself increasingly
in job interviews where, great, now can you go wind up implementing a sorting algorithm on the
whiteboard? It's what on earth? No, like my lingua franca is shitty bash. And no one tends to write
that without a bunch of tab completions and a quick checking with man pages, die.net or whatnot on the fly as you go down that path. And it was awful.
And I felt like my skillset was increasingly eroding.
And it wasn't honestly until I started this place
where I really got into writing a fair bit of code
to do different things
because it felt like an orthogonal skillset,
but in the fullness of time, it seems like it's not. And it's a re-skilling, and it made
me wonder, does this mean that the areas of technology that I focused on early in my career,
was that all a waste? And the answer is, not really. Sometimes, sure. In that I don't spend
nearly as much time worrying about inodes, for example, as I once did. But every once in a while,
I'll run into something and I look like a wizard from the future, but instead I'm a wizard from
the past.
Yeah, and I find that a lot in my work now: sometimes things I did 20 years ago come back
and it's like, oh yeah, I remember I did it. I did all that threading work in 2002 in Perl and I
learned everything the very, very, very hard way. And then, you know, this January I did some
threading work to fix some stability issues.
And all of it came flooding back, right?
The experience is really more than the code or the learning or the text and stuff.
It's more of the just, like, "this feels like thread f***ery"
as a diagnostic thing that sometimes we have to say.
And the people are like, can you prove it?
And I'm like, not really, because it's literally thread f***ery.
Like the definition of it is that there's weird stuff happening that we can't figure out why it's happening.
There's something acting in the system that isn't synchronized, that isn't connected to other things.
It's happening out of order from what we expect.
And if we had a clear signal, we would just fix it.
But we don't.
We just have like weird stuff happening over here and then over there and then over there and over there.
And like that tells me there's just something happening
at that layer and then have to go and dig into that, right?
And like just basically charge through.
My colleagues are like, well, maybe we should look at this
and go look at the database,
the things that they're used to looking at
and that their experiences inform.
Whereas then I bring that ancient toiling
through the threading mines experience back and go,
oh, yeah, so let's go find where this is happening, where people are doing dangerous things with threads,
and see if we can spot something.
But that came from that experience.
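The threading code Amy worked on is not shown anywhere in the conversation. As a generic illustration of the kind of "dangerous things with threads" she describes, the sketch below shows the classic case: several threads doing a read-modify-write on shared state. The function name `run_workers` and the counter are hypothetical, and the lock is the standard remedy rather than anything specific to her fix.

```python
import threading


def run_workers(n_threads: int, n_increments: int) -> int:
    """Increment a shared counter from several threads, guarded by a lock."""
    counter = 0
    lock = threading.Lock()

    def worker() -> None:
        nonlocal counter
        for _ in range(n_increments):
            # Without the lock, this read-modify-write could interleave
            # with another thread's and silently lose updates.
            with lock:
                counter += 1

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter


if __name__ == "__main__":
    print(run_workers(4, 10_000))
```

Removing the `with lock:` line is the sort of change that produces the symptoms described above: results that are wrong only some of the time, with no clear signal pointing at the cause.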
There's so much that just repeats itself, and history rhymes.
The challenge is that do you have 20 years of experience, or do you have one year of experience repeated 20 times? And as the tide rises, doing the same task by hand,
it really is just a matter of time before your full-time job winds up being something a piece
of software does. An easy example is, oh, what's your job? I manually place containers onto specific
hosts. Well, I've got news for you
and you're not gonna like it at all.
Yeah, yeah.
I think that we share a little bit.
I'm allergic to repeated work.
I don't know if allergic's the right word,
but if I sit and I do something once, fine.
Like I'll just crank it out.
It's this form or it's this data file I gotta write.
And I'll, fine, I'll type it in and do the manual labor.
The second time, the difficulty goes up by 10, right? Like just
mentally, just to do it, I'd be like, I've already done this once. Doing it again is
anathema to everything that I am. And then sometimes I'll get through it. But after that,
like writing a program is so much easier because it's like almost exponential growth in difficulty.
You know, the third time I have to do the same thing, that's like just typing the same stuff,
like look over here, read this thing and type it over here,
I'm out.
I can't do it.
You know, I got to find a way to automate.
And I don't know, maybe normal people aren't driven to live this way,
but it's kept me from getting stuck in those spots too.
It was weird because I spent a lot of time as a consultant going from place to place,
and it led to some weird changes.
For example, oh, thank God I don't have to think about that whole messaging queue thing. Sure, next engagement, it's message
queue time. Fantastic. I found that repeating myself drove me nuts, but you also have to be
very sensitive not to wind up stealing IP from the people that you're working with.
But what I loved about the sysadmin side of the world is that the vast majority of stuff that I've taken with me lives
in my shell config. And what
I mean by that is, there's nothing
in there that's proprietary, but
when you have a weird problem of trying to figure
out the best way to figure out which Ruby
process is stealing all the
CPU, great. Turns out that you can chain
seven or eight different shell commands together
through a bunch of pipes. I don't want
to remember that forever.
So that's the sort of thing I would wind up committing
as I learned it.
I don't remember what company I picked that up at,
but it was one of those things that was super helpful.
I have a sarcastic, it's a one-liner,
except no sane editor setting
is going to show it in any less than three lines,
of a whole bunch of Perl piped into du,
piped into the rest,
that tells you what are the largest consumers of files in a given part of the system, and it rates them with stars, and it winds up
doing some neat stuff. I would never sit down and reinvent something like that today, but the fact
that it's there means that I can do all kinds of neat tricks when I need to. It's making sure that
as you move through your career on some level, you're picking up skills that are repeatable
and applicable beyond one company. Skills and tooling. Yeah. Right. Like you just described
a tool. Another SREcon talk was John Allspaw and Dr. Richard Cook talking about above the line,
below the line. And they started with these metaphors about tools, right? Showing all the
different kinds of hammers. And if you're a blacksmith, a lot of times you craft specialized hammers for very specific jobs. And that's one of the properties of a tool that they
were trying to get people to think about, is that tools get crafted to the job. And what you just
described is a bespoke tool that you had created on the fly that kind of floated under the radar
of intellectual property. So let's not tell the security or IP people, right? Like, cause there's probably billions
and billions of dollars of technically
like made up IP value.
I'm doing air quotes with my fingers.
You know, that's just basically people's shell profiles
and my God, the Emacs automation that people have done.
If you've ever really seen somebody who's amazing at Emacs
and has 10, 20, 30, maybe 40 years of experience encoded in their
Emacs settings, it's a wonder to behold. I'd look at it and I'd go, man, I wish I could do that.
It's like listening to a really great guitar player and being like, wow, I wish I could play
like them. You see them just flying through stuff, but all that IP in there is both that person's
collection of wisdom and experience and working with that code,
but also encodes that stuff like you described, right? Which is all these little systems tricks
and little fiddly commands and things we don't want to remember. And so we encode them into our
tool set. Oh, yeah. Anything I wound up taking, I always would share it with people internally,
too. I mentioned, yeah, I'm keeping this in my shell files, because I just disclosed it,
which solves a lot of the problem. And also, none of it was even close to proprietary or anything like that. I'm sorry, but the way that you wind up
figuring out how much of a disk is being eaten up and where in a more pleasing way is not a
competitive advantage. It just isn't. It isn't to you or me, but back at the beginning of our
careers, people thought it was worth money and should be proprietary.
Like, oh, that disk checking script is a competitive advantage for our company because there were only a few of us doing this work.
It was actually being able to actually manage your servers was a competitive advantage.
Now it's kind of commodity.
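The disk-checking script discussed in this exchange is never shown in the conversation. Purely to illustrate the idea (find the largest consumers under a directory and rate them with stars), a minimal Python sketch might look like the following. The names `tree_size` and `rate_with_stars`, the ten-star scale, and the use of apparent file size rather than allocated blocks are all assumptions of this sketch, not details of the original Perl.

```python
import os
import sys


def tree_size(path: str) -> int:
    """Total apparent size in bytes of regular files under path."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                if not os.path.islink(full):
                    total += os.path.getsize(full)
            except OSError:
                pass  # file vanished or is unreadable; skip it
    return total


def rate_with_stars(sizes: dict[str, int], max_stars: int = 10) -> list[tuple[str, int, str]]:
    """Sort entries largest first and give each a star bar relative to the largest."""
    if not sizes:
        return []
    biggest = max(sizes.values()) or 1  # avoid dividing by zero if everything is empty
    ranked = sorted(sizes.items(), key=lambda kv: kv[1], reverse=True)
    return [(name, size, "*" * round(max_stars * size / biggest)) for name, size in ranked]


if __name__ == "__main__":
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    sizes = {}
    for entry in os.scandir(root):
        if entry.is_dir(follow_symlinks=False):
            sizes[entry.name] = tree_size(entry.path)
        elif entry.is_file(follow_symlinks=False):
            sizes[entry.name] = entry.stat(follow_symlinks=False).st_size
    for name, size, stars in rate_with_stars(sizes):
        print(f"{size:>14,d}  {stars:<10}  {name}")
```

Keeping the ranking in a pure function separate from the filesystem walk is a design choice of this sketch; it makes the star logic easy to check without touching a real disk.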
Let's also be clear that the world has moved on.
I wound up buying DaisyDisk a while back for Mac,
which I love.
It is a fantastic,
pretty effective
"where's all the stuff on your disk going?" tool.
And it does a scan and you can drag and collect things and delete them when
you're trying to clean things out. I was using it the other day,
so it's top of mind at the moment,
but it's way more polished than that crappy Perl three-liner.
And I see,
I see both sides.
Truly,
I do. The trick also,
for those wondering in their own career, like, where is the line? It's super easy. Disclose it,
what you're doing in those scenarios. In the event someone says no, because they believe that
finding the right man page section for something is somehow proprietary, great. When you go home
that evening in a completely separate environment, build it yourself from scratch to solve the
problem, reimplement it, and save that, and you're done. There are lots of ways to do this.
Don't steal from your employer, but your employer employs you. They don't own you.
And the way that you think about these problems, every person I've met who has had a career that's
longer than 20 minutes has a giant doc somewhere on some system of all of the scripts that they
wound up putting together,
all of the one-liners, the notes on, "Next time you see this, this is the thing to check."
Yeah. The cheat sheet or the notebook with all the little commands or, again,
the Emacs config sometimes for some people or shell profiles.
Here's the awk one-liner that I put that automatically spits out from an Apache log file.
What, sorry, HTTPD log file that just tells me
what are the most frequent talkers and what are the-
You should probably let go of that one.
You know, like, I think that one's lifetime is kind of past, Corey.
Maybe you just-
I just have to get to work with Nginx and we're good to go.
Oh yeah, there you go.
Or S3 access logs, perish the thought.
But yeah, like, what are the five most high volume talkers
and what are those relative to each other?
Huh, that one thing seems super crappy
and it's coming from Russia,
but that's, hmm, one starts to wonder,
maybe it's time to dig back in.
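The awk one-liner itself is not quoted in the conversation. As a rough sketch of the same idea (which clients send the most requests, and how they compare to each other), the Python below counts the first whitespace-separated field of each log line. That field is the client address in the Common Log Format that Apache HTTPD and Nginx use by default; other formats, such as S3 access logs, put the address elsewhere and would need a different index. The function name `top_talkers` and the sample lines are hypothetical.

```python
from collections import Counter
from typing import Iterable


def top_talkers(lines: Iterable[str], n: int = 5) -> list[tuple[str, int, float]]:
    """Return the n busiest clients as (client, requests, share_of_all_requests)."""
    counts: Counter = Counter()
    for line in lines:
        fields = line.split()
        if fields:  # skip blank lines
            counts[fields[0]] += 1  # assumption: client address is the first field
    total = sum(counts.values())
    return [(client, hits, hits / total) for client, hits in counts.most_common(n)]


if __name__ == "__main__":
    # Made-up sample lines using documentation-reserved addresses.
    sample = [
        '203.0.113.7 - - [11/May/2022:10:00:00 +0000] "GET / HTTP/1.1" 200 512',
        '203.0.113.7 - - [11/May/2022:10:00:01 +0000] "GET /a HTTP/1.1" 200 128',
        '198.51.100.2 - - [11/May/2022:10:00:02 +0000] "GET / HTTP/1.1" 200 512',
    ]
    for client, hits, share in top_talkers(sample):
        print(f"{hits:>6}  {share:6.1%}  {client}")
```

To run it against a real file, pass the open file object directly, since iterating a file yields its lines.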
So one of the things that I have found
is that a lot of the people talking about SRE
seem to have descended from an ivory tower somewhere.
And they're talking about how some of the best in class companies out there,
renowned for their technical cultures, at least externally, are doing these things.
But there's a lot more folks who are not there.
And honestly, I consider myself one of those people who is not there.
I was a competent engineer, but never a terrific one.
And looking at the way this was described, I often came away thinking, okay, was the purpose of this conference talk
just to reinforce how smart people are and how I'm not? And/or, well, there are the 18 cultural
changes you need to make to your company, and then you can do something kind of like we were
just talking about on stage. It feels like there's a combination of problems here. One
is making this stuff more accessible to folks who are not themselves in those environments.
And two, how to drive cultural change as an individual contributor, if that's even possible.
And I'm going to go out on a limb and guess you have thoughts on both aspects of that,
and probably some more. Hit me, please. So the ivory tower, right?
Let's just be straight up.
Like the ivory tower is Google.
I mean, that's where it started.
We get it from the other large companies
that want to do conference talks
about what this stuff all means and what it does.
What I've kind of come around to
in the last couple of years is that
those talks don't really reach
the vast majority of engineers.
They don't really apply to a large swath of the enterprise, especially, which is like
where a lot of the bulk of our industry sits, right?
We spend a lot of time talking about the darlings out here on the West Coast and in high tech
culture and startups and so on.
But like we were talking about before we started the show, right?
Like the interior of even just America is filled with all these like insurance and banks and all of these companies that are cranking out tons of code and servers and stuff.
And they're trying to figure out these same problems.
But they're structured in companies where their tech arm is still, in most cases, considered a cost center.
Often is bundled under finance. That's a whole show
of itself, about that historical blunder. And so the tech cultures tend to be very, very different from
what we experience in, what do we call it anymore? Like, I don't even want to say West Coast anymore
because we've gone remote, but like, high-tech culture, we'll say. And so like thinking about
how to make SRE and all this stuff more accessible comes down to like thinking about who those engineers are that are sitting at the computers writing all the code that runs our banks, all the code that makes sure that, I'm trying to think of examples that are more enterprisey, right?
Or shoot, buying clothes online.
You go to Macy's, for example.
They have a whole bunch of servers that run their online store and stuff. They have internal IT-ish people who keep all this stuff running and write that code
and probably integrating open source stuff, much like we all do.
But when you go to try to put in a reliability program that's based on the current SRE models,
like SLOs, you put in SLOs and you start doing this incident management program that's like,
you have a form you fill out after every incident and then you make developers write retros.
And it turns out that those things are very high level skills, skills and capabilities
in an organization.
And so when you have this kind of IT mindset or the enterprise mindset, bringing the culture
together to make those things work often doesn't happen because, you know, they'll go with
the prescriptive model and say like, OK, we're going to implement SLOs. We're going to start measuring SLIs on
all of the services, and we're going to hold you accountable for meeting those targets.
If you just do that, you're just doing more gatekeeping and policing of your tech environment.
My bet is reliability almost never improves in those cases. That's been my experience too.
Why I get charged up about
this is, if you just go slam in these practices, people end up miserable. The practices then
become tarnished because people experience the worst version of them. And then...
With the remote explosion as well, it turns out that changing jobs basically means that
your company sends you a different Mac and the next Monday you wind up signing into a different
Slack team.
Yeah. So the culture really matters, right? You can't cover it over with foosball tables
and great lunch. You actually have to deliver tools that developers want to use. And you have
to deliver a software engineering culture that brings out the best in developers instead of
demanding the best from developers. I think that's a fundamental business shift that's kind of
happening. That's if I'm putting on my wizard hat
and looking into the future
and dreaming about what might change in the world, right?
Is that there's kind of a change in how we do leadership
and how we do business
that's shifting more towards that model
where we look at what people are capable of
and we trust in our people
and we get more out of them, the knowledge work model.
If we want more knowledge
work, we need people to be happy and to feel engaged in their community. And all of a sudden,
we start to see these kind of generational, bigger pie kind of things start to happen.
But how do we get there? It's not SLOs. It maybe is a little bit starting with incidents. That's
where I've had the most success. And you asked me about that. So getting practical,
incident management is probably-
Right, well, as I see it,
the problem with SLOs across the board
is it feels like it's a very insular community so far.
And communicating it to engineers
seems to be the focus of where the community has been.
But from my understanding of it,
you absolutely need buy-in
at significantly high executive levels
to, at the very least, buy you air cover
while you're doing these things and
making these changes, but also to help drive that cultural shift. None of this is something I have
the slightest clue how to do. Let's be very clear. If I knew how to change a company's culture,
I'd have a different job.
Yeah. The biggest omission in the Google SRE books was Urs.
There's a guy at Google named Urs who owns availability for Google. And when anything
is like in dispute and bubbles up the management chain, it goes to Urs and he says, thou shalt,
right? Makes the call. And that's why it works, right? Like that's, it's not just that one person,
but that system of management where the whole leadership team, there's a large,
very well-funded team with a lot of power in that organization
that can drive availability. And they can say, this is how you're going to do metrics for your
service. And this is the system that you're in. And it's kind of, yeah, sure, it works for them
because they have all the organizational support in place. What I was saying to my team just the
other day, because we're in the middle of our SLO rollout, is that really, I think an SLO program isn't about the engineers at all until late in the game.
At the beginning of the game,
it's really about getting the leadership team on board
to say, hey, we want to put in SLIs and SLOs
to start to understand the functioning
of our software system.
But if they don't have that curiosity in the first place,
that desire to understand how well their teams are doing,
how healthy their teams are, don't do it.
It's not going to work.
It's just going to make everyone miserable.
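For readers less familiar with the terms, the arithmetic behind an SLI and an SLO is the easy part, which is consistent with the point being made here that the hard part is organizational. A minimal sketch, assuming a simple request-based availability SLI and made-up numbers, could look like this. The function names and the choice to treat zero traffic as fully available are assumptions of the sketch, not anything described in the episode.

```python
def availability_sli(good_events: int, total_events: int) -> float:
    """SLI: fraction of events that were good (for example, non-5xx responses)."""
    if total_events == 0:
        return 1.0  # assumption: no traffic counts as fully available
    return good_events / total_events


def error_budget_remaining(good_events: int, total_events: int, slo_target: float) -> float:
    """Fraction of the error budget still unspent; negative means overspent."""
    allowed_bad = (1.0 - slo_target) * total_events
    actual_bad = total_events - good_events
    if allowed_bad == 0:
        # A 100% target (or no traffic) leaves no budget to spend.
        return 0.0 if actual_bad == 0 else float("-inf")
    return 1.0 - actual_bad / allowed_bad


if __name__ == "__main__":
    good, total, target = 999_500, 1_000_000, 0.999  # made-up numbers
    print(f"SLI: {availability_sli(good, total):.4%}")
    print(f"Budget remaining: {error_budget_remaining(good, total, target):.1%}")
```

Nothing in these few lines says who looks at the number or what happens when the budget runs out, and those are the questions the conversation argues leadership has to be curious about first.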
It feels like it's one of those difficult to sell problems as well, in that it requires some tooling changes, absolutely.
It requires cultural change and buy-in and whatnot.
But in order for that to happen, there has to be a painful problem that a company recognizes and is willing to pay to make go away.
The problem with stuff like this is that once you pay, there's a lot of extra work that goes on top of it as well that does not have a perception, rightly or wrongly, of contributing to feature velocity, of hitting the next milestone.
It's really, so we're going to be spending how much money to make engineers happier? They should get paid an awful lot and they're
still complaining and never seem happy. Why do I care if they're happy other than the pure mercenary
perspective of, otherwise they'll quit? I'm not saying that it's not worth pursuing or that it's not a worthy
goal. I am saying that it becomes a very difficult thing to wind up selling as a product.
Well, as a product, for sure, right? Because, gosh, I have friends in this space who work on
these tools, and I want to be careful. Of course. Nothing but love for all of those people, let's be
very clear. But a lot of them, you know, they're pulling metrics from existing monitoring systems.
They are doing some interesting math on them. But what you get at the end is a nice service catalog and dashboard, which are things we've been trying to land as products in this industry for as long as I can remember.
And we've got it this time, though.
This time we'll crack it.
Yeah.
Get off the island, Gilligan.
And then the other risky thing, right, is the other part that makes me uncomfortable about SLOs and why I will often tell folks that I talk to out in the industry
that are asking me about this, like one-on-one,
should I do it here?
And it's like, you can bring the tool in.
And if you have a management team
that's just looking to have metrics to drive productivity
instead of trying to drive better knowledge work,
what you get is just a fancier version of more Taylorism,
which is basically scientific management,
this idea that we can like drive workers to maximum efficiency by measuring random things
about them and driving those numbers. It turns out that doesn't really work very well, even in
industrial scale. It just happened to work because, you know, we have a bloody enough society that we
push people into it. But the reality is, if you implement SLOs badly, you get more really bad Taylorism
that's bad for your developers. And my suspicion is that you will get worse availability out of it
than you would if you just didn't do it at all.
This episode is sponsored by our friends at
Revelo. Revelo is the Spanish word of the day, and it's spelled R-E-V-E-L-O. It means I reveal. Now, have you tried to hire an engineer
lately? I assure you it is significantly harder than it sounds. One of the things that Revelo
has recognized is something I've been talking about for a while, specifically that while talent
is evenly distributed, opportunity is absolutely not. They're exposing a new talent pool to
basically those of us without a presence in Latin America via their platform. It's the largest tech
talent marketplace in Latin America with over a million engineers in their network, which includes
but isn't limited to talent in Mexico, Costa Rica, Brazil, and Argentina. Now, not only do they wind up screening all of
their talent on English ability as well as, you know, their engineering skills, but they go
significantly beyond that. Some of the folks on their platform are hands down the most talented
engineers that I've ever spoken to. Let's also not forget that Latin America has high time zone
overlap with what we have here in the United States.
So you can hire full-time remote engineers who share most of the same workday as your team.
It's an end-to-end talent service, so you can find and hire engineers in Central and South America without having to worry about, frankly, the colossal pain of cross-border payroll and benefits and compliance because Revelo handles
all of it. If you're hiring engineers, check out revelo.io slash screaming to get 20% off your
first three months. That's R-E-V-E-L-O dot I-O slash screaming.
That is part of the problem: in some cases, to drive some of these improvements, you have to go backwards to move forwards. And it's one of those, great, so we spent all this effort and money and the rest, and now things are [...] by Gene Kim, has been the fact that companies
had these problems and actively cared enough to change it. In my experience, that feels a little
on the rare side. Yeah. And I think that's actually the key, right? Is for the culture
change and for like, if you're really looking to be like, do I want to work at this company?
Am I investing myself in here? Is look at the leadership team and be like, do these people
actually give a crap?
Are they looking just to punt another number down the road?
That's the real question, right?
Like the technology and stuff,
at the point where I'm at in my career,
I just don't care that much anymore.
I just, fine, use Kubernetes, use Postgres, MySQL.
I don't care.
I just don't.
Like Oracle, I might have to ask, you know, go to finance and be like,
hey, can we spend 20 million for a database?
But like, nobody really asks for that anymore. So. As with us, I will say that I mostly
agree with you, but a technology that I found myself getting excited about, given the time of
the recording on this is fun. I spent a bit of time yesterday from when we're recording this,
teaching myself just enough Go to wind up beating together a binary
that I needed to do something
actively ridiculous for my camera here.
And I found myself coming away
deeply impressed by a lot of things about it,
how prescriptive it was for one,
how self-contained for another.
And after spending far too many years of my life
writing shitty Perl and shitty Bash
and worse Python, et cetera, et cetera, the prescriptiveness was great.
The fact that it wound up giving me something I could just run, I could cross-compile for anything I needed to run it on, and it just worked.
It's been a while since I found a technology that got me this interested in exploring further.
Go is great for that.
You mentioned one of my two favorite features of Go.
One is usually when a program compiles, at least the way I code in Go, it usually works.
I've been working with Go since about 0.9, just a little bit before it was released as 1.0.
And that's what I've noticed over the years of working with it, is that most of the time,
if you have a pretty good data structure design and you get the code to compile,
usually it's going to work, unless you're doing weird stuff. The other thing I really love about Go and that maybe you'll
discover over time is the malleability of it. And the reason why I think about that more than
probably most folks is that I work on other people's code most of the time. And maybe this
is something that you probably run into with your business too, right? Where you're working on other
people's infrastructure. And the way that we encode business rules and things in the languages, in our programming language or our config syntax and
stuff, has a huge impact on folks like us and how quickly we can come into a situation, assess,
figure out what's going on, figure out where things are laid out, and start making changes
with confidence. Forget other people for a minute there. Looking at what I built out three or four years ago here myself, like I look at past
me, it's like, what was that rat bastard thinking?
This is awful.
And it's forget other people's code.
Hell is your own code on some level too.
Once you've, once it's slipped out of the mental stack and you have to re-explore it
and, oh, well, thank God I defensively wound up not including any comments whatsoever explaining
what the living hell this thing was.
It's terrible.
But you're right.
The other people's shell scripts are finicky and odd.
I started poking around for help when I got stuck on something by looking at GitHub and a bit of searching here and there.
Even these large, complex, well-used projects started making sense to me in a way that I very rarely find. It's,
what the hell is that thing is my most common refrain when I'm looking at other people's code
and Go, for whatever reason, avoids that. I think because it is so prescriptive about formatting,
about how things should be done, about the vision that it has. Maybe I'm romanticizing it and I'll
hate it in a week from now and I'll want to go back and remove this recording.
The size of the language helps a lot, but probably my favorite, it's more of a convention,
which is actually funny the way I'm going to talk about this, because the two languages I work on
the most right now are Ruby and Go. And I don't feel like two languages could really be more
different. Syntax-wise, they share some things, but really like the mental models are so very,
very different. Ruby is all the way in on object-oriented programming and the actual real
kind of object-oriented with messaging and stuff. And the whole language kind of springs from that.
And it kind of requires you to understand all of these concepts very deeply to be effective
in large programs. So what I find is when I approach a Ruby code base, I have to load all
this crap into my head and remember, okay, so yeah, there's this convention when you do this kind of thing in Ruby, or especially Ruby on Rails is even worse because
they go deep into convention over configuration. But what that's code for is this code is accessible
to people who have a lot of free cognitive capacity to load all this convention into
their heads and keep it in their heads so that the code looks pretty. Right. And so that's the trade-off is you've said, okay, my developers have to be these people
with all these spare brain cycles to understand like why I would put the code here in this
place versus this place.
And all these like things that are in the code, like very compact, dense concepts.
And then you go to something like Go, which is like, nah, we're not
going to do lambdas, nah, we're not doing all this fancy stuff. So everything is there on the page.
This drives some people crazy, right, is that there's all this boilerplate, boilerplate, boilerplate. But
the reality is I can read most Go files from top to the bottom and understand what the hell it's
doing, whereas I can go sometimes look at, like, a Ruby thing or sometimes Python, and even Perl,
it's just common all the time, right,
is there's so much indirection.
And it'd just be like, what the fuck is going on?
This is so dense.
I'm gonna have to sit down and write it out in longhand
so I can understand what the developer was even doing here.
And-
Well, that's why I got the Mac Studio
for when I'm not doing AV stuff with it.
That means that I'll have one core
that I can use for front-end processing and the rest, and the other
19 cores can be put to work, failing
to build Nokogiri in Ruby yet again.
I remember the
travails of working with Ruby, and the
problem, I have similar problems with Python, specifically,
in that, I don't know if I'm
special like this. It feels like it's an
SRE, DevOps
style of working, but I am
grabbing random crap off of GitHub constantly and running
it like small scripts other people have built. And let's be clear, I run them on my test AWS
account that is nothing important because I'm not a fool and I read most of it before I run it.
But I also, it wants a different version of Python every single time. It wants a whole
bunch of other things too. And okay, so I use asdf as
my version manager for these things, which for whatever reason does not work for the way that
I think about this ergonomically. Okay, great. And I wind up with detritus scattered throughout
my system. It's, hey, can you make this reproducible on my machine? Almost certainly
not, but thank you for asking. It's like step 17, master the wolf level of instructions. And I think Docker
generally papers over the worst of it, right? Is when we built all this stuff in the aughts,
you know, CPAN. Dev containers and VS Code are very nice. Yeah, yeah. You know, like we had CPAN
back in the day. I was doing chroots, I think in like '04 or '05, you know, to solve this problem,
right? Which is basically, just, screw it. I will compile an entire distro into a directory
with a Perl and all of its dependencies
so that I can isolate it from the other things
I want to run on this machine
and not screw up and not have these interactions.
And I think that's kind of what you're talking about
is like the old model, when we deployed servers,
there was one of us sitting there
and we'd log into the server and be like,
okay, I'm going to install the Perl.
I'll, you know, I'll compile it into
slash opt, slash Perl, 5.8,
whatever. And then I'll
CPAN all the stuff in. I'll give it over to the
developer, tell him to set the shebang to that, and everything
just works. And now we're in a mode
where it's like, okay, you've got to set up
a thousand of those. Okay,
well, I'll make a tarball.
But it's still like...
DevOps is about making the dev closer to ops.
You're interrelating all the time.
Yeah, and then Docker comes along
and dev's like, well, here's the container.
Good luck, asshole.
And it feels like it's been cast
into your yard to worry about.
Yeah, well, I mean, that's just kind of business
or just, I'm not sure if it's business
or capitalism or something like that,
but just the idea that, you know,
if I can hand off the shitty work to some other poor schlub, why wouldn't I?
I mean, that's most folks, right?
Like, just be like, well, I got it working.
Like, my part is done.
I did what I was supposed to do.
And now there's a lot of folks out there.
That's how they work, right?
I hit done.
I'm done.
I shipped it.
Sure, it's an old-ass Ubuntu.
Sure, there's a bunch of shell scripts that rip through things.
Sure, you know, like, I've worked on repos
where there's hundreds of things that need to be addressed.
And passing it to someone else is fine.
I'm thrilled to do it.
Where I run into problems with it is where people assume that,
well, my part was the hard part,
and anything you schlubs do is easy.
Well, that's the underclass.
Forget engineering for a second.
I throw things to the people over in the finance group here at the Duckbill Group because those people
are wizards at solving for this thing. And that's how we want to do things.
Yeah, specialization works. But we have this, it's probably more cultural. I don't want to pick, like, capitalism to beat on, because
this is really, like, a human cultural thing, and it's not even really particularly Western: the idea that, like, if I have an underclass, why would I give a shit what their
experience is? And this is why I say, like, ops teams, like, get out of here. Because most ops teams,
the extant ops teams that are still called ops, and a lot of them have been renamed SRE but they still do
the same job, are an underclass. And I don't mean that those people are below us. People are treated
as an underclass and they shouldn't be. Absolutely not. Because the idea is that, well, I'm a fancy
person who writes code up my ivory tower and then it all flows down and those people, just faceless
people, do the deployment stuff that's beneath me. That attitude is the most toxic thing, I think,
in tech orgs to address.
Like if you're trying to be like, well, our reliability is bad. We have security problems.
People won't fix their code. And go look around and you will find people that are treated as an
underclass that are given code thrown over the wall at them. And then they just have to toil
through and make it work. I've worked on that a number of times in my career. And I think just
like saying underclass,
right, or a caste system, is what I've found is the most effective way to get people actually thinking about what the hell is going on here. Because most people are just like, well, this is just
the way things are. This is how we've always done it. The developers write the code, they give it
to the sysadmins, the sysadmins deploy the code. Isn't that how it always works?
You'd really like to hope, wouldn't you? Not me. Again, the way I see it is, in theory,
in theory, sysadmins, ops, all that should not exist. People should theoretically be able to write code
as developers that just works, the end. And they write it correct the first time and never have to
change it again.
Yeah. There's a reason that I always like to call staging environments in
places I work "theory," because it works in theory, but not in production. And that is fundamentally, that entire job role is the difference between theory and practice.
[...] over multiple strands of glass and digital transcodings and things right now, right?
Like we are detached from the physical reality.
You mentioned earlier working in data centers, right?
The thing I miss about it is like the physicality of it.
Like actually like I held a server in my arms and put it in the rack and slid it into the rails.
I plugged in the power myself.
I pushed the power button myself.
There's a server there.
I physically touched it.
Developers who don't work in production, we talk about empathy and stuff, but really I think the big problem is when they work out in their idea space and just writing code, they write their unit tests.
If we're very lucky, they'll write a functional test, and then they hand that wad off to some poor ops group.
They're detached from the reality of operations.
It's not even about accountability.
It's about experience.
The ability to see all of the weird crap we deal with, right?
You know, like, well, we pushed the code to that server,
but there were three bit flips, so we had to do it again.
And then the other server, the disk failed.
And on the other server, you know,
there's all this weird crap that happens.
These systems are so complex that they're always doing something weird.
And if you're a developer that just spends
all day in your IDE, you don't get to see that. And I can't really be mad at those folks as
individuals for not understanding our world. I have to figure out how to help them. And the best
thing we've come up with so far is like, well, we start giving them some responsibility in the
production environment so that they can learn that. People do that. Again, it's another one that
can be done wrong where it turns into kind of a forced empathy.
I actually really hate that mode
where it's like,
we're forcing all the developers online,
whether they like it or not,
you know, on call,
whether they like it or not,
because they have to learn this.
And it's like, you know,
maybe slow your roll, little buddy,
because the stuff is actually hard to learn.
Again, minimizing how hard ops work is,
oh, we'll just put the developers on it.
They'll figure it out, right?
They're software engineers.
They're probably smarter than you sysadmins is the unstated thing when we do that, right?
When we throw them in the pit and be like, yeah, they'll get it.
And that was my problem with being asked to do the interview stuff.
It was, write code on a whiteboard.
It's, look, I understood how the system fundamentally worked under the hood.
Being able to power my way through to get to an outcome, even in a language I don't know, is sort of part and
parcel of the job. But this idea of doing it in an artificially constrained environment in a language
I'm not super familiar with off the top of my head, it took me years to get to a point of being
able to do it with a bash script because whoever starts with an empty editor and starts getting to
work in a lot of these scenarios, especially in an ops world where we're not building something from scratch.
That's the interesting thing, right? In the majority of tech work today,
maybe 20 years ago, we did it more because we were literally building the internet we have today.
But today, most of the engineers out there working, most of us working stiffs,
are working on stuff that already exists. We're making small incremental changes,
which is great if that's what we're doing. And we're dealing with old code.
We're gluing APIs together, and that's fine.
I really want to thank you for taking so much time to talk to me about how you see all these things.
If people want to learn more about what you're up to, where's the best place to find you?
I'm on Twitter every once in a while as MissAmyTobey.
M-I-S-S-A-M-Y-T-O-B-E-Y.
I have a blog I don't write on enough,
and there's a couple things on the Equinix Metal blog
that I've written.
So if you're looking for that, otherwise, mainly Twitter.
And those links will, of course, be in the show notes.
Thank you so much for your time.
I appreciate it.
I had fun.
Thank you.
As did I.
Amy Tobey, Senior Principal Engineer at Equinix.
I'm cloud economist Corey Quinn,
and this is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star
review on your podcast platform of choice, or on the YouTubes, smash the like and subscribe buttons,
as the kids say. Whereas if you've hated this episode, same thing, five-star review, all the
platforms, smash the buttons, but also include an angry comment telling me that
you're about to wind up subpoenaing a copy of my shell script because you're convinced that
your intellectual property and secrets are buried within. If your AWS bill keeps rising and your
blood pressure is doing the same, then you need the Duckbill Group. We help companies fix their AWS bill
by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor
recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.
This has been a HumblePod production.
Stay humble.