Screaming in the Cloud - The Controversy of Cloud Repatriation With Amy Tobey of Equinix
Episode Date: September 27, 2022
About Amy: Amy Tobey has worked in tech for more than 20 years at companies of every size, working with everything from kernel code to user interfaces. These days she spends her time building an innovative Site Reliability Engineering program at Equinix, where she is a principal engineer. When she's not working, she can be found with her nose in a book, watching anime with her son, making noise with electronics, or doing yoga poses in the sun.
Links Referenced:
Equinix: https://metal.equinix.com
Twitter: https://twitter.com/MissAmyTobey
Transcript
Hello, and welcome to Screaming in the Cloud, with your host, Chief Cloud Economist at the
Duckbill Group, Corey Quinn.
This weekly show features conversations with people doing interesting work in the world
of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles
for which Corey refuses to apologize.
This is Screaming in the Cloud.
This episode is sponsored in part by our friends at AWS AppConfig.
Engineers love to solve and occasionally create problems,
but not when it's an on-call fire drill at four in the morning.
Software problems should drive innovation and collaboration,
not stress and sleeplessness and threats of violence.
That's why so many developers are realizing the value of AWS AppConfig feature flags.
Feature flags let developers push code to production,
but hide that feature from customers so that the developers can release their feature when it's ready.
This practice allows for safe, fast, and convenient software development.
You can seamlessly incorporate AppConfig feature flags into your AWS or cloud environment and ship your features with excitement, not trepidation and fear.
To get started, go to snark.cloud slash appconfig.
That's snark.cloud slash appconfig.
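For reference, a minimal sketch of what checking an AppConfig feature flag can look like from Python with boto3; the application, environment, profile, and flag names here are placeholders, and the flag document shape assumes AppConfig's feature-flag profile type:

```python
import json
import boto3

# Hypothetical identifiers -- substitute your own application, environment,
# and configuration profile names or IDs.
APP, ENV, PROFILE = "my-app", "prod", "feature-flags"

client = boto3.client("appconfigdata")

# Open a configuration session, then poll for the latest flag document.
token = client.start_configuration_session(
    ApplicationIdentifier=APP,
    EnvironmentIdentifier=ENV,
    ConfigurationProfileIdentifier=PROFILE,
)["InitialConfigurationToken"]

flags = json.loads(
    client.get_latest_configuration(ConfigurationToken=token)["Configuration"].read()
    or b"{}"
)

# Ship the code path dark; only take it when the flag says so.
if flags.get("new-checkout-flow", {}).get("enabled"):
    pass  # new feature path
else:
    pass  # existing behavior
```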
I come bearing ill tidings.
Developers are responsible for more than ever these days.
Not just the code that they write,
but also the containers and the cloud infrastructure
that their apps run on because serverless means it's still somebody's problem.
And a big part of that responsibility is app security from code to cloud.
And that's where our friend Snyk comes in.
Snyk is a frictionless security platform that meets developers where they are, finding and fixing vulnerabilities right from the CLI, IDEs, repos, and pipelines.
Snyk integrates seamlessly with AWS offerings like CodePipeline, EKS, ECR, and more, as well
as things you're likely to actually be using. Learn more at snyk.co slash scream. That's S-N-Y-K dot C-O slash scream.
Welcome to Screaming in the Cloud. I'm Corey Quinn, and this episode is another one of those
real profiles in shitposting type of episodes. I am joined again from a few months ago by Amy
Tobey, who is a senior principal engineer at Equinix, back for more. Amy,
thank you so much for joining me. Welcome to your show.
Exactly. So one thing that we have been seeing a lot of over the past year, and you struck me as
one of the best people to talk to about it, from a "what you are seeing in the wilderness" perspective,
has been the idea of cloud repatriation. It started off with something that came out of
Andreessen Horowitz toward the start of the year about the trillion dollar paradox, how at a certain
point of scale, repatriating to a data center is the smart and right move. And oh my stars,
did that ruffle some feathers for people.
Well, I spent all this money moving to the cloud.
That was just mean.
I know. Why would I want to leave the cloud?
I mean, for God's sake, my account manager named his kid after me.
Wait a minute. How much am I spending on that?
Yeah, there is that ever-growing problem.
And there have been the examples that people have given: Dropbox
classically did a cloud repatriation exercise, and a second example that no one can ever name.
And it seems like, okay, this might not necessarily be the direction that the industry is going.
But I also tend to not be completely naive when it comes to these things.
And I can see repatriation making sense
on a workload-by-workload basis.
What that implies is that, yeah,
but a lot of other workloads
are not going to be going to a data center.
They're going to stay in a cloud provider
who would like very much,
if you never breathe a word of this
to anyone in public.
So if there are workloads repatriating,
it would occur to me that there's a vested interest
on the part of every major cloud provider to do their best to, I don't know if saying suppress the story is too strongly worded, but it is directionally what I mean.
They aren't helping get the story out.
Yeah, that's a great observation.
Could you maybe shut the hell up and never make it ever again in public or we will end you?
Yeah, it's your Amazon.
What are you going to do?
Launch a shitty Amazon Basics version of what my company does?
Good luck.
Have fun.
You're probably doing it already.
But the reason I want to talk to you on this is a confluence of a few things.
One, as I mentioned back in May when you were on the show, I am incensed and annoyed that we've been talking for as long as we have.
And somehow I never had you on the show.
So, great.
Come back, please. You're always welcome here. Secondly, you work at Equinix,
which is effectively, let's be relatively direct, it is functionally a data center as far as how
people wind up contextualizing this. Yes, you have- Yeah, I guess people contextualize it that way,
but we'll get into that. Yeah, from the outside. I don't work there, to be clear. My talking points don't exist for this.
But I think, oh, Equinix, oh, that means you basically have a colo or colo equivalent.
The pricing dynamics are radically different. It looks a lot closer to a data center, in my imagination, than it does a traditional public cloud.
I would also argue that if someone migrates from AWS to Equinix, that would be viewed, arguably correctly, as something of a repatriation. Is that
directionally correct? I would argue incorrectly for metal, right? So Equinix is a data center
company, right? Like that's what everybody knows this as. Equinix Metal is a bare metal
primitive service, right? So it's a lot more of a cloud workflow, right? Except that you're not
getting the rich services that you get in a technically full cloud, right? Like there's no
RDS, there's no S3 even. What you get is bare metal primitives, right? With a really fast network.
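As a rough illustration of that "bare metal primitives with a cloud workflow" idea, here is a sketch of provisioning a Metal server over its REST API; the endpoint shape, plan, metro, and OS slugs are my assumptions, so check the current Equinix Metal documentation before relying on them:

```python
import os
import requests

# Rough sketch of provisioning a bare metal server via the Equinix Metal API.
# Endpoint, plan, metro, and OS slugs are illustrative -- verify them against
# the current API docs and your project's available configurations.
API = "https://api.equinix.com/metal/v1"
headers = {"X-Auth-Token": os.environ["METAL_AUTH_TOKEN"]}
project_id = os.environ["METAL_PROJECT_ID"]

resp = requests.post(
    f"{API}/projects/{project_id}/devices",
    headers=headers,
    json={
        "hostname": "hello-metal-01",
        "plan": "c3.small.x86",        # a small bare metal configuration
        "metro": "da",                 # Dallas, as an example location
        "operating_system": "ubuntu_22_04",
    },
    timeout=30,
)
resp.raise_for_status()
device = resp.json()
print(device["id"], device["state"])   # the device provisions asynchronously
```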
Are you really a cloud provider without some ridiculous machine learning powered service
that's going to wind up
taking pictures, perform incredibly expensive operations on it, and then return something
that's more than a little racist? I mean, come on. You're not a cloud until you can do that, right?
We can do that. We have customers that do that. Well, not specifically that.
But they have to build it themselves. You don't have the high-level managed service that basically
serves as what is functionally bias laundering. Yeah, you don't get it in a box, right? So a lot of our customers are doing things that are
unique, right? That are maybe not exactly fit into the cloud well. And it comes back down to
a lot of Equinix's roots, which is, we talk about going to the cloud and it's this kind of
abstract environment we're reaching for, you know, up in the sky. And it's like, we don't know where
it is, except we have regions that, okay, so it's in Virginia. But the rule of real estate applies to technology
as often as not, which is location, location, location, right? When we're talking about a lot
of applications, a challenge that we face, say, in gaming, is that the latency from the customer, that last mile to your data center,
can often be extremely important, right?
So a few milliseconds even.
And a lot of like SaaS applications,
the typical stuff that really the cloud was built on,
10 milliseconds, 50 milliseconds,
nobody's really going to notice that, right?
But in a gaming environment
or some very high or low latency application
that needs to run extremely close to the customer, it's hard to do that in the cloud. They're building this stuff out, right? Like, I see it, you know, the different ones building out, opening new regions. But, you know, there's this other side of the cloud, which is the edge computing thing that's coming alive, and that's more where I think about it. And again, location, location, location. The speed of light is really fast, but as most of us in tech know, if you want to go across from the East Coast
to the West Coast,
you're talking about 80 milliseconds on average, right?
I think that's what it is.
I haven't checked in a while.
You know, that's just basic fundamental speed of light.
And so if everything's in US East 1,
and this is why we do multi-regions sometimes,
the latency from the West Coast isn't going to be great.
And so we run applications-
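A quick back-of-the-envelope check on that coast-to-coast figure; the fiber speed and path length below are assumed, illustrative numbers, not anything from the conversation:

```python
# Back-of-the-envelope check on the coast-to-coast latency figure.
# Assumptions: light in fiber travels at roughly 2/3 of c, and the fiber path
# between NYC and SF is longer than the ~4,100 km great-circle distance --
# call it 5,500 km for illustration.
C_VACUUM_KM_S = 299_792
fiber_speed = C_VACUUM_KM_S * 2 / 3          # ~200,000 km/s
path_km = 5_500

one_way_ms = path_km / fiber_speed * 1000
round_trip_ms = 2 * one_way_ms
print(f"theoretical RTT ~{round_trip_ms:.0f} ms")
# ~55 ms before routing, queuing, and equipment delay, which is how observed
# numbers end up in the 60-80 ms range.
```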
It has improved though.
You want to talk old school, things that are seared into my brain from over 20 years ago,
every person who's worked in data centers or in technology as a general rule has a few IP
addresses seared upon their soul. And the one that I've always had on my mind was 130.111.32.11.
Kind of arbitrary and ridiculous, but it was one of the two recursive resolvers provided at the University of Maine, where I had my first help desk job.
And it lives on-prem in Maine.
And generally speaking, I tended to always accept that no matter where I was,
unless I was in a data center somewhere, it was about 120 milliseconds.
And I just checked now, it is 85 and change from where I am in San Francisco.
So the internet or the speed of light have improved.
So good for whichever one of those it was.
But yeah, you've just updated my understanding of these things.
All of which is to say: yes, latency is very important.
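If you want to reproduce Corey's quick check, here is one hedged way to eyeball round-trip latency from Python; the resolver address is just the one mentioned above and may not accept TCP connections from your network, in which case substitute any host you care about:

```python
import socket
import time

def tcp_connect_ms(host: str, port: int = 53, timeout: float = 2.0) -> float:
    """Time a single TCP handshake as a rough stand-in for network round trip."""
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout):
        pass
    return (time.perf_counter() - start) * 1000

# The resolver Corey mentions; it may be firewalled off from your vantage point.
print(f"{tcp_connect_ms('130.111.32.11'):.1f} ms")
```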
Right.
Let's forget repatriation.
To be really honest, even the Dropbox case or any of them, right?
There's an economic story here that I think all of us that have been doing cloud work
for a while see pretty clearly that maybe not everybody's seeing that's thinking from
an on-prem kind of situation, which is that, and I know you do this all the time, right?
Is you don't just look at the cost of the data center and the servers and the network,
the technical components, the bill of materials. Oh, lies, damn lies, and TCO analyses.
But there's all these people on top of it and the organizational complexity and the contracts that
you've got to manage. And it's this big, huge operation that is incredibly complex to do well that is almost nobody's business.
So the way I look at this, right, and the way I even talk to customers about it is like,
what is your product? And I talk to people internally about this way. It's like, what are
you trying to build? Well, I want to build a SaaS, okay? Do you need data center expertise to build
a SaaS? No. Why the hell are you putting it in a data center? Like, you know, I'm speaking for my employer, right? Like, we have Equinix Metal right here.
You can build on that. You don't have to do all the most complex part of this,
at least in terms of like the physical plant, right? Like getting a bare metal server available.
We take care of all of that. Even at the primitive level where we sit, it's higher level than, say, Colo. There's also the question of economics as it
ties into it. It's never just a raw cost of materials type of approach. Like my original
job in a data center was basically to walk around and replace hard drives and apparently to insult
people. Now the cloud has taken one of those two aspects away and you can follow my Twitter
account and figure out which one of those two it is,
but what I keep seeing now is that there's value to having that task done,
but in a cloud environment,
and an Equinix model, let's be clear,
that has slipped below the surface level of awareness.
And, well, what are the economic implications of that?
Well, okay, you have a whole team of people
at large companies whose job it is to do precisely that.
Okay, we're going to upskill them and train them to use cloud.
Okay, first, not everyone is going to be capable or willing to make that leap from hard drive replacement to congratulations and welcome to JavaScript.
You're about to hate everything that comes next. And if they do make that leap, their baseline market value,
by which I mean what the market is willing to pay for them,
approximately will double.
And whether they wind up being paid more
by their current employer
or they take a job somewhere else with those skills
and get paid what they are worth,
the company still has that economic problem.
Like it or not, you will generally get what you pay for,
whether you want to or not. That is the reality of it. And as companies are thinking about this,
well, what gets into the TCO analysis and what doesn't, I have yet to see one where the outcome was not predetermined. They're less, let's figure out in good faith, whether it's going to be more
expensive to move to the cloud or move out of the cloud or just burn the building down for insurance money.
The outcome is generally the one that the person who commissioned the TCO analysis wants.
So when a vendor is trying to get you to switch to them and they do one for you, yeah.
I'm not saying they're lying, but there's so much judgment that goes into this.
And what do you include?
What do you not include?
That's hard.
And there's so many hidden costs and that's one of those things that i love about working at a
cloud provider is that i still get to play with all that stuff and like i get to see those hidden
costs right like you were talking about the the person who goes around and swaps out the hard
drives or early in my career right i worked with someone whose job it was just every day
she would go in the data center, she'd swap out the tapes, you know, and do a few other things around, and, like, take care of the billing system.
And that was a job where it was kind of going around and stewarding a whole bunch of things that kind of kept the whole machine running.
But most people outside of being right next to the data center didn't have any idea that stuff even happened, right, that went into it. And so like you were saying, like when you go to do the TCO analysis, I mean, I've been through this a couple of times
prior in my career where people will look at it
and be like, well, of course we're not going to list.
We'll put like two headcount on there.
And it's always a lie
because it's never just two headcount.
It's never just the network person or the SRE
or the person who's racking the servers.
It's also like finance has to do all this extra work.
And there's all the logistic work.
And there is just so much stuff that just is really hard to include.
Not only do people leave it out, but it's also just really hard for people to grapple with the complexity of all the things it takes to run a data center, which is like one of the most complex machines on the planet.
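To make the "it's never just two headcount" point concrete, here is a toy fully-loaded comparison; every number is invented purely for illustration, and the interesting part is which line items tend to be left out of the on-prem column:

```python
# Toy fully-loaded comparison in the spirit of "it's never just two headcount."
# Every figure below is made up for illustration; the point is the hidden line
# items, not the totals.
on_prem = {
    "servers_and_network_amortized": 400_000,
    "colo_space_and_power": 250_000,
    "dc_ops_headcount_4x_fully_loaded": 720_000,
    "finance_procurement_logistics_overhead": 120_000,
    "spares_contracts_support": 90_000,
}
cloud = {
    "compute_storage_network": 900_000,
    "cloud_platform_headcount_2x_fully_loaded": 360_000,
}

for name, costs in (("on-prem", on_prem), ("cloud", cloud)):
    print(f"{name}: ${sum(costs.values()):,}/yr")
```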
Any single data center. I've worked in small-scale environments, even a couple of mid-sized ones, but never the type of hyperscale facility
that you folks have,
which I would say is,
if it's not hyperscale,
it's at least directionally close to it.
We're talking thousands of servers
and hundreds of racks.
I've started getting into that on some level.
I guess when we say hyperscale,
we're talking about AWS-sized things
where, oh, that's a region,
and it's going to have three dozen data center facilities in it. Yeah, I don't work in places like that because,
honestly, have you met me? Would you trust me around something that's that critical to
infrastructure? No, you would not unless you have terrible judgment, which means you should not be
working in those environments to begin with. I mean, you're like a walking chaos exercise.
Maybe I would let you in. Oh, I bring my hardware destruction aura near
anything expensive and things are terrible. It's awful. But as I look at the cloud, regardless of
cloud, there is another economic element that I think is underappreciated. And to be fair, this
does, I believe, apply as much to Equinix Metal as it does to the public hyperscale cloud providers that have problems with naming
things well. And that is when you are provisioning something as a customer of one of these places,
you have an unbounded growth problem. When you're in a data center, you are not going to just
absentmindedly sign an $8 million purchase order for new servers, you know, a second time. And
that means you're eventually going to run out of power, space, places to put things, and you have to go find it somewhere.
Whereas in cloud, the only limit is basically your budget,
where there is no forcing function that reminds you to go and clean up that experiment from five years ago.
You have people with three petabytes of data they were using for a project,
but they haven't worked there in five years and nothing's touched it since.
Because the failure mode of deleting things that are important or disastrous... That's why Glacier exists.
Oh, exactly. But the failure mode of deleting things that should not be deleted is disastrous
for a company. Whereas if you leave them there, well, it's only money. And there's no forcing
function to do that, which means you have this infinite growth problem with no natural limit
slash predator around it. And that is the economic analysis that I do not see playing out basically
anywhere because, oh, by the time that becomes a problem, we'll have good governance in place. Yeah, pull the other one. It has bells on it.
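One hedged way to at least see the "forgotten three petabytes" problem is a simple age sweep over a bucket; the bucket name and threshold are placeholders, and last-modified is only a proxy, since true last-access data needs S3 access logs or Storage Lens:

```python
from datetime import datetime, timedelta, timezone
import boto3

# Minimal sweep for data nobody has touched in years: list objects in a bucket
# and tally anything not modified since the cutoff. Bucket name and age
# threshold are placeholders.
s3 = boto3.client("s3")
cutoff = datetime.now(timezone.utc) - timedelta(days=5 * 365)

stale_bytes = 0
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket="my-experiment-bucket"):
    for obj in page.get("Contents", []):
        if obj["LastModified"] < cutoff:
            stale_bytes += obj["Size"]

# Note: LastModified only says when the object was written, not when it was
# last read; access patterns require S3 server access logs or Storage Lens.
print(f"~{stale_bytes / 1e12:.2f} TB not modified in 5+ years")
```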
That's the funny thing, right? A lot of the
early drive in the cloud was those of us who wanted to go faster and we were up against the
limitations of our data centers. And then we go out and go like, hey, we've got this cloud thing.
I'll just, you know, put the credit card in there
and I'll spin up a few instances
and hey, I delivered your product.
And everybody goes, yeah, hey, happy.
And then like you mentioned, right?
And then we get down the road here
and it's like, oh my God,
how much are we spending on this?
And then you're in that funny boat
where you have both.
But yeah, I mean, like,
that's just typical engineering problem
where, you know,
we have to deal with our constraints
and the cloud has constraints too, right? Like when I was at Netflix, one of the things we would do frequently is bump up against
instance limits. And then we go talk to our TAM and be like, Hey buddy, can we have some more
instance limit and then take care of that? Right. But there are some bounds on that. Of course,
in the cloud providers, you know, if I have my cloud provider shoes on, I don't necessarily want
to put those limits too low because it's a business.
The business wants to hoover up all the money.
That's what businesses do.
So I guess it's just a different constraint that is maybe much too easy to knock down, right?
Because as you mentioned, in a data center or in a colo space, I grow my cage and I've filled up all the space I have.
I have to either order more space from my colo provider
or expand to the cloud, right?
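For what it's worth, those instance limits are also visible (and bump-able) through the Service Quotas API rather than only through a TAM; a small sketch, where the quota code is the commonly cited one for on-demand standard instances and worth verifying for your account:

```python
import boto3

# Instead of (or before) pinging your TAM: the Service Quotas API exposes the
# same instance limits programmatically. The quota code below is the commonly
# cited one for "Running On-Demand Standard instances" -- verify it yourself.
sq = boto3.client("service-quotas")

quota = sq.get_service_quota(ServiceCode="ec2", QuotaCode="L-1216C47A")["Quota"]
print(f"{quota['QuotaName']}: {quota['Value']:.0f} vCPUs")

# A limit bump is just another API call (subject to review on AWS's side):
# sq.request_service_quota_increase(
#     ServiceCode="ec2", QuotaCode="L-1216C47A", DesiredValue=quota["Value"] * 2
# )
```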
The scale I was always at, the limit was not the space.
Because I assure you, with enough shoving, all things are possible.
Don't believe me.
Look at what people are putting in the overhead bin on any airline.
Enough shoving, you'll get a Volkswagen in there.
But it was always the power constraint that I dealt with.
And it's like, eh, they're just being conservative.
And then the whole room dies.
You want blade servers? Because that's how you get blade servers, right?
That movement was about bringing the density up
and putting more servers in a rack.
You know, there was some management stuff in a lot,
but a lot of it was just about like, you know,
I remember I'm picturing...
Even without that, I was still power constrained
because you have to remember, a lot of my experiences
were not in, shall we say, data center facilities
that you would call, you know, good.
Well, that brings up a fun thing that's happening,
which is that the power of servers is still growing.
The newest Intel chips, especially the ones they're shipping
for hyperscale and stuff like that,
with the really high core counts and the faster clock speeds,
you know, these things are pulling like 300 watts.
And they also have to egress all that heat.
And so that's one of the places where we're doing some innovations.
I think there's a couple of blog posts out about it
around like liquid cooling or multi-mode cooling.
And what's interesting about this from a cloud or data center perspective
is that the tools and skills and everything has to come together
to run a, you know, this year's or next year's servers where we're pushing thousands of kilowatts into a rack, thousands, one rack, right?
The bar to actually bootstrap and run this stuff successfully is rising again compared to, I take my pizza box servers, right?
And I worked at a gaming company a long time ago, right?
And they would just like stack them on the floor.
It was just a stack of servers.
Like they were in between the rails,
but they weren't screwed down or anything, right?
And they would network them all up
because basically like the game would spin up on the servers.
And if they died, they would just unplug that one
and leave it there and spin up another one.
It was like, you could just stack stuff up
and like you sling cables across the data center
and stuff back then.
I wouldn't do it that way now.
But when you add, say, liquid cooling and some of these like extremely high power situations into the mix, now you need to have, for example, if you're using liquid cooling, you don't want that stuff leaking.
Right.
And so as good as the pressure fittings and blind mating and all this stuff that's coming around is getting, you still have that element of
additional training and skill and possibility for mistakes. The thing that I see as I look at this
across the space is that on some level, it's gotten harder to run a data center than ever
before. Because again, another reason I wanted to have you on this show is that you do not carry a quota,
although you do often carry the conversation when you have boring people around you.
But quotas, no. You are not here selling things to people. You are not actively incentivized to
get people to see things a certain way. You are very clearly an engineer in the right ways. I
will further point out, though, that you do not
sound like an engineer, by which I mean you're going to basically belittle people in many cases
in the name of being technically correct. You're a human being with a freaking soul, and believe me,
it is noticed. I really appreciate that. If somebody's just listening and hearing my voice
and then my name, I have a low voice. And in most of my career, I was extremely technical,
like to the point where, you know,
if something was wrong, technically,
I would fight to the death
to get the right technical solution
and maybe not see the complexity around the decisions
and why things were the way they were
and the way I can today.
And that's changed how I sound.
It's changed how I talk.
It's changed how I look at
and talk about technology as well, right? I'm just not that interested in Kubernetes
because I've kind of started looking up the stack in this kind of pursuit.
Yeah, when I say you don't sound like an engineer, I am in no way, shape, or form
alluding in any respect to your technical acumen. I feel the need to clarify that statement for
people who might be listening and saying, hey, wait a minute, is he being a shithead?
No. Well, not the kind you're worried about, anyway.
I'm a different breed of shithead.
That's fine.
Yeah, I should remember that other people don't know we've had conversations that are
deeply technical that aren't on air, that aren't context anybody else has.
And so, like, I bring that deep technical knowledge, you know, the ability to talk about
PCI Express and kilovolts going to a rack and top of rack switches and network topologies,
all of that together now.
But what's really fascinating
is where the really big impact is for reliability,
for security, for quality.
The things that me as a person that I'm driven by,
products are cool, but I like them to be reliable.
That's the part that I like.
Really come down to more leadership and business
acumen and understanding the business constraints and then being able to get heard by an audience
that isn't necessarily technical, that doesn't necessarily understand the difference between PCI,
PCI-X, and PCI Express. There's a difference between those. It doesn't mean anything to the business,
right? So when we want to go and talk about why are we doing,
for example, multi-region deployment of our application.
If I come in and say,
well, because we want to use Raft,
that's going to fall flat.
The business is going to go, I don't care about Raft.
What does that have to do with my customers?
Which is the right question to always ask.
Instead, when I show up and say,
okay, what's going on here is
we have this application sits in a single region or in a single data center or whatever, right?
I'm using region because that's probably what most of the people listening understand.
You know, so I put my application in a single region and it goes down. Our customers are going
to be unhappy. We have the alternative to spend, okay, not a little bit more money, probably a lot
more money to build a second region. And the benefit we will get is that our customers will be able to access the service 24
by 7, and it will always work, and they'll have a wonderful experience. And maybe they'll keep
coming back and buy more stuff from us. And so when I talk about it in those terms, right, and
it's usually more nuanced than that, then I start to get the movement at the macro level, right, in the systemic level of the business and the direction I want it to go, which is for the product group to understand why reliability matters to the customer.
You know, for the individual engineers to understand why it matters that we use secure coding practices.
This episode is sponsored in part by our friends at Sysdig.
Sysdig secures your cloud from source to run.
They believe, as do I, that DevOps and security are inextricably linked.
If you want to learn more about how they view this, check out their blog.
It's definitely worth the read.
To learn more about how they are absolutely getting it right from where I sit,
visit sysdig.com and tell them that I sent you.
That's S-Y-S-D-I-G.com.
And my thanks to them for their continued support of this ridiculous nonsense.
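Picking up the multi-region argument from just before the break, here is the availability arithmetic behind that pitch in miniature; the uptime figures are illustrative, and the independence assumption is the usual optimistic simplification:

```python
# The business-facing version of the multi-region argument, as arithmetic.
# Availability figures are illustrative; assuming regions fail independently
# is the standard (and optimistic) simplification.
single_region = 0.999            # roughly 8.8 hours of downtime a year
both_down = (1 - single_region) ** 2
two_regions = 1 - both_down      # available if either region can serve traffic

hours_per_year = 24 * 365
print(f"one region:  {single_region:.4%}, "
      f"~{(1 - single_region) * hours_per_year:.1f} h/yr down")
print(f"two regions: {two_regions:.6%}, "
      f"~{both_down * hours_per_year * 60:.1f} min/yr down")
```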
Getting back to the reason I said that you are not quota carrying and you are not incentivized
to push things in a particular way is that often we'll meet zealots and i've never known you to be
one you have always been a strong advocate for doing the right thing even if it doesn't directly
benefit any given random employer that you might have and as a result one of the things that you've
said to me repeatedly is if you're building something from scratch, for God's sake, put it in cloud.
What is wrong with you?
Do that.
That is the idea of building it yourself on low-level underlying primitives for almost every modern SaaS-style workload.
There's no reason to consider doing something else in almost any case.
Is that a fair representation of your position on this?
It is.
I mean, the simpler version, right, is why the hell are you doing
undifferentiated lifting, right?
Things that don't differentiate your product.
Why would you do it?
The thing that this has empowered then
is I can build an experiment tonight.
I don't have to wait for provisioning
and sign contracts and do all the rest.
I can spend 25 cents
and get an experiment up and running.
If it takes off, though,
it has changed how I move going forward as well
because there's no difference in the way that there was back when we were in data centers.
I'm going to try and experiment.
I'm going to run it in this, I don't know, crappy Raspberry Pi or my desktop or something under my desk somewhere.
And if it takes off and I have to scale up, I've got to do a giant migration to real enterprise-grade hardware.
With cloud, you are getting all of that out of the box, even if all you're doing with it is something ridiculous and nonsensical. And you're also getting like ridiculously better
service. So 20 years ago, if you and I sat down to build a SaaS app, we would have spun up a Linux
box somewhere in a colo. And we would have spun up Apache, MySQL, maybe some Perl or PHP if we're
feeling frisky. And the availability of that would be what one machine could do,
what we could handle in terms of one MySQL instance.
But today, if I'm spinning up a new stack for the same kind of SaaS,
I'm going to probably deploy it into an ASG.
I'm probably going to have some kind of high availability database on it,
and I'm going to use Aurora as an example.
Because the availability of an Aurora instance,
in terms of if I'm building myself up
with even the very best kit available in databases,
it's going to be really hard to hit the same availability
that Aurora does,
because Aurora is not just a software solution.
It's also got a team around it that stewards it 24-7.
And it continues to evolve on its own.
And so the base, when we start that little tiny startup,
instead of being that one machine,
we're actually starting at a much higher level of quality
and availability and even security sometimes
because of these primitives that were available.
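A minimal sketch of that "better starting point" stack as raw API calls; in practice you would reach for CloudFormation, CDK, or Terraform, and every identifier, subnet, and launch template below is a placeholder:

```python
import boto3

# The "day one with better defaults" stack Amy describes, sketched as raw API
# calls. All names, subnets, and the launch template are placeholders.
rds = boto3.client("rds")
asg = boto3.client("autoscaling")

# A managed, replicated database instead of one hand-tended MySQL box.
# (DB instances are added to the cluster separately.)
rds.create_db_cluster(
    DBClusterIdentifier="saas-db",
    Engine="aurora-mysql",
    MasterUsername="admin",
    ManageMasterUserPassword=True,  # let RDS/Secrets Manager hold the secret
)

# Stateless app servers that replace themselves when they fail.
asg.create_auto_scaling_group(
    AutoScalingGroupName="saas-web",
    LaunchTemplate={"LaunchTemplateName": "saas-web-template"},
    MinSize=2,
    MaxSize=10,
    VPCZoneIdentifier="subnet-aaa,subnet-bbb",
)
```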
And I probably should go on to extend on the thought
of undifferentiated lifting, right?
And coming back to the colo or the edge story,
which is that there are still some little edge cases, right? Like I think for SaaS, duh, right? Like go straight
to, and then, but there are still some really interesting things where there's like hardware
innovations, where they're doing things with GPUs and stuff like that, where the colo experience may
be better because you're trying to do, like, custom hardware, in which case you are in a colo. There are businesses doing some really interesting stuff
with custom hardware that's behind an application stack.
What's really cool about some of that, from my perspective,
is that some of that might be sitting on, say, bare metal with us,
and maybe the front end is sitting somewhere else.
Because the other thing Equinix does really well
is this product we call Fabric,
which lets us basically do peering with any of the cloud providers.
Yeah, the reason I guess I don't consider you as a quote-unquote cloud is first and foremost rooted in the fact
that you don't have a bandwidth model that is free for ingress and criminally expensive to send it anywhere that isn't to you folks.
Are you really a cloud if you're not just gouging the living piss out of your customers every time they want to send data somewhere else? Well,
I mean, we like to say we're part of the cloud
and really that's actually my favorite
feature of Metal is that
you get, I think... Yeah, this was
a compliment to be very clear. I'm a
big fan of not paying 1998 bandwidth pricing
anymore. Yeah, but this is the part where I get
to do a little bit of like showing off for Metal
a little bit and that like when you buy a
Metal server, there's different configurations, right? But, like, I think the lowest one, you have dual 10-gig
ports to the server that you can get either in a bonded mode, so that you have a single 20-gig
interface in your operating system, or you can actually do L3 and you can do BGP to your server.
And so this is a capability that you really can't get at all on the other clouds,
right? This lets you do things with the network, not only the bandwidth, right, that you have
available. Like you want to stream out 25 gigs of bandwidth out of us. I think that's pretty doable.
And the rates, I've only seen a couple of comparisons, are pretty good. So this is like
where some of the business opportunity is, right? And I can't get too much into it, but like, this is all public stuff I've talked about so far,
which is,
that's part of the opportunity there is sitting at the crossroads of the
internet.
We can give you a server that has really great networking and you can do
all the cool custom stuff with it.
Like BGP, right? So that you can do anycast, right? You can build anycast applications.
I miss the days when that was a thing that made sense.
I mean that in the context of the internet and networks. These days, it always feels like the
network engineering has slipped away within the cloud because you have overlays on top of overlays
and it's all abstractions that are living out there right until suddenly you really need to
know what's going on. But it has abstracted so much of this away. And that, on some level,
is the surprise people are often in for
when they wind up outgrowing the cloud for a workload
and wanting to move it someplace that doesn't, you know,
ride them like naughty ponies for bandwidth.
And they have to rediscover things
that we've mostly forgotten about.
I remember having to architect significantly
around the context of hard drive failures.
I know we've talked about that a fair bit as a thing, but yeah, it's spinning metal. It throws off heat. And if you lose the
wrong one, your data is gone and you now have serious business problems. In cloud, at least
AWS land, that's not really a thing anymore. The way EBS is provisioned, there's a slight
tick in latency if you're looking at just the right time for what I think is a hard drive failure,
but it's there. You don't have to think about this anymore.
Migrate that workload to a pile of servers
in a colo somewhere.
Guess what?
Suddenly your reliability is going to decrease.
Amazon and the other cloud providers as well
have gotten to a point
where they are better at operations
than you are at your relatively small company
with your nascent sysadmin team.
I promise there is an economy of scale here.
And it doesn't have to be good or better, right?
It's just simply better resourced.
Yeah.
Then most anybody else can help.
Amazon can throw a billion dollars at it and never miss it.
And most organizations out there, you know, and most of the, especially enterprise,
people are scratching and trying to get resources wherever they can,
right? They're all competing for people, for time, for engineering resources. And that's one of the
things that gets freed up when you just basically bang an API and you get the thing you want. You
don't have to go through that kind of old world internal process that is usually slow and often
painful. Just because they're not resourced as well. They're not automated as well.
Maybe they could be.
I'm sure most of them could, in theory, be.
But we come back to undifferentiated lifting.
None of this helps, say,
let me think of another random business.
Claire's, whatever, like any of the shops in the mall,
they all have some kind of enterprise behind them
for cash processing and all that stuff, point of sale.
None of this stuff is differentiating for them because it doesn't impact anything to do with
where the money comes in. So again, we're back at, why are you doing this?
I think that's also the big challenge as well. When people start talking about repatriation
and talking about this idea that they are going to, oh, the cloud is too expensive,
we're going to move out, and they make the economics work. Again, I do firmly believe that, by and large,
businesses do not intentionally go out and make poor decisions. I think when we see a company
doing something inscrutable, there is always context that we are missing. And I think,
as a general rule of thumb, that these companies do not hire people who are fools.
And there are always constraints that they cannot talk about in public.
My general position, and as a consultant, and ideally as someone who aspires to be a decent human being, is that when I see something I don't understand, I assume that there's simply a lack of
context, not that everyone involved in this has been foolish enough to make giant blunders that
I can pick out in the first five seconds of looking at it. I'm not quite that self-confident yet.
I mean, that's a big part of the career progression into above senior engineer, right?
Is you don't get to sit in your chair and go like, oh, those dummies, right?
You actually have to, I don't know about have to, but the way I operate now, right, is I
remember when I'm in my youth, I used to be like, oh, those business people, I don't know
nothing.
What are they doing?
It's goofy what they're doing.
And then now I have a different mode, which is, oh, that's interesting.
Can you tell me more?
The feeling is still there, right?
Like, oh my God, what is going on here?
But then I get curious and I go, so how did we get here?
And you get that story and the stories are always fascinating and they always
involve like constraints,
immovable objects,
people doing the best they can with what they have available.
Always.
And I want to be clear that it very rarely is it the right answer to walk into
a room and said,
look at the architecture and all right,
what moron built this?
Because always you're going to be asking that question to said moron and it doesn't matter how right you are they're never going to listen to
another thing out of your mouth again and have some respect for what came before even if it's
the potentially wrong answer well great why didn't you just use this op this service to do this
instead yeah because this thing predates that by five years jackass there are reasons things are
the way they are if If you take any architecture
in the world and tell people to rebuild at Greenfield, almost none of them would look the
same as they do today because we learn things by getting it wrong. That's a great teacher and it
hurts, but it's also true. And we got to build, right? Like that's what we're here to do. If we
just kind of cycle waiting for the perfect technology, the right choices, and again,
to come back to the people who built it at the time: they use, you know, and often we fault people for this, the things they know or the things that are
nearby and they make it work. And that's kind of amazing sometimes, right? Like I'm sure you see
architectures frequently and I see them too, probably less frequently where you just go,
how does this even work in the first place? Like, how did you get this to work? Because I'm looking at this diagram or whatever,
and I don't understand how this works.
Maybe that's a thing that's more a me thing, right?
Because usually I can look at a,
skim over an architecture document,
something like, be able to build the model up
and be like, okay, I can see how that kind of works
and how the data flows through it.
I can do that pretty quickly.
And it comes back to that, like, just, again,
asking, how did we get here?
And then the cool part about asking how did we get here is it sets everybody up in the room, not just you as the person trying to drive change, but the people you're trying to bring along, the original architects, the original engineers. When you ask how did we get here, you've started them on the path to coming along with you into the future, which is kind of cool. And that storytelling mode, again, is so powerful at almost every level of the stack,
right?
And that's why I just like, we were talking about how technical I bring things in.
Again, like, I'm just not that interested in, like, are you little-endian or big-endian?
How did we get here?
It's kind of cool.
You built a big-endian architecture in 2022.
Like, whoo.
How did we do that?
Hey, leave me to my own devices
and I need to build something super quickly
to get it up and running.
Well, what I'm going to do for a lot of answers
is going to look an awful lot
like the traditional three-tier architecture
that I was running back in 2008
because I know it, it works well,
and I can iterate rapidly on it.
Is it a best practice?
Absolutely not.
But given the constraints,
sometimes it's the
fastest thing to grab. Well, if you built this in serverless technologies, it would run at a
fraction of the cost. Yes. But if I run this thing the way that I'm running it now, it'll be $20 a
month. It'll take me two hours instead of 20. And what exactly is your time worth again? It comes
down to the better economic model of all these things. Anytime you're trying to make a case to
the business, the economic model is going to always
go further. Just a general
tip for tech people, right? If you can make the better
economic case and you go to the business with an
economic case that is clear,
businesses listen to that.
They're not going to listen to us go on and on
about distributed systems. Somebody
in finance trying to make a decision about, like, do we
go and spend a million bucks on this?
That's not really the material thing.
It's like, how is this going to move the business forward?
And how much is it going to cost us to do it?
And what other opportunities are we giving up to do that?
I think that's probably a good place to leave it because there's no good answer.
We can all think about that until the next episode.
I really want to thank you for spending so much time talking to me again.
If people want to learn more, where's the best place for them to find you?
Always Twitter for me,
MissAmyTobey, and
I'll see you there. Say hi.
Thank you again for being as
generous with your time as you are. It's deeply appreciated.
It's always fun.
Amy Tobey, Senior
Principal Engineer at Equinix
Metal. I'm Cloud Economist
Corey Quinn, and this is Screaming in
the Cloud. If you've enjoyed this podcast, please leave a five-star review on your podcast platform
of choice. Whereas if you've hated this podcast, please leave a five-star review on your podcast
platform of choice, along with an angry comment that tells me exactly what we got wrong in this
episode in the best dialect you have of condescending engineer with zero people skills.
I look forward to reading it.
If your AWS bill keeps rising and your blood pressure is doing the same,
then you need the Duckbill Group.
We help companies fix their AWS bill by making it smaller and less horrifying.
The Duckbill Group works for you, not AWS.
We tailor recommendations to your business, and we get to the point.
Visit duckbillgroup.com to get started.
This has been a HumblePod production.
Stay humble.