Screaming in the Cloud - The Controversy of Cloud Repatriation With Amy Tobey of Equinix
Episode Date: September 27, 2022
About Amy: Amy Tobey has worked in tech for more than 20 years at companies of every size, working with everything from kernel code to user interfaces. These days she spends her time building an innovative Site Reliability Engineering program at Equinix, where she is a principal engineer. When she's not working, she can be found with her nose in a book, watching anime with her son, making noise with electronics, or doing yoga poses in the sun.
Links Referenced:
Equinix: https://metal.equinix.com
Twitter: https://twitter.com/MissAmyTobey
Transcript
Hello, and welcome to Screaming in the Cloud, with your host, Chief Cloud Economist at the
Duckbill Group, Corey Quinn.
This weekly show features conversations with people doing interesting work in the world
of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles
for which Corey refuses to apologize.
This is Screaming in the Cloud.
This episode is sponsored in part by our friends at AWS AppConfig.
Engineers love to solve and occasionally create problems,
but not when it's an on-call fire drill at four in the morning.
Software problems should drive innovation and collaboration,
not stress and sleeplessness and threats of violence.
That's why so many developers are realizing the value of AWS AppConfig feature flags.
Feature flags let developers push code to production,
but hide that feature from customers so that the developers can release their feature when it's ready.
This practice allows for safe, fast, and convenient software development.
You can seamlessly incorporate AppConfig feature flags into your AWS or cloud environment and ship your features with excitement, not trepidation and fear.
To get started, go to snark.cloud slash appconfig.
That's snark.cloud slash appconfig.
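For reference, a minimal sketch of what checking an AppConfig feature flag can look like from Python with boto3; the application, environment, profile, and flag names here are placeholders, and the flag document shape assumes AppConfig's feature-flag profile type:

```python
import json
import boto3

# Hypothetical identifiers -- substitute your own application, environment,
# and configuration profile names or IDs.
APP, ENV, PROFILE = "my-app", "prod", "feature-flags"

client = boto3.client("appconfigdata")

# Open a configuration session, then poll for the latest flag document.
token = client.start_configuration_session(
    ApplicationIdentifier=APP,
    EnvironmentIdentifier=ENV,
    ConfigurationProfileIdentifier=PROFILE,
)["InitialConfigurationToken"]

flags = json.loads(
    client.get_latest_configuration(ConfigurationToken=token)["Configuration"].read()
    or b"{}"
)

# Ship the code path dark; only take it when the flag says so.
if flags.get("new-checkout-flow", {}).get("enabled"):
    pass  # new feature path
else:
    pass  # existing behavior
```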
I come bearing ill tidings.
Developers are responsible for more than ever these days.
Not just the code that they write,
but also the containers and the cloud infrastructure
that their apps run on because serverless means it's still somebody's problem.
And a big part of that responsibility is app security from code to cloud.
And that's where our friend Snyk comes in.
Snyk is a frictionless security platform that meets developers where they are, finding and fixing vulnerabilities right from the CLI, IDEs, repos, and pipelines.
Snyk integrates seamlessly with AWS offerings like CodePipeline, EKS, ECR, and more, as well
as things you're likely to actually be using. Learn more at snyk.co slash scream. That's S-N-Y-K dot C-O slash scream.
Welcome to Screaming in the Cloud. I'm Corey Quinn, and this episode is another one of those
real profiles in shitposting type of episodes. I am joined again from a few months ago by Amy
Tobey, who is a senior principal engineer at Equinix, back for more. Amy,
thank you so much for joining me. Welcome to your show.
Exactly. So one thing that we have been seeing a lot of over the past year, and you struck me as
one of the best people to talk to about it, from a "what you are seeing in the wilderness" perspective,
has been the idea of cloud repatriation. It started off with something that came out of
Andreessen Horowitz toward the start of the year about the trillion dollar paradox, how at a certain
point of scale, repatriating to a data center is the smart and right move. And oh my stars,
did that ruffle some feathers for people.
Well, I spent all this money moving to the cloud.
That was just mean.
I know. Why would I want to leave the cloud?
I mean, for God's sake, my account manager named his kid after me.
Wait a minute. How much am I spending on that?
Yeah, there is that ever-growing problem.
And there have been the examples that people have given: Dropbox
classically did a cloud repatriation exercise, and a second example that no one can ever name.
And it seems like, okay, this might not necessarily be the direction that the industry is going.
But I also tend to not be completely naive when it comes to these things.
And I can see repatriation making sense
on a workload-by-workload basis.
What that implies is that, yeah,
but a lot of other workloads
are not going to be going to a data center.
They're going to stay in a cloud provider
who would like very much,
if you never breathe a word of this
to anyone in public.
So if there are workloads repatriating,
it would occur to me that there's a vested interest
on the part of every major cloud provider to do their best to, I don't know if saying suppress the story is too strongly worded, but it is directionally what I mean.
They aren't helping get the story out.
Yeah, that's a great observation.
Could you maybe shut the hell up and never make it ever again in public or we will end you?
Yeah, it's your Amazon.
What are you going to do?
Launch a shitty Amazon Basics version of what my company does?
Good luck.
Have fun.
You're probably doing it already.
But the reason I want to talk to you on this is a confluence of a few things.
One, as I mentioned back in May when you were on the show, I am incensed and annoyed that we've been talking for as long as we have.
And somehow I never had you on the show.
So, great.
Come back, please. You're always welcome here. Secondly, you work at Equinix,
which is effectively, let's be relatively direct, it is functionally a data center as far as how
people wind up contextualizing this. Yes, you have- Yeah, I guess people contextualize it that way,
but we'll get into that. Yeah, from the outside. I don't work there, to be clear. My talking points don't exist for this.
But I think, oh, Equinix, oh, that means you basically have a colo or colo equivalent.
The pricing dynamics are radically different. It looks a lot closer to a data center, in my imagination, than it does a traditional public cloud.
I would also argue that if someone migrates from AWS to Equinix, that would be viewed, arguably correctly, as something of a repatriation. Is that
directionally correct? I would argue incorrectly for metal, right? So Equinix is a data center
company, right? Like that's what everybody knows this as. Equinix Metal is a bare metal
primitive service, right? So it's a lot more of a cloud workflow, right? Except that you're not
getting the rich services that you get in a technically full cloud, right? Like there's no
RDS, there's no S3 even. What you get is bare metal primitives, right? With a really fast network.
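As a rough illustration of that "bare metal primitives with a cloud workflow" idea, here is a sketch of provisioning a Metal server over its REST API; the endpoint shape, plan, metro, and OS slugs are my assumptions, so check the current Equinix Metal documentation before relying on them:

```python
import os
import requests

# Rough sketch of provisioning a bare metal server via the Equinix Metal API.
# Endpoint, plan, metro, and OS slugs are illustrative -- verify them against
# the current API docs and your project's available configurations.
API = "https://api.equinix.com/metal/v1"
headers = {"X-Auth-Token": os.environ["METAL_AUTH_TOKEN"]}
project_id = os.environ["METAL_PROJECT_ID"]

resp = requests.post(
    f"{API}/projects/{project_id}/devices",
    headers=headers,
    json={
        "hostname": "hello-metal-01",
        "plan": "c3.small.x86",        # a small bare metal configuration
        "metro": "da",                 # Dallas, as an example location
        "operating_system": "ubuntu_22_04",
    },
    timeout=30,
)
resp.raise_for_status()
device = resp.json()
print(device["id"], device["state"])   # the device provisions asynchronously
```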
Are you really a cloud provider without some ridiculous machine learning powered service
that's going to wind up
taking pictures, perform incredibly expensive operations on it, and then return something
that's more than a little racist? I mean, come on. You're not a cloud until you can do that, right?
We can do that. We have customers that do that. Well, not specifically that.
But they have to build it themselves. You don't have the high-level managed service that basically
serves as what is functionally bias laundering. Yeah, you don't get it in a box, right? So a lot of our customers are doing things that are
unique, right? That are maybe not exactly fit into the cloud well. And it comes back down to
a lot of Equinix's roots, which is, we talk about going to the cloud and it's this kind of
abstract environment we're reaching for, you know, up in the sky. And it's like, we don't know where
it is, except we have regions that, okay, so it's in Virginia. But the rule of real estate applies to technology
as often as not, which is location, location, location, right? When we're talking about a lot
of applications, a challenge that we face, say, in gaming, is that the latency from the customer, that last mile to your data center,
can often be extremely important, right?
So a few milliseconds even.
And a lot of like SaaS applications,
the typical stuff that really the cloud was built on,
10 milliseconds, 50 milliseconds,
nobody's really going to notice that, right?
But in a gaming environment
or some very high or low latency application
that needs to run extremely close to the customer, it's hard to do that in the cloud. They're building this stuff out, right? Like, I see it, you know, the different ones building out, opening new regions. But, you know, there's this other side of the cloud, which is the edge computing thing that's coming alive, and that's more where I think about it. And again, location, location, location. The speed of light is really fast, but as most of us in tech know, if you want to go across from the East Coast
to the West Coast,
you're talking about 80 milliseconds on average, right?
I think that's what it is.
I haven't checked in a while.
You know, that's just basic fundamental speed of light.
And so if everything's in US East 1,
and this is why we do multi-regions sometimes,
the latency from the West Coast isn't going to be great.
And so we run applications-
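A quick back-of-the-envelope check on that coast-to-coast figure; the fiber speed and path length below are assumed, illustrative numbers, not anything from the conversation:

```python
# Back-of-the-envelope check on the coast-to-coast latency figure.
# Assumptions: light in fiber travels at roughly 2/3 of c, and the fiber path
# between NYC and SF is longer than the ~4,100 km great-circle distance --
# call it 5,500 km for illustration.
C_VACUUM_KM_S = 299_792
fiber_speed = C_VACUUM_KM_S * 2 / 3          # ~200,000 km/s
path_km = 5_500

one_way_ms = path_km / fiber_speed * 1000
round_trip_ms = 2 * one_way_ms
print(f"theoretical RTT ~{round_trip_ms:.0f} ms")
# ~55 ms before routing, queuing, and equipment delay, which is how observed
# numbers end up in the 60-80 ms range.
```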
It has improved though.
You want to talk old school, things that are seared into my brain from over 20 years ago,
every person who's worked in data centers or in technology as a general rule has a few IP
addresses seared upon their soul. And the one that I've always had on my mind was 130.111.32.11.
Kind of arbitrary and ridiculous, but it was one of the two recursive resolvers provided at the University of Maine, where I had my first help desk job.
And it lives on-prem in Maine.
And generally speaking, I tended to always accept that no matter where I was,
unless I was in a data center somewhere, it was about 120 milliseconds.
And I just checked now, it is 85 and change from where I am in San Francisco.
So the internet or the speed of light have improved.
So good for whichever one of those it was.
But yeah, you've just updated my understanding of these things.
All of which is to say: yes, latency is very important.
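If you want to reproduce Corey's quick check, here is one hedged way to eyeball round-trip latency from Python; the resolver address is just the one mentioned above and may not accept TCP connections from your network, in which case substitute any host you care about:

```python
import socket
import time

def tcp_connect_ms(host: str, port: int = 53, timeout: float = 2.0) -> float:
    """Time a single TCP handshake as a rough stand-in for network round trip."""
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout):
        pass
    return (time.perf_counter() - start) * 1000

# The resolver Corey mentions; it may be firewalled off from your vantage point.
print(f"{tcp_connect_ms('130.111.32.11'):.1f} ms")
```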
Right.
Let's forget repatriation.
To be really honest, even the Dropbox case or any of them, right?
There's an economic story here that I think all of us that have been doing cloud work
for a while see pretty clearly that maybe not everybody's seeing that's thinking from
an on-prem kind of situation, which is that, and I know you do this all the time, right?
Is you don't just look at the cost of the data center and the servers and the network,
the technical components, the bill of materials. Oh, lies, damn lies, and TCO analyses.
But there's all these people on top of it and the organizational complexity and the contracts that
you've got to manage. And it's this big, huge operation that is incredibly complex to do well that is almost nobody's business.
So the way I look at this, right, and the way I even talk to customers about it is like,
what is your product? And I talk to people internally about this way. It's like, what are
you trying to build? Well, I want to build a SaaS, okay? Do you need data center expertise to build
a SaaS? No. Why the hell are you putting it in a data center? Like, you know, I'm speaking for my employer, right? Like, we have Equinix Metal right here.
You can build on that. You don't have to do all the most complex part of this,
at least in terms of like the physical plant, right? Like getting a bare metal server available.
We take care of all of that. Even at the primitive level where we sit, it's higher level than, say, Colo. There's also the question of economics as it
ties into it. It's never just a raw cost of materials type of approach. Like my original
job in a data center was basically to walk around and replace hard drives and apparently to insult
people. Now the cloud has taken one of those two aspects away and you can follow my Twitter
account and figure out which one of those two it is,
but what I keep seeing now is that there's value to having that task done,
but in a cloud environment,
and an Equinix model, let's be clear,
that has slipped below the surface level of awareness.
And, well, what are the economic implications of that?
Well, okay, you have a whole team of people
at large companies whose job it is to do precisely that.
Okay, we're going to upskill them and train them to use cloud.
Okay, first, not everyone is going to be capable or willing to make that leap from hard drive replacement to congratulations and welcome to JavaScript.
You're about to hate everything that comes next. And if they do make that leap, their baseline market value,
by which I mean what the market is willing to pay for them,
approximately will double.
And whether they wind up being paid more
by their current employer
or they take a job somewhere else with those skills
and get paid what they are worth,
the company still has that economic problem.
Like it or not, you will generally get what you pay for,
whether you want to or not. That is the reality of it. And as companies are thinking about this,
well, what gets into the TCO analysis and what doesn't, I have yet to see one where the outcome was not predetermined. They're less, let's figure out in good faith, whether it's going to be more
expensive to move to the cloud or move out of the cloud or just burn the building down for insurance money.
The outcome is generally the one that the person who commissioned the TCO analysis wants.
So when a vendor is trying to get you to switch to them and they do one for you, yeah.
I'm not saying they're lying, but there's so much judgment that goes into this.
And what do you include?
What do you not include?
That's hard.
And there's so many hidden costs and that's one of those things that i love about working at a
cloud provider is that i still get to play with all that stuff and like i get to see those hidden
costs right like you were talking about the the person who goes around and swaps out the hard
drives or early in my career right i worked with someone whose job it was just every day
she would go in the data center, she'd swap out the tapes, you know, and do a few other things around, and, like, take care of the billing system.
And that was a job where it was kind of going around and stewarding a whole bunch of things that kind of kept the whole machine running.
But most people outside of being right next to the data center didn't have any idea that stuff even happened, right, that went into it. And so like you were saying, like when you go to do the TCO analysis, I mean, I've been through this a couple of times
prior in my career where people will look at it
and be like, well, of course we're not going to list.
We'll put like two headcount on there.
And it's always a lie
because it's never just two headcount.
It's never just the network person or the SRE
or the person who's racking the servers.
It's also like finance has to do all this extra work.
And there's all the logistic work.
And there is just so much stuff that just is really hard to include.
Not only do people leave it out, but it's also just really hard for people to grapple with the complexity of all the things it takes to run a data center, which is like one of the most complex machines on the planet.
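To make the "it's never just two headcount" point concrete, here is a toy fully-loaded comparison; every number is invented purely for illustration, and the interesting part is which line items tend to be left out of the on-prem column:

```python
# Toy fully-loaded comparison in the spirit of "it's never just two headcount."
# Every figure below is made up for illustration; the point is the hidden line
# items, not the totals.
on_prem = {
    "servers_and_network_amortized": 400_000,
    "colo_space_and_power": 250_000,
    "dc_ops_headcount_4x_fully_loaded": 720_000,
    "finance_procurement_logistics_overhead": 120_000,
    "spares_contracts_support": 90_000,
}
cloud = {
    "compute_storage_network": 900_000,
    "cloud_platform_headcount_2x_fully_loaded": 360_000,
}

for name, costs in (("on-prem", on_prem), ("cloud", cloud)):
    print(f"{name}: ${sum(costs.values()):,}/yr")
```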
Any single data center. I've worked in small-scale environments, even a couple of mid-sized ones, but never the type of hyperscale facility
that you folks have,
which I would say is,
if it's not hyperscale,
it's at least directionally close to it.
We're talking thousands of servers
and hundreds of racks.
I've started getting into that on some level.
I guess when we say hyperscale,
we're talking about AWS-sized things
where, oh, that's a region,
and it's going to have three dozen data center facilities in it. Yeah, I don't work in places like that because,
honestly, have you met me? Would you trust me around something that's that critical to
infrastructure? No, you would not unless you have terrible judgment, which means you should not be
working in those environments to begin with. I mean, you're like a walking chaos exercise.
Maybe I would let you in. Oh, I bring my hardware destruction aura near
anything expensive and things are terrible. It's awful. But as I look at the cloud, regardless of
cloud, there is another economic element that I think is underappreciated. And to be fair, this
does, I believe, apply as much to Equinix Metal as it does to the public hyperscale cloud providers that have problems with naming
things well. And that is when you are provisioning something as a customer of one of these places,
you have an unbounded growth problem. When you're in a data center, you are not going to just
absentmindedly sign an $8 million purchase order for new servers, you know, a second time. And
that means you're eventually going to run out of power, space, places to put things, and you have to go find it somewhere.
Whereas in cloud, the only limit is basically your budget,
where there is no forcing function that reminds you to go and clean up that experiment from five years ago.
You have people with three petabytes of data they were using for a project,
but they haven't worked there in five years and nothing's touched it since.
Because the failure mode of deleting things that are important or disastrous... That's why Glacier exists.
Oh, exactly. But the failure mode of deleting things that should not be deleted is disastrous
for a company. Whereas if you leave them there, well, it's only money. And there's no forcing
function to do that, which means you have this infinite growth problem with no natural limit
slash predator around it. And that is the economic analysis that I do not see playing out basically
anywhere because, oh, by the time that becomes a problem, we'll have good governance in place. Yeah, pull the other one. It has bells on it.
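One hedged way to at least see the "forgotten three petabytes" problem is a simple age sweep over a bucket; the bucket name and threshold are placeholders, and last-modified is only a proxy, since true last-access data needs S3 access logs or Storage Lens:

```python
from datetime import datetime, timedelta, timezone
import boto3

# Minimal sweep for data nobody has touched in years: list objects in a bucket
# and tally anything not modified since the cutoff. Bucket name and age
# threshold are placeholders.
s3 = boto3.client("s3")
cutoff = datetime.now(timezone.utc) - timedelta(days=5 * 365)

stale_bytes = 0
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket="my-experiment-bucket"):
    for obj in page.get("Contents", []):
        if obj["LastModified"] < cutoff:
            stale_bytes += obj["Size"]

# Note: LastModified only says when the object was written, not when it was
# last read; access patterns require S3 server access logs or Storage Lens.
print(f"~{stale_bytes / 1e12:.2f} TB not modified in 5+ years")
```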
That's the funny thing, right? A lot of the
early drive in the cloud was those of us who wanted to go faster and we were up against the
limitations of our data centers. And then we go out and go like, hey, we've got this cloud thing.
I'll just, you know, put the credit card in there
and I'll spin up a few instances
and hey, I delivered your product.
And everybody goes, yeah, hey, happy.
And then like you mentioned, right?
And then we get down the road here
and it's like, oh my God,
how much are we spending on this?
And then you're in that funny boat
where you have both.
But yeah, I mean, like,
that's just typical engineering problem
where, you know,
we have to deal with our constraints
and the cloud has constraints too, right? Like when I was at Netflix, one of the things we would do frequently is bump up against
instance limits. And then we go talk to our TAM and be like, Hey buddy, can we have some more
instance limit and then take care of that? Right. But there are some bounds on that. Of course,
in the cloud providers, you know, if I have my cloud provider shoes on, I don't necessarily want
to put those limits too low because it's a business.
The business wants to hoover up all the money.
That's what businesses do.
So I guess it's just a different constraint that is maybe much too easy to knock down, right?
Because as you mentioned, in a data center or in a colo space, I grow my cage and I've filled up all the space I have.
I have to either order more space from my colo provider
or expand to the cloud, right?
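For what it's worth, those instance limits are also visible (and bump-able) through the Service Quotas API rather than only through a TAM; a small sketch, where the quota code is the commonly cited one for on-demand standard instances and worth verifying for your account:

```python
import boto3

# Instead of (or before) pinging your TAM: the Service Quotas API exposes the
# same instance limits programmatically. The quota code below is the commonly
# cited one for "Running On-Demand Standard instances" -- verify it yourself.
sq = boto3.client("service-quotas")

quota = sq.get_service_quota(ServiceCode="ec2", QuotaCode="L-1216C47A")["Quota"]
print(f"{quota['QuotaName']}: {quota['Value']:.0f} vCPUs")

# A limit bump is just another API call (subject to review on AWS's side):
# sq.request_service_quota_increase(
#     ServiceCode="ec2", QuotaCode="L-1216C47A", DesiredValue=quota["Value"] * 2
# )
```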
The scale I was always at, the limit was not the space.
Because I assure you, with enough shoving, all things are possible.
Don't believe me.
Look at what people are putting in the overhead bin on any airline.
Enough shoving, you'll get a Volkswagen in there.
But it was always the power constraint that I dealt with.
And it's like, eh, they're just being conservative.
And then the whole room dies.
You want blade servers? Because that's how you get blade servers, right?
That movement was about bringing the density up
and putting more servers in a rack.
You know, there was some management stuff in a lot,
but a lot of it was just about like, you know,
I remember I'm picturing...
Even without that, I was still power constrained
because you have to remember, a lot of my experiences
were not in, shall we say, data center facilities
that you would call, you know, good.
Well, that brings up a fun thing that's happening,
which is that the power of servers is still growing.
The newest Intel chips, especially the ones they're shipping
for hyperscale and stuff like that,
with the really high core counts and the faster clock speeds,
you know, these things are pulling like 300 watts.
And they also have to egress all that heat.
And so that's one of the places where we're doing some innovations.
I think there's a couple of blog posts out about it
around like liquid cooling or multi-mode cooling.
And what's interesting about this from a cloud or data center perspective
is that the tools and skills and everything has to come together
to run a, you know, this year's or next year's servers where we're pushing thousands of kilowatts into a rack, thousands, one rack, right?
The bar to actually bootstrap and run this stuff successfully is rising again compared to, I take my pizza box servers, right?
And I worked at a gaming company a long time ago, right?
And they would just like stack them on the floor.
It was just a stack of servers.
Like they were in between the rails,
but they weren't screwed down or anything, right?
And they would network them all up
because basically like the game would spin up on the servers.
And if they died, they would just unplug that one
and leave it there and spin up another one.
It was like, you could just stack stuff up
and like you sling cables across the data center
and stuff back then.
I wouldn't do it that way now.
But when you add, say, liquid cooling and some of these like extremely high power situations into the mix, now you need to have, for example, if you're using liquid cooling, you don't want that stuff leaking.
Right.
And so as good as the pressure fittings and blind mating and all this stuff that's coming around is getting, you still have that element of
additional training and skill and possibility for mistakes. The thing that I see as I look at this
across the space is that on some level, it's gotten harder to run a data center than ever
before. Because again, another reason I wanted to have you on this show is that you do not carry a quota,
although you do often carry the conversation when you have boring people around you.
But quotas, no. You are not here selling things to people. You are not actively incentivized to
get people to see things a certain way. You are very clearly an engineer in the right ways. I
will further point out, though, that you do not
sound like an engineer, by which I mean you're going to basically belittle people in many cases
in the name of being technically correct. You're a human being with a freaking soul, and believe me,
it is noticed. I really appreciate that. If somebody's just listening and hearing my voice
and then my name, I have a low voice. And in most of my career, I was extremely technical,
like to the point where, you know,
if something was wrong, technically,
I would fight to the death
to get the right technical solution
and maybe not see the complexity around the decisions
and why things were the way they were
and the way I can today.
And that's changed how I sound.
It's changed how I talk.
It's changed how I look at
and talk about technology as well, right? I'm just not that interested in Kubernetes
because I've kind of started looking up the stack in this kind of pursuit.
Yeah, when I say you don't sound like an engineer, I am in no way, shape, or form
alluding in any respect to your technical acumen. I feel the need to clarify that statement for
people who might be listening and saying, hey, wait a minute, is he being a shithead?
No. Well, not the kind you're worried about, anyway.
I'm a different breed of shithead.
That's fine.
Yeah, I should remember that other people don't know we've had conversations that are
deeply technical that aren't on air, that aren't context anybody else has.
And so, like, I bring that deep technical knowledge, you know, the ability to talk about
PCI Express and kilovolts going to a rack and top of rack switches and network topologies,
all of that together now.
But what's really fascinating
is where the really big impact is for reliability,
for security, for quality.
The things that me as a person that I'm driven by,
products are cool, but I like them to be reliable.
That's the part that I like.
Really come down to more leadership and business
acumen and understanding the business constraints and then being able to get heard by an audience
that isn't necessarily technical, that doesn't necessarily understand the difference between PCI,
PCI-X, and PCI Express. There's a difference between those. It doesn't mean anything to the business,
right? So when we want to go and talk about why are we doing,
for example, multi-region deployment of our application.
If I come in and say,
well, because we want to use Raft,
that's going to fall flat.
The business is going to go, I don't care about Raft.
What does that have to do with my customers?
Which is the right question to always ask.
Instead, when I show up and say,
okay, what's going on here is
we have this application sits in a single region or in a single data center or whatever, right?
I'm using region because that's probably what most of the people listening understand.
You know, so I put my application in a single region and it goes down. Our customers are going
to be unhappy. We have the alternative to spend, okay, not a little bit more money, probably a lot
more money to build a second region. And the benefit we will get is that our customers will be able to access the service 24
by 7, and it will always work, and they'll have a wonderful experience. And maybe they'll keep
coming back and buy more stuff from us. And so when I talk about it in those terms, right, and
it's usually more nuanced than that, then I start to get the movement at the macro level, right, in the systemic level of the business and the direction I want it to go, which is for the product group to understand why reliability matters to the customer.
You know, for the individual engineers to understand why it matters that we use secure coding practices.
This episode is sponsored in part by our friends at Sysdig.
Sysdig secures your cloud from source to run.
They believe, as do I, that DevOps and security are inextricably linked.
If you want to learn more about how they view this, check out their blog.
It's definitely worth the read.
To learn more about how they are absolutely getting it right from where I sit,
visit sysdig.com and tell them that I sent you.
That's S-Y-S-D-I-G.com.
And my thanks to them for their continued support of this ridiculous nonsense.
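Picking up the multi-region argument from just before the break, here is the availability arithmetic behind that pitch in miniature; the uptime figures are illustrative, and the independence assumption is the usual optimistic simplification:

```python
# The business-facing version of the multi-region argument, as arithmetic.
# Availability figures are illustrative; assuming regions fail independently
# is the standard (and optimistic) simplification.
single_region = 0.999            # roughly 8.8 hours of downtime a year
both_down = (1 - single_region) ** 2
two_regions = 1 - both_down      # available if either region can serve traffic

hours_per_year = 24 * 365
print(f"one region:  {single_region:.4%}, "
      f"~{(1 - single_region) * hours_per_year:.1f} h/yr down")
print(f"two regions: {two_regions:.6%}, "
      f"~{both_down * hours_per_year * 60:.1f} min/yr down")
```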
Getting back to the reason I said that you are not quota carrying and you are not incentivized
to push things in a particular way is that often we'll meet zealots and i've never known you to be
one you have always been a strong advocate for doing the right thing even if it doesn't directly
benefit any given random employer that you might have and as a result one of the things that you've
said to me repeatedly is if you're building something from scratch, for God's sake, put it in cloud.
What is wrong with you?
Do that.
That is the idea of building it yourself on low-level underlying primitives for almost every modern SaaS-style workload.
There's no reason to consider doing something else in almost any case.
Is that a fair representation of your position on this?
It is.
I mean, the simpler version, right, is why the hell are you doing
undifferentiated lifting, right?
Things that don't differentiate your product.
Why would you do it?
The thing that this has empowered then
is I can build an experiment tonight.
I don't have to wait for provisioning
and sign contracts and do all the rest.
I can spend 25 cents
and get an experiment up and running.
If it takes off, though,
it has changed how I move going forward as well
because there's no difference in the way that there was back when we were in data centers.
I'm going to try and experiment.
I'm going to run it in this, I don't know, crappy Raspberry Pi or my desktop or something under my desk somewhere.
And if it takes off and I have to scale up, I've got to do a giant migration to real enterprise-grade hardware.
With cloud, you are getting all of that out of the box, even if all you're doing with it is something ridiculous and nonsensical. And you're also getting like ridiculously better
service. So 20 years ago, if you and I sat down to build a SaaS app, we would have spun up a Linux
box somewhere in a colo. And we would have spun up Apache, MySQL, maybe some Perl or PHP if we're
feeling frisky. And the availability of that would be what one machine could do,
what we could handle in terms of one MySQL instance.
But today, if I'm spinning up a new stack for the same kind of SaaS,
I'm going to probably deploy it into an ASG.
I'm probably going to have some kind of high availability database on it,
and I'm going to use Aurora as an example.
Because the availability of an Aurora instance,
in terms of if I'm building myself up
with even the very best kit available in databases,
it's going to be really hard to hit the same availability
that Aurora does,
because Aurora is not just a software solution.
It's also got a team around it that stewards it 24-7.
And it continues to evolve on its own.
And so the base, when we start that little tiny startup,
instead of being that one machine,
we're actually starting at a much higher level of quality
and availability and even security sometimes
because of these primitives that were available.
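A minimal sketch of that "better starting point" stack as raw API calls; in practice you would reach for CloudFormation, CDK, or Terraform, and every identifier, subnet, and launch template below is a placeholder:

```python
import boto3

# The "day one with better defaults" stack Amy describes, sketched as raw API
# calls. All names, subnets, and the launch template are placeholders.
rds = boto3.client("rds")
asg = boto3.client("autoscaling")

# A managed, replicated database instead of one hand-tended MySQL box.
# (DB instances are added to the cluster separately.)
rds.create_db_cluster(
    DBClusterIdentifier="saas-db",
    Engine="aurora-mysql",
    MasterUsername="admin",
    ManageMasterUserPassword=True,  # let RDS/Secrets Manager hold the secret
)

# Stateless app servers that replace themselves when they fail.
asg.create_auto_scaling_group(
    AutoScalingGroupName="saas-web",
    LaunchTemplate={"LaunchTemplateName": "saas-web-template"},
    MinSize=2,
    MaxSize=10,
    VPCZoneIdentifier="subnet-aaa,subnet-bbb",
)
```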
And I probably should go on to extend on the thought
of undifferentiated lifting, right?
And coming back to the colo or the edge story,
which is that there are still some little edge cases, right? Like I think for SaaS, duh, right? Like go straight
to, and then, but there are still some really interesting things where there's like hardware
innovations, where they're doing things with GPUs and stuff like that, where the colo experience may
be better because you're trying to do, like, custom hardware, in which case you are in a colo. There are businesses doing some really interesting stuff
with custom hardware that's behind an application stack.
What's really cool about some of that, from my perspective,
is that some of that might be sitting on, say, bare metal with us,
and maybe the front end is sitting somewhere else.
Because the other thing Equinix does really well
is this product we call Fabric,
which lets us basically do peering with any of the cloud providers.
Yeah, the reason I guess I don't consider you as a quote-unquote cloud is first and foremost rooted in the fact
that you don't have a bandwidth model that is free for ingress and criminally expensive to send it anywhere that isn't to you folks.
Are you really a cloud if you're not just gouging the living piss out of your customers every time they want to send data somewhere else? Well,
I mean, we like to say we're part of the cloud
and really that's actually my favorite
feature of Metal is that
you get, I think... Yeah, this was
a compliment to be very clear. I'm a
big fan of not paying 1998 bandwidth pricing
anymore. Yeah, but this is the part where I get
to do a little bit of like showing off for Metal
a little bit and that like when you buy a
Metal server, there's different configurations, right? But, like, I think the lowest one, you have dual 10-gig
ports to the server that you can get either in a bonded mode, so that you have a single 20-gig
interface in your operating system, or you can actually do L3 and you can do BGP to your server.
And so this is a capability that you really can't get at all on the other clouds,
right? This lets you do things with the network, not only the bandwidth, right, that you have
available. Like you want to stream out 25 gigs of bandwidth out of us. I think that's pretty doable.
And the rates, I've only seen a couple of comparisons, are pretty good. So this is like
where some of the business opportunity is, right? And I can't get too much into it, but like, this is all public stuff I've talked about so far,
which is,
that's part of the opportunity there is sitting at the crossroads of the
internet.
We can give you a server that has really great networking and you can do
all the cool custom stuff with it.
Like BGP, right? So that you can do anycast, right? You can build anycast applications.
I miss the days when that was a thing that made sense.
I mean that in the context of the internet and networks. These days, it always feels like the
network engineering has slipped away within the cloud because you have overlays on top of overlays
and it's all abstractions that are living out there right until suddenly you really need to
know what's going on. But it has abstracted so much of this away. And that, on some level,
is the surprise people are often in for
when they wind up outgrowing the cloud for a workload
and wanting to move it someplace that doesn't, you know,
ride them like naughty ponies for bandwidth.
And they have to rediscover things
that we've mostly forgotten about.
I remember having to architect significantly
around the context of hard drive failures.
I know we've talked about that a fair bit as a thing, but yeah, it's spinning metal. It throws off heat. And if you lose the
wrong one, your data is gone and you now have serious business problems. In cloud, at least
AWS land, that's not really a thing anymore. The way EBS is provisioned, there's a slight
tick in latency if you're looking at just the right time for what I think is a hard drive failure,
but it's there. You don't have to think about this anymore.
Migrate that workload to a pile of servers
in a colo somewhere.
Guess what?
Suddenly your reliability is going to decrease.
Amazon and the other cloud providers as well
have gotten to a point
where they are better at operations
than you are at your relatively small company
with your nascent sysadmin team.
I promise there is an economy of scale here.
And it doesn't have to be good or better, right?
It's just simply better resourced.
Yeah.
Then most anybody else can help.
Amazon can throw a billion dollars at it and never miss it.
And most organizations out there, you know, and most of the, especially enterprise,
people are scratching and trying to get resources wherever they can,
right? They're all competing for people, for time, for engineering resources. And that's one of the
things that gets freed up when you just basically bang an API and you get the thing you want. You
don't have to go through that kind of old world internal process that is usually slow and often
painful. Just because they're not resourced as well. They're not automated as well.
Maybe they could be.
I'm sure most of them could, in theory, be.
But we come back to undifferentiated lifting.
None of this helps, say,
let me think of another random business.
Claire's, whatever, like any of the shops in the mall,
they all have some kind of enterprise behind them
for cash processing and all that stuff, point of sale.
None of this stuff is differentiating for them because it doesn't impact anything to do with
where the money comes in. So again, we're back at, why are you doing this?
I think that's also the big challenge as well. When people start talking about repatriation
and talking about this idea that they are going to, oh, the cloud is too expensive,
we're going to move out, and they make the economics work. Again, I do firmly believe that, by and large,
businesses do not intentionally go out and make poor decisions. I think when we see a company
doing something inscrutable, there is always context that we are missing. And I think,
as a general rule of thumb, that these companies do not hire people who are fools.
And there are always constraints that they cannot talk about in public.
My general position, and as a consultant, and ideally as someone who aspires to be a decent human being, is that when I see something I don't understand, I assume that there's simply a lack of
context, not that everyone involved in this has been foolish enough to make giant blunders that
I can pick out in the first five seconds of looking at it. I'm not quite that self-confident yet.
I mean, that's a big part of the career progression into above senior engineer, right?
Is you don't get to sit in your chair and go like, oh, those dummies, right?
You actually have to, I don't know about have to, but the way I operate now, right, is I
remember when I'm in my youth, I used to be like, oh, those business people, I don't know
nothing.
What are they doing?
It's goofy what they're doing.
And then now I have a different mode, which is, oh, that's interesting.
Can you tell me more?
The feeling is still there, right?
Like, oh my God, what is going on here?
But then I get curious and I go, so how did we get here?
And you get that story and the stories are always fascinating and they always
involve like constraints,
immovable objects,
people doing the best they can with what they have available.
Always.
And I want to be clear that it very rarely is it the right answer to walk into
a room and said,
look at the architecture and all right,
what moron built this?
Because always you're going to be asking that question to said moron and it doesn't matter how right you are they're never going to listen to
another thing out of your mouth again and have some respect for what came before even if it's
the potentially wrong answer well great why didn't you just use this op this service to do this
instead yeah because this thing predates that by five years jackass there are reasons things are
the way they are if If you take any architecture
in the world and tell people to rebuild at Greenfield, almost none of them would look the
same as they do today because we learn things by getting it wrong. That's a great teacher and it
hurts, but it's also true. And we got to build, right? Like that's what we're here to do. If we
just kind of cycle waiting for the perfect technology, the right choices, and again,
to come back to the people who built it at the time: they use, you know, and often we fault people for this, the things they know or the things that are
nearby and they make it work. And that's kind of amazing sometimes, right? Like I'm sure you see
architectures frequently and I see them too, probably less frequently where you just go,
how does this even work in the first place? Like, how did you get this to work? Because I'm looking at this diagram or whatever,
and I don't understand how this works.
Maybe that's a thing that's more a me thing, right?
Because usually I can look at a,
skim over an architecture document,
something like, be able to build the model up
and be like, okay, I can see how that kind of works
and how the data flows through it.
I can do that pretty quickly.
And it comes back to that, like, just, again,
asking, how did we get here?
And then the cool part about asking how did we get here is it sets everybody up in the room, not just you as the person trying to drive change, but the people you're trying to bring along, the original architects, the original engineers. When you ask how did we get here, you've started them on the path to coming along with you into the future, which is kind of cool. And that storytelling mode, again, is so powerful at almost every level of the stack,
right?
And that's why I just like, we were talking about how technical I bring things in.
Again, like, I'm just not that interested in, like, are you little-endian or big-endian?
How did we get here?
It's kind of cool.
You built a big-endian architecture in 2022.
Like, whoo.
How did we do that?
Hey, leave me to my own devices
and I need to build something super quickly
to get it up and running.
Well, what I'm going to do for a lot of answers
is going to look an awful lot
like the traditional three-tier architecture
that I was running back in 2008
because I know it, it works well,
and I can iterate rapidly on it.
Is it a best practice?
Absolutely not.
But given the constraints,
sometimes it's the
fastest thing to grab. Well, if you built this in serverless technologies, it would run at a
fraction of the cost. Yes. But if I run this thing the way that I'm running it now, it'll be $20 a
month. It'll take me two hours instead of 20. And what exactly is your time worth again? It comes
down to the better economic model of all these things. Anytime you're trying to make a case to
the business, the economic model is going to always
go further. Just a general
tip for tech people, right? If you can make the better
economic case and you go to the business with an
economic case that is clear,
businesses listen to that.
They're not going to listen to us go on and on
about distributed systems. Somebody
in finance trying to make a decision about, like, do we
go and spend a million bucks on this?
That's not really the material thing.
It's like, how is this going to move the business forward?
And how much is it going to cost us to do it?
And what other opportunities are we giving up to do that?
I think that's probably a good place to leave it because there's no good answer.
We can all think about that until the next episode.
I really want to thank you for spending so much time talking to me again.
If people want to learn more, where's the best place for them to find you?
Always Twitter for me,
MissAmyTobey, and
I'll see you there. Say hi.
Thank you again for being as
generous with your time as you are. It's deeply appreciated.
It's always fun.
Amy Tobey, Senior
Principal Engineer at Equinix
Metal. I'm Cloud Economist
Corey Quinn, and this is Screaming in
the Cloud. If you've enjoyed this podcast, please leave a five-star review on your podcast platform
of choice. Whereas if you've hated this podcast, please leave a five-star review on your podcast
platform of choice, along with an angry comment that tells me exactly what we got wrong in this
episode in the best dialect you have of condescending engineer with zero people skills.
I look forward to reading it.
If your AWS bill keeps rising and your blood pressure is doing the same,
then you need the Duckbill Group.
We help companies fix their AWS bill by making it smaller and less horrifying.
The Duckbill Group works for you, not AWS.
We tailor recommendations to your business, and we get to the point.
Visit duckbillgroup.com to get started.
This has been a HumblePod production.
Stay humble.