Screaming in the Cloud - The Multi-Cloud Counterculture with Tim Bray
Episode Date: April 5, 2022
About Tim
Timothy William Bray is a Canadian software developer, environmentalist, political activist, and one of the co-authors of the original XML specification. He worked for Amazon Web Services from December 2014 until May 2020, when he quit due to concerns over the termination of whistleblowers. Previously he has been employed by Google, Sun Microsystems, and Digital Equipment Corporation (DEC). Bray has also founded or co-founded several start-ups, such as Antarctica Systems.
Links Referenced:
Textuality Services: https://www.textuality.com/
Blog post: https://www.tbray.org/ongoing/When/202x/2022/01/30/Cloud-Lock-In
@timbray: https://twitter.com/timbray
tbray.org: https://tbray.org
duckbillgroup.com: https://duckbillgroup.com
Transcript
Hello, and welcome to Screaming in the Cloud, with your host, Chief Cloud Economist at the
Duckbill Group, Corey Quinn.
This weekly show features conversations with people doing interesting work in the world
of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles
for which Corey refuses to apologize.
This is Screaming in the Cloud.
This episode is sponsored in part by our friends at Vulture, spelled V-U-L-T-R,
because they're all about helping save money, including on things like, you know, vowels.
So what they do is they are a cloud provider that provides surprisingly high
performance cloud compute at a price that, well, sure, they claim it is better than AWS's pricing.
And when they say that, they mean that it's less money. Sure, I don't dispute that. But what I find
interesting is that it's predictable. They tell you in advance on a monthly basis what it's going
to cost. They have a bunch of advanced networking features.
They have 19 global locations and scale things elastically, not to be confused with openly,
which is apparently elastic and open.
They can mean the same thing sometimes.
They have had over a million users.
Deployments take less than 60 seconds across 12 pre-selected operating systems,
or if you're one of those nutters like me,
you can bring your own ISO and install basically any operating system you want.
Starting with pricing as low as $2.50 a month
for Vulture Cloud Compute,
they have plans for developers and businesses of all sizes,
except maybe Amazon,
who stubbornly insists on having something of that scale of their
own. Try Vulture today for free by visiting vulture.com slash screaming, and you'll receive
$100 in credit. That's v-u-l-t-r dot com slash screaming.
Couchbase Capella database as a service is flexible, full-featured, and fully managed
with built in access via key value, SQL and full text search.
Flexible JSON documents aligned to your applications and workloads.
Build faster with blazing fast in memory performance and automated replication and scaling while
reducing cost.
Capella has the best price performance of any fully managed document database.
Visit couchbase.com slash screaming in the cloud to try Capella today for free and be up and running in three minutes with no credit card required.
Couchbase Capella.
Make your data sing.
Welcome to Screaming in the Cloud.
I'm Corey Quinn.
My guest today has been on a year or two ago, but today we're going in a bit of a different
direction.
Tim Bray is a principal at Textuality Services.
Once upon a time, he was a distinguished engineer slash VP at AWS, but let's be clear,
he isn't solely focused on one company. He also used to work at Google. Also, there is scuttlebutt
that he might have had something to do at one point with the creation of God's true language,
XML. Tim, thank you for coming back on the show and suffering my slings and arrows.
You're just fine. Glad to be here.
So the impetus for having this conversation is you had a blog post somewhat recently,
by which I mean January of 2022, where you talked about lock-in and multi-cloud,
two subjects near and dear to my heart, mostly because I have what I thought was a
fairly countercultural opinion. You seem to have a very closely aligned perspective on this, but
let's not get too far ahead of ourselves. Where did this blog post come from?
Well, I advise a couple of companies, and one of them happens to be using GCP, and the other happens to be using AWS.
And I get involved in a lot of industry conversations.
And I noticed that multi-cloud is a buzzword.
If you go and type multi-cloud into Google, you get like a page of people saying, we will solve your multi-cloud problems.
Come to us, and you will be multi-cloud.
And I was not sure what to think. So I started
writing to find out what I would think. And I think it's not complicated anymore. I think that
multi-cloud is a reality in most companies. I think that many mainstream non-startup companies
are really worried about cloud lock-in, and that's not entirely unreasonable.
So it's a reasonable thing to think about, and it's a reasonable thing to try and find the right balance between avoiding lock-in and not slowing yourself down.
And the issues were interesting.
What was surprising is that I published that blog piece saying what I thought were some kind of controversial things,
and I got no pushback,
which is why I started talking to you and saying, Corey, you know, does nobody disagree with this?
Do you disagree with this? Maybe we should have a talk and see if this is just the new conventional wisdom. There's nothing worse than almost trying to pick a fight, but no one actually
winds up taking you up on the opportunity. That always feels a little off. Let's break it down into two issues,
because I would argue that they are intertwined, but not necessarily the same thing. Let's start
with multi-cloud, because it turns out that there's just enough nuance to, at least where I
sit on this position, that whenever I tweet about it, I wind up getting wildly misinterpreted.
Do you find that as well?
Not so much. It's not a subject I have really had much to say about, but it does mean lots of different things. And so it's not totally surprising that that happens. I mean, some people
think when you say multi-cloud, you mean, well, I'm going to take my strategic application and
I'm going to run it in parallel on AWS and GCP because that way I'll be more resilient and other
good things will happen.
And then there's another thing, which is that, well, you know, as my company grows, I am naturally going to be using lots of different technologies, and that might include more than
one cloud. So there's a whole spectrum of things that multi-cloud could mean. So I guess when we
talk about it, we probably owe it to our audiences to be clear what we're talking about.
Let's be clear. From my perspective, whatever the person talking is trying to sell you at that point in time is,
of course, what multi-cloud is. If it's a third-party dashboard, for example, oh yeah,
you want to be able to look at all of your cloud usage on a single pane of glass. If it's a certain,
well, I guess a certain other cloud provider, well, they understand that if you go all in on a cloud
provider, it's probably not going to be them.
So they're, of course, going to talk about multi-cloud.
And if it's AWS, where they are the 8,000-pound gorilla in the space,
oh yeah, multi-cloud's terrible.
Put everything on AWS at the end.
It seems that most people who talk about this
have a very self-serving motivation that they can't entirely escape. That bias does reflect itself.
That's true. When I joined AWS, which was around 2014, the PR line was a very hard line. Well,
multi-cloud, that's not something you should invest in. And I've noticed that the conversational
line has become much softer. And I think one reason for that is that going all in on a single
cloud is at least
possible when you're a startup. But if you're a big company, you know, insurance company,
a tire manufacturer, that kind of thing, you're going to be multi-cloud for the same reason that
they already have COBOL on the mainframe and Java on the old Sun boxes and Mongo running somewhere
else and five different programming languages. And that's just the way big companies are.
It's a consequence of M&A, it's a consequence of research projects that succeeded one kind or another.
I mean, lots of big companies have been trying to get rid of COBOL for decades, literally, and not succeeding in doing that.
It's legacy, which is, of course, the condescending engineering term for it makes money.
And works.
And so I don't think it's realistic to, as a matter of principle,
not be multi-cloud. Let's define our terms a little more closely, because very often people
like to pull strange gotchas out of the air. Because when I talk about this, I'm talking about,
like when I speak about it off the cuff, I'm thinking in terms of where do I run my containers?
Where do I run my virtual machines?
Where does my database live? But you can also move in a bunch of different directions. Where
do my Git repositories live? What office suite am I using? What am I using for my CRM, et cetera,
et cetera? Where do you draw the boundary lines? Because it's very easy to talk past each other
if we're not careful here.
Right. And, you know, let's grant that if you're a mainstream enterprise, you're running your office automation on Microsoft and they're twisting your arm to use the cloud version,
so you probably are. And if you have any sense at all, you're not running your own exchange server.
So let's assume that you're using Microsoft Azure for that and you're running Salesforce,
and that means you're on Salesforce's cloud. And a lot of other software as a service offerings might be on AWS or Azure or GCP. They
don't even tell you. So I think probably the crucial issue that we should focus our conversation
on is my own apps, my own software that is my core competence that I actually use to run the core of
my business. And typically, that's the only place where a company
would and should invest serious engineering resources to build software. And that's where
the question comes, where should that software that I'm going to build run? And should it run
on just one cloud? I found that when I gave a conference talk on this in the before times,
I had to have an ever-lengthier section about,
I'm speaking in the general sense, there are specific cases where it does make sense for you
to go in a multi-cloud direction. And when I'm talking about multi-cloud, I'm not necessarily
talking about workload A lives on Azure and workload B lives on AWS through mergers or
weird corporate approaches or shadow IT that,
surprise, that's now revenue-bearing. Well, I guess we have to live with it. There are a lot
of different divisions doing different things, and you're going to see that a fair bit. And I'm not
convinced that's a terrible idea as such. I'm talking about the single workload that we're
going to spread across two or more clouds intentionally?
That's probably not a good idea. I just can't see that being a good idea, simply because you get
into a problem of just terminology and semantics. You know, the different providers mean different
things by the word region and the word instance and things like that. And then there's the people
problem. I mean, I don't think I personally know anybody who would claim to be able to build and deploy an application on AWS
and also on GCP.
I'm sure such people exist, but I don't know any of them.
Well, Forrest Brazeal was deep in the AWS weeds,
and now he's the head of content at Google Cloud.
I will credit him that he probably has learned to smack an API around over there.
But, you know, you're going to have a hard time hiring a person like that.
Yeah, you can count these people almost as individuals.
And that's a big problem.
And, you know, in a lot of cases, it's clearly the case that our profession is talent-starved.
I mean, the whole world is talent-starved at the moment, but our profession in particular.
And a lot of the decisions about what you can build and what you can do are highly contingent
on who you can hire.
And if you can't hire a multi-cloud expert, well, you should not deploy a multi-cloud application. Having said that, I just want to
dot this i here and say that it can be made to kind of work. I've got this one company I advise;
I wrote about them in the blog piece. They used to be on AWS and switched over to GCP. I don't even
know why; it happened before I joined them. And they have a lot of applications. And then they
have some integrations with third-party partners, which they implemented with AWS Lambda functions.
So when they moved over to GCP, they didn't stop doing that. So this mission-critical,
latency-sensitive application of theirs runs on GCP and then calls out to AWS to make calls into
their partner's APIs and so on, and works fine. Solid as a rock, reliable, low latency.
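As an aside, the call-out pattern Tim describes, a service on one cloud invoking integrations hosted on another, might look something like this sketch. The transport is injected as a plain callable, so the function names and retry policy here are hypothetical illustrations, not the company's actual code:

```python
import time

def call_partner_integration(invoke, payload, retries=2, backoff=0.2):
    """Call the remote-cloud integration (e.g., an AWS Lambda behind an API)
    with bounded retries. `invoke` is whatever transport the service uses,
    injected here so the retry policy is testable without network access."""
    last_err = None
    for attempt in range(retries + 1):
        try:
            return invoke(payload)
        except TimeoutError as err:  # only retry transient, timeout-style failures
            last_err = err
            time.sleep(backoff * (2 ** attempt))  # exponential backoff between tries
    raise last_err
```

The point of the sketch is that cross-cloud calls are just network calls: the hard parts are latency budgets and failure handling, not anything exotic.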
And so I talked to a person I know, who knows, over on the AWS side,
and they said, oh yeah, sure, we talk to those guys.
Lots of people do that.
We make sure the connections are low latency and solid.
So technically speaking, it can be done.
But for a variety of business reasons,
maybe the most important one being expertise and who you can hire,
it's probably just not a good idea. One of the areas where I think is an exception case is if you are a SaaS
provider, let's pick a big, easy example, Snowflake, where they are a data warehouse.
They've got to run their data warehousing application in all of the major clouds because
that is where their customers are. And it turns out that
if you're going to send a few petabytes into a data warehouse, you really don't want to be paying
cloud egress rates to do it, because it turns out you can just bootstrap a second company for that
much money. Well, Zoom would be another example, obviously. Oh, yeah. Anything that's heavy on
data transfer is going to be a strange one. And being close to customers, gaming companies are another good example on this, where a lot of the game servers themselves will be spread across a bunch of different providers just purely based on latency metrics around what is close to certain customer clusters.
But that's a narrow segment; I think you're talking about core technology companies. Now,
of the potential customers of the cloud providers, how many of them are core technology companies
like the kind we're talking about who have such a need? And how many people are people who just
want to run their manufacturing and product design and stuff? And for those, buying into
a particular cloud is probably a perfectly sensible choice.
I've also seen regulatory stories about this.
I haven't been able to track them down specifically,
but there is a pervasive belief that one interpretation of UK banking regulations stipulates that you have to be able to get back up and running within 30 days
on a different cloud provider entirely.
And also they have the regulatory requirement
that I believe the data remain in-country.
So that's a little odd.
And honestly, when it comes to best practices
and how you should architect things,
I'm going to take a distinct backseat
to legal requirements imposed upon you by your regulator.
Let's be clear here.
I'm not advising people to go and tell their auditors
that they're wrong on these things.
I had not heard that story, but, you know, it sounds plausible.
So I wonder if that is actually in effect, which is to say, could a huge British banking company, in fact, do that?
Could they, in fact, decamp from Azure and move over to GCP or AWS in 30 days. Boy. That is what one bank I spoke to over there
was insistent on. A second bank I spoke to in that same jurisdiction had never heard of such a thing.
So I feel like a lot of this is subject to auditor interpretation. Again, I am not an expert in this
space. I do not pretend to be. I know I'm that rarest of all breeds, a white guy with a microphone in tech who admits he doesn't know something. But here we are. Yeah, I mean, I imagine it could be plausible
if you didn't use any higher level services and you just, you know, rented instances
and were careful about which version of Linux you ran and were just running a bunch of Java code,
which actually, you know, describes the workload of a lot of financial institutions.
So it would just be a matter of getting all the right instances configured and the JVM configured and launched.
I mean, there are no architecturally terrifying barriers to doing that.
Of course, to do that, it would mean you would have to avoid using any of the higher level
services that are particular to any cloud provider and basically just treat them as people you rent boxes from, which is probably not a good choice for other business reasons.
Which can also include things as seemingly low-level as load balancers.
Just based upon different provisioning modes, failure modes, and the rest, you're probably going to have a more consistent experience running HAProxy or Nginx yourself to do it.
But Tim, I have it on good authority
that this is the old way of thinking
and that Kubernetes solves all of it.
And through the power of containers
and powers combining and whatnot,
that frees us from being beholden to any given provider
and our workloads are now all free as birds.
Well, I will go as far as saying that
if you are in the position of trying to be portable,
probably using containers is a smart thing to do
because it's a more tractable level of abstraction
that does give you some insulation from, you know,
which version of Linux you're running and things like that.
The proposition that configuring and running
Kubernetes is easier than configuring and running JVM on Linux is unsupported by any evidence I've
seen. So, operating at the Kubernetes level rather than at the
instance level, you know, there's good reasons why some people want to do that. But I'm dubious
of the proposition that it really makes you more portable in any essential way.
Well, you're also not the target market for Kubernetes.
You have worked at multiple cloud providers, and I feel like the real advantage of Kubernetes is people who haven't who want to pretend that they do,
so they can act as a sort of a cosplay of being their own cloud provider by running all the intricacies of Kubernetes. I'm halfway kidding, but there is
an uncomfortable element of truth to that with some of the conversations I've had with some of
its more, shall we say, fanatical adherents. Well, I think you and I are neither of us huge
fans of Kubernetes, but my reasons are maybe a little different. Kubernetes does some really
useful things. It really, really does. It allows you to take N VMs and pack M different applications
onto them in a way that takes reasonably good advantage of the processing power they have.
And it allows you to have different things running in one place with different IP addresses.
It sounds straightforward, but that turns out to be really helpful in a lot of ways. So I'm
actually kind of sympathetic with what Kubernetes is trying to be.
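The packing problem Tim sketches, N VMs hosting M applications, can be illustrated with a toy greedy first-fit packer. This is purely an illustration of the idea; real schedulers like Kubernetes's weigh memory, affinity rules, disruption budgets, and much more:

```python
def first_fit_pack(app_demands, vm_capacity):
    """Greedy first-fit packing of per-app CPU demands onto identical VMs.
    app_demands: list of (app_name, demand) pairs; vm_capacity: units per VM.
    Returns a list of VMs, each with remaining capacity and placed apps."""
    vms = []  # each VM tracked as {"free": remaining_capacity, "apps": [...]}
    for app, demand in app_demands:
        for vm in vms:
            if vm["free"] >= demand:  # place on the first VM with room
                vm["free"] -= demand
                vm["apps"].append(app)
                break
        else:  # no existing VM fits; provision another one
            vms.append({"free": vm_capacity - demand, "apps": [app]})
    return vms
```

Even this toy shows the value proposition: four apps that would naively need four VMs can share three, and that utilization win is a large part of what people buy Kubernetes for.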
My big gripe with it is that I think that good technology should make easy things easy and difficult things possible. And I think Kubernetes fails the first test there. I think the complexity
that it involves is out of balance with the benefits you get. There's a lot of really,
really smart people who disagree with me. So this is not a hill I'm going to die on.
This is very much one of those areas where reasonable people can disagree. I find the
complexity to be overwhelming. It has to collapse. At this point, finding someone who can competently
run Kubernetes in production is a bit hard to do, and they tend to be extremely expensive.
You aren't going to find a team of those people at every company that wants
to do things like this, and they're certainly not going to be able to find it in their budget
in many cases. So it's a challenging thing to do. Well, that's true. And the other thing is that
once you step onto the Kubernetes slope, you start looking at Istio and Envoy and fabric
technology,
and we're talking about extreme complexity squared at that point.
But, you know, here's the thing.
Back in 2018, I think it was, at his keynote, Werner said that the big goal is
that all the code you ever write should be application logic that delivers business value.
Didn't CGI say the same thing?
Isn't there like a long history, dating back longer than I believe either of us have been alive,
of promising that all you're going to write is business logic?
That was the Java promise.
That was the Google App Engine promise.
Again and again, we've had that carrot dangled in front of us.
And it feels like the reality with Lambda is the only code you will write is not necessarily business logic.
It's getting the thing to speak to the other service you're trying to get it to talk to
because a lot of those integrations are super finicky, at least back when I started
learning how this stuff worked, they were. People understand where the pain points are
and are indeed working on them. But I think we can agree that if you believe in that as a goal,
which I still do, I mean, we may not have got there, but it's still a worthwhile goal to work
on. We can agree that wrangling Istio configurations is not such a
thing. It's not directly value-adding business logic. To the extent that you can do that,
I think serverless provides a plausible way forward. Now, you can be all cynical about,
well, I still have trouble making my Lambda talk to my other thing. But I've done that,
and I've also deployed JVM on bare metal kind of thing. You know, I'd rather do things at the Lambda level.
I really rather would.
Because capacity forecasting is a horribly difficult thing.
We're all terrible at it.
And the penalties for being wrong are really bad.
If you underspecify your capacity, your customers have a lousy experience.
And if you overspecify it, and you have an architecture that makes you configure for peak load,
you're going to spend bucket loads of money that you don't need to.
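Back-of-the-envelope, the trade-off Tim describes between provisioning for peak and paying per use can be sketched like this. All rates and numbers here are made up for illustration, not any provider's actual pricing:

```python
def peak_provisioned_cost(peak_rps, cost_per_unit_hour, hours, rps_per_unit):
    """Cost if you must keep enough capacity for peak load running 24/7."""
    units = -(-peak_rps // rps_per_unit)  # ceiling division: whole units only
    return units * cost_per_unit_hour * hours

def pay_per_use_cost(total_requests, cost_per_million):
    """Cost if you pay only for requests actually served (Lambda-style)."""
    return total_requests / 1_000_000 * cost_per_million
```

With a spiky workload, the peak-provisioned number is driven by the worst hour of the month while the pay-per-use number is driven by actual traffic, which is exactly why getting the forecast wrong is so expensive in one model and largely irrelevant in the other.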
But you're then putting your availability
in the cloud provider's hands.
Yeah, you already were.
Now we're just being explicit about acknowledging that.
Yeah, yeah, absolutely.
And that's highly relevant to the current discussion
because if you use the higher level serverless functions,
if you decide, okay, I'm going to go with Lambda and Dynamo
and EventBridge and that kind of thing, well, that's not portable at all. I mean, the APIs are totally idiosyncratic
for AWS and GCP's equivalent and Azure's, what do they call it, permanent functions or something
or other functions. So yeah, that's part of the trade-off you have to think about. If you're going
to do that, you're definitely not going to be multi-cloud in that application.
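To make the idiosyncratic-API point concrete, DynamoDB items travel in a typed wire format that no other provider's database shares. The marshalling helper below is illustrative, not part of any SDK:

```python
def to_dynamo_item(plain):
    """Marshal a plain dict into DynamoDB's typed AttributeValue format.
    This wire shape ({"S": ...}, {"N": ...}, {"BOOL": ...}) is specific to
    DynamoDB; GCP's and Azure's document stores each use their own shapes,
    which is one concrete face of the lock-in being discussed."""
    def wrap(v):
        if isinstance(v, bool):  # check bool before int: bool subclasses int
            return {"BOOL": v}
        if isinstance(v, (int, float)):
            return {"N": str(v)}  # DynamoDB transmits numbers as strings
        if isinstance(v, str):
            return {"S": v}
        if isinstance(v, list):
            return {"L": [wrap(x) for x in v]}
        raise TypeError(f"unsupported type: {type(v).__name__}")
    return {k: wrap(v) for k, v in plain.items()}
```

Code written against this format, plus Dynamo's key model, capacity model, and consistency options, doesn't port to another cloud's database without rework; that, rather than the instances underneath, is where the real lock-in lives.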
And in many cases, one of the stated goals for going multi-cloud is that you can avoid the
downtime of a single provider. People love to point at the big AWS outages where they were
down for half a day. And there is a societal question of what happens when everyone is down
for half a day at the same time. But in most cases, what I'm seeing is that instead of getting
rid of a single point of failure, you're introducing a second one. If either one of them is down, your application's down, so you've doubled your odds of an outage. And if you can't process payments, does it really matter that your website stays up?
It becomes an interesting question. And those are the ones that you know about, let alone the
third and fourth order dependencies that are almost impossible to map unless everyone is
as diligent as you are. It's a heavy, heavy lift. I'm going to push back a little bit. Now,
for example, this company I'm advising that's running GCP and calling out to Lambda
is in that position: it's down if either GCP or Lambda goes off the air. On the other hand, if you look at
somebody like Zoom, they're probably running parallel full stacks on the different cloud
providers. And if you're doing that, then you can at least plausibly claim that you're in a good
place, because if Dynamo has an outage and everything relies on Dynamo,
then you shift your load over to GCP or Oracle,
and you're still on the air.
Yeah, but what counts as up, as well?
Because Zoom loves to sign me out on my desktop
whenever I log into it on my laptop and vice versa,
and I wonder if that authentication and login system
is also replicated full stack to everywhere it goes and what the
fencing on that looks like and how the communication between all those things works. I wouldn't doubt
that it's possible that they've solved for this, but I also wonder how thoroughly they've really
tested all of it too. Not because I question them any, just because this stuff is super intricate
as you start tracing it down into the nitty gritty levels of the madness that consumes all
these abstractions.
Well, right. That's a conventional wisdom that is really wise and true, which is that if you have software that is alleged to do something like allow you to get going on another
cloud, unless you've tested it within the last three weeks, it's not going to work when you need
it. Oh, it's like a DR exercise. The next commit you make breaks it once you have the thing working
again. It sits around as a binder, and it's a best guess. And let's be serious, a lot of these DR exercises presume that you're
able to, for example, change DNS records on the fly or be able to get a virtual machine provision
in less than 45 minutes, because when there's an actual outage surprise, everyone's trying to do
the same things. There's a lot of stuff in there that gets really wonky at weird levels.
A related, similar exercise is people who want to be on AWS, but want to be multi-region.
It's actually a fairly similar kind of problem. If I need to be able to fail out of US East 1,
well, God help you, because if you need to, everybody else needs to as well. But would that
work? Before you go multi-cloud, go multi-region first. Tell me how easy it is, because then you
have full feature parity, presumably, between everything. It should just be a walk in the park. Send me a postcard
once you get that set up, and I'll eat a bunch of words. And it turns out basically no one does.
Another area of lock-in around a lot of this stuff, and a thing that makes it very hard to
go multi-cloud, is the security model of how does that interface with
various aspects. And in many cases, I'm seeing people doing full-on network overlays. They don't
have to worry about the different security group models and VPCs and all the rest. They can just
treat everything as a node sitting on the internet, and the only thing it talks to is an overlay
network, which is terrible. But that seems to be one of the only ways people are able to build
things that span multiple providers with any degree of success. That is painful because, much and all as we
like to scoff at the degree of complexity you get into there, it is the case
that your typical public cloud provider can do security better than you can. They just can.
It's a fact of life. And if you're using a public cloud provider and not taking advantage of their
security offerings and infrastructure, that's probably dumb.
But if you really want to be multi-cloud, you kind of have to, as you said.
In particular, this gets back to the problem of expertise, because it's hard enough to hire somebody who really understands IAM deeply and how to get that working properly.
Try and find somebody who can understand that level of thing on two different cloud providers at once.
Oh, gosh.
This episode is sponsored in part by LaunchDarkly. Take a look at what it takes to get your code into
production. I'm going to just guess that it's awful because it's always awful. No one loves
their deployment process. What if launching new features didn't require you to do a full-on code
and possibly infrastructure deploy? What if you could test on a small subset of users and then
roll it back immediately if results aren't what you expect? LaunchDarkly does exactly this. To
learn more, visit launchdarkly.com and tell them Corey sent you and watch for the wince.
Another point you made in your blog post was the idea of lock-in, of people being worried that going all-in on a provider was setting them up to
be, I think, oracled is the term that was tossed around, where once you're dependent on a provider,
what's to stop them from cranking the pricing knobs until you squeal?
Nothing. And I think that is a perfectly sane thing to worry about. Now, in the short term,
based on my personal experience working with AWS leadership, I think that it's probably not a big short-term risk. AWS is clearly aware that most of the growth
is still in front of them. The amount of all of IT that's on the cloud is still pretty small. And so
the thing to worry about right now is growth. And they are really, really genuinely, sincerely
focused on customer success and will bend over backwards to deal with the customer's problems as they are. And I've seen places where people have
negotiated a huge multi-year enterprise agreement based on reserved instances or something like
that and then realize, oh, wait, we need to switch our whole technology stack, but you've got us by
the RIs and AWS will say, no, no, it's okay. We'll tear that up and rewrite it and get you where you need to go. So in the short term, between now and 2025, would I worry about my cloud provider doing that?
Probably not so much. But let's go a little further out. Let's say it's, you know, 2030 or
something like that. And at that point, you know, Andy Jassy decided to be a full-time sports mogul
and Satya Nadella has gone off to be a recreational sailboat owner or something like that. And private equity operators
come in and take very significant stakes in the public cloud providers and get a lot of their guys
on the board. And you have a very different dynamic, and you have something that starts to
feel like Oracle, where their priority isn't, you know, optimizing for growth and customer success.
Their priority is optimizing for a quarterly bottom line.
Revenue extraction becomes the goal.
That's absolutely right.
And this is not a hypothetical scenario.
It's happened.
Most large companies do not control the amount of money they spend per year
to have desktop software that works.
They pay whatever Microsoft's going to say they pay
because they don't have a choice.
And a lot of companies are in the same situation with their database.
They don't get to budget their database budget.
Oracle comes in and says, here's what you're going to pay, and that's what you pay.
You really don't want to be in that situation with your cloud.
And that's why I think it's perfectly reasonable for somebody who is doing cloud transition
at a major financial or manufacturing or service provider company to have an eye to this.
You know, let's not completely ignore the lock-in issue.
There is a significant scale with enterprise deals and contracts.
There is almost always a contractual provision that says if you're going to raise a price with any cloud provider,
there's a fixed period of time of notice you must give before it happens.
I feel like the first mover there winds up getting soaked because everyone is going to panic and migrate in other directions.
I mean, Google tried it with Google Maps for their API and not quite Google Cloud, but also scared the bejesus out of a whole bunch of people who were, wait, is this a harbinger of things to come?
Well, not in the short term, I don't think.
And I think, you know, Google Maps is absurdly underpriced.
That's a hellishly expensive service.
And it's supposed to pay for itself by advertising on Maps.
I don't know about that.
I would see that as the exception rather than the rule.
I think that it's reasonable to expect cloud prices nominally at least to go on decreasing
for at least the short term, maybe even the medium term.
But that can't go on forever. It also feels to me like, having looked at an awful lot of AWS
environments, that if there were to be some sort of regulatory action or some really weird outage
for a year, that meant that AWS could not onboard a single new customer. Their revenue year over
year would continue to increase purely by organic growth
because there is no forcing function that turns a thing off when you're done using it.
In fact, they could migrate things around to hardware that works.
They can continue billing you for the thing that's sitting there idle.
And there is no governance path on that.
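That "no forcing function" observation can be sketched concretely: cost tooling typically flags instances whose CPU has sat near zero for weeks, since nothing turns them off automatically. A minimal illustration, with the CloudWatch fetch left as a comment and the 2% threshold an arbitrary assumption:

```python
# Sketch: flag an instance as idle from its average-CPU samples.
# In practice the samples would come from CloudWatch (CPUUtilization,
# daily averages over, say, two weeks); here the fetch is omitted and
# only the decision logic is shown.

def is_idle(cpu_samples, threshold_pct=2.0):
    """True if every daily average CPU sample is below the threshold.

    An empty sample list means "no data", which we treat as not idle
    rather than guessing.
    """
    if not cpu_samples:
        return False
    return max(cpu_samples) < threshold_pct

# An instance averaging well under 2% CPU for weeks keeps billing
# anyway; there is no forcing function that turns it off.
print(is_idle([0.4, 0.7, 0.3, 0.9]))  # idle by this definition
print(is_idle([0.4, 35.0, 0.3]))      # one busy day: not idle
```

The interesting part is not the function but the absence of anything on the provider side that runs it for you.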
So on some level, winding up doing a price increase is going to cause a
massive company focus on fixing a lot of that. It feels on some level like it is drawing attention
to a thing that they don't really want to draw attention to from a purely revenue extraction
story. When CentOS walked back their ten-year support commitment to two years, with the
idea that it would drive RHEL adoption,
well, suddenly a lot of people looked at their environment
and saw they had old CentOS instances they weren't using.
It was massively short-sighted; it massively irritated
a whole bunch of people who needed that support in the short term,
but who, by the renewal, were going to be onto Ubuntu
or something else.
It feels like it's going to backfire massively.
And I'd like to imagine the strategists
of whoever takes the reins of these companies
is going to be smarter than that. But here we are. Here we are. And it's interesting you should
mention regulatory action. At the moment, there are only three credible public cloud providers.
And it's not obvious that Google's really in it for the long haul. Last I checked, they were
claiming to maybe be breaking even on it. That's not a good number. You'd like there to be more
than that. And if it goes on like that, eventually some politician is going to say,
oh, maybe they should be regulated like public utilities because they kind of are, right?
And I would think that anybody who did get into Oracle-izing would, you know, accelerate that
happening. Having said that, we do live in the atmosphere of 21st century capitalism, and growth is the god that must be worshipped at all costs.
Who knows? It's a cloudy future. Hard to see.
It really is.
I also want to be clear on some level that with Google's current position, if they weren't taking a small loss, at least, on these things, I would worry.
Like, wait, you're trying to catch AWS and you don't have anything better to invest
that money into than just, well, time to start taking profits from it. So I can see both sides
of that one. Right. As I keep saying, I've already said once during this slot, you know, the total
cloud spend in the world is probably on the order of $100 or $200 billion per annum, and global IT
is in multiple trillions. So there's a lot more space for growth, years and years worth of it.
Now, the challenge too,
is that people are worried about this
from a long-term strategic point of view.
So one thing you talked about in your blog post
is the idea of using hosted open source solutions.
Like instead of using Kinesis,
you'd wind up using Kafka.
Or instead of using DynamoDB,
you'd use their managed Cassandra service, or as I think of it, Amazon Basics Cassandra.
And effectively going down the path of letting them manage this thing, but you then have a theoretical exodus path.
Where do you land on that?
I think that speaks to a lot of people's concerns, and I've had conversations with really smart people about that who like that idea. Now, to be realistic,
it doesn't make migration easy because you've still got all the CI and CD and monitoring and management and scaling and alarms and alerts and paging and et cetera, et cetera, et cetera,
wrapped around it. So it's not as though you could just pick up your managed Kafka off AWS and drop
a huge installation onto GCP easily.
But at least your data plane APIs are the same,
so a lot of your code would probably still run okay.
So it's a plausible path forward.
And when people say, I want to do that,
well, it does mean that you can't go all serverless,
but it's not a totally insane path forward.
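The "data plane APIs are the same" idea can be sketched: with managed Kafka, the producing and consuming code is identical across providers, and only connection details change. A toy sketch, where the broker hostnames are invented placeholders rather than real endpoints:

```python
# Sketch: the Kafka data-plane code stays the same across providers;
# only the bootstrap servers (and, in practice, credentials) change.
# The hostnames below are made-up placeholders, not real endpoints.

PROVIDER_BOOTSTRAP = {
    "aws-msk":     "b-1.example-msk.kafka.us-east-1.amazonaws.com:9096",
    "confluent":   "pkc-example.us-east1.gcp.confluent.cloud:9092",
    "self-hosted": "kafka-1.internal.example.com:9092",
}

def producer_config(provider):
    """Build a producer config; everything except the endpoint is shared."""
    return {
        "bootstrap.servers": PROVIDER_BOOTSTRAP[provider],
        "acks": "all",
        "compression.type": "zstd",
    }

aws = producer_config("aws-msk")
gcp = producer_config("confluent")
# Only the endpoint differs between the two configs; the application
# code that produces and consumes would not change when the provider
# does. The migration cost lives in CI/CD, monitoring, scaling, and
# alarms, as noted above.
print({k for k in aws if aws[k] != gcp[k]})
```

That single differing key is the optimistic version of portability; the wrapping operational tooling is where the real work hides.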
So one last point in your blog post that I think a lot of people think about only after they get
bitten by it is the idea of data gravity. I alluded earlier in our conversation to data egress charges,
but my experience has been that where your data lives is effectively where the rest of your cloud usage tends to aggregate. How do you see it?
Well, it's a real issue, but I think it might perhaps be a little overblown. People throw the
term petabytes around, and people don't realize how big a petabyte is. A petabyte is just an
insanely huge amount of data, and the notion of transmitting one over the internet is terrifying.
And there are lots of
enterprises that have multiple petabytes around. And so they think, well, you know, it would take
me 26 years to transmit that, so I can't. And they might be wrong. The internet's getting faster
all the time. Did you notice? I've been able to move, in some purely personal projects,
insane amounts of data, and it gets there a lot faster than you'd think. Secondly, in the case
of AWS Snowmobile, we have an existence proof that you can do exabyte-ish scale data transfers in
time it takes to drive a truck across the country. Inbound only, though; Snowmobiles are not, at least according
to all public examples, valid for exodus. But you know, this is kind of a place where
regulatory action might come into play
if what the people were doing was seen to be abusive.
I mean, there's an existence proof you can do this thing.
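The "26 years" intuition above is worth checking with arithmetic: a petabyte at 10 Mbps really does take about 25 years, but at modern link speeds the number collapses. A back-of-the-envelope calculation, using decimal units and ignoring protocol overhead:

```python
# Back-of-the-envelope: time to move data over a sustained link,
# using decimal units (1 PB = 1e15 bytes) and ignoring overhead.

def transfer_days(petabytes, link_mbps):
    bits = petabytes * 1e15 * 8
    seconds = bits / (link_mbps * 1e6)
    return seconds / 86400

# 1 PB at 10 Mbps: roughly 25 years -- the "I can't" number.
print(round(transfer_days(1, 10) / 365, 1))   # ~25.4 (years)
# 1 PB at a sustained 10 Gbps: about nine days.
print(round(transfer_days(1, 10_000), 1))     # ~9.3 (days)
# At 100 Gbps: under a day.
print(round(transfer_days(1, 100_000), 2))    # ~0.93 (days)
```

The terror is a function of the link speed assumed, which is exactly the point about the internet getting faster.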
But here's another point.
So suppose you have like 15 petabytes.
That's an insane amount of data
deployed in your key corporate application.
So are you actually using that
to run the application?
Or is a huge proportion of that stuff
just logs and data gathered
of various kinds that's being used
in analytics applications
and AI models and so on?
Do you actually need all that data
to actually run your app?
And could you, in fact,
just pick up the stuff you need for your app,
move it to a different cloud provider,
run there,
and leave your analytics on the first one?
Not a totally insane idea.
It's not a terrible idea at all.
It comes down to the idea as well of when you're trying to run a query against a bunch of that data,
do you need all the data to transit or just the results of that query as well?
It's a question of can you move the compute closer to the data
as opposed to move the data to where the compute lives?
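That compute-to-data idea is what services like S3 Select and Athena offer: a query runs next to the data and only the matching rows cross the wire. A hedged sketch of the shape of it, where the bucket and key names are invented and only the expression-building is exercised:

```python
# Sketch: push the filter to where the data lives and ship back only
# the result. Bucket/key names below are invented placeholders.

def select_expression(column, predicate):
    """Build a scan-side SQL expression in the S3 Select style: the
    service scans the object and returns matching rows only."""
    return f"SELECT s.{column} FROM s3object s WHERE {predicate}"

expr = select_expression("order_id", "s.region = 'eu-west-1'")
print(expr)

# With boto3 this would run next to the data, along the lines of:
#   s3.select_object_content(Bucket="example-logs",
#                            Key="2022/01/orders.csv",
#                            Expression=expr, ExpressionType="SQL", ...)
# Only the matching rows transit the network; the petabytes stay put.
```

Whether the numbers work depends on how selective the query is, but the asymmetry between data scanned and data returned is the whole argument.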
Well, you know, and a lot of those people who have those huge data pools
have it sitting on S3, and a lot of it migrated off into Glacier,
so it's not as if you could get at it in milliseconds anyhow. I just ask myself,
how much data can anybody actually use in a day in the course of satisfying some transaction
request from a customer? And I think it's not petabytes. It just isn't. Now, okay, there are
exceptions. There's the intelligence community, there's the oil drilling community, there are some communities who genuinely will use insanely huge seas of data
on a routine basis. But I think that's kind of a corner case. So before you shake your head and
say, ah, they'll never move because of data gravity, you need to prove that to me. And I
might be a little bit skeptical. And I think that that is probably a very fair request.
Just tell me what it is you're going to be doing here to validate the idea that is in your head.
Because the most interesting lies I've found customers tell aren't told intentionally to me or anyone else.
They're told to themselves.
The narrative of what they think they're doing from the early days takes root. And never mind the fact that, yeah, it turns out that now that you've scaled
out, maybe development isn't 80% of your cloud bill anymore. You learn things and your understanding
of what you're doing has to evolve with the evolution of the applications.
Yep. It's a fun time to be around. I mean, it's so great. Right at the moment,
lock-in isn't that big an issue.
And let's be clear. I'm sure you agree with me on this, Corey: if you're a startup and you're trying to grow and scale and prove you've got a viable business and show that you have exponential growth and so on, don't think about lock-in.
Just don't go near it. Pick a cloud provider. Pick whichever cloud provider your CTO already knows how to use and just go all in on them and use all their most advanced features and be serverless if you can.
It's the only sane way forward.
You're short of time.
You're short of money.
You need growth.
Well, what if you need to move strategically in five years?
You should be so lucky.
Great.
Deal with it then.
Or, well, what if we want to sell to retail as our primary market and they hate AWS?
Well, go all in on a provider.
Probably not that one.
Pick a different provider and go all in.
I do not care which cloud any given company picks.
Go with what's right for you.
But then go all in.
Because until you have a compelling reason to do otherwise, you're going to spend more time solving global problems locally.
That's right. And we've never actually said this because probably because it's something that both
you and I know at the core of our being, but it probably needs to be said that being multi-cloud
is expensive, right? Because the nouns and verbs that describe what clouds do are different in
Google land and AWS land. They're just different. And it's hard to think about those things.
And you lose the capability
of using the advanced serverless stuff.
There are a whole bunch of costs
to being multi-cloud.
Now, maybe if you're existentially
afraid of lock-in, you don't care.
But for, I think, most normal people,
it's expensive.
Pay now, pay later, you will pay.
Wouldn't you ideally like to see
that dollar go as far as possible?
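The "different nouns and verbs" cost shows up even in miniature: the same concepts carry different names on each provider, so any cross-cloud tool or runbook carries a translation layer. A toy illustration with a small, non-exhaustive sample of names:

```python
# Toy illustration: the same concepts have different nouns per cloud,
# so every cross-cloud tool carries a translation layer. This sample
# is deliberately small and not exhaustive.

SERVICE_NAMES = {
    "object storage": {"aws": "S3",       "gcp": "Cloud Storage",
                       "azure": "Blob Storage"},
    "functions":      {"aws": "Lambda",   "gcp": "Cloud Functions",
                       "azure": "Functions"},
    "managed k8s":    {"aws": "EKS",      "gcp": "GKE",
                       "azure": "AKS"},
    "nosql database": {"aws": "DynamoDB", "gcp": "Firestore",
                       "azure": "Cosmos DB"},
}

def translate(concept, cloud):
    return SERVICE_NAMES[concept][cloud]

print(translate("object storage", "gcp"))  # Cloud Storage
# And that's just the nouns; IAM models, failure modes, and billing
# dimensions diverge even more, which is the real cognitive cost.
```

The table itself is cheap to write; keeping staff fluent in every column of it is the expense being described here.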
I'm right there with you. Because it's not just the actual infrastructure cost that's expensive. It costs
something far more dear and expensive. And that is the cognitive expense of having to think about
both of these things. Not just how each cloud provider works, but how each one breaks. You've
done this stuff longer than I have. I don't think that either of us trust a system that we don't understand the failure cases for
and how it's going to degrade.
It's, oh, great, you built something new and awesome.
Awesome.
How does it fall over?
What direction is it going to hit?
So what side should I not stand on?
It's based on understanding
of what you're about to blow holes in.
That's right.
And, you know, I think particularly
if you're using AWS heavily,
you know that there are some things that you might as well bet your business on because, you know, if they're down, so is the rest of the world and who cares.
And other things, maybe a little chancier.
So understanding failure modes, understanding your stuff, you know, the cost sharp edges, understanding manageability issues, it's not obvious.
It's really not.
Tim, I want to thank you for taking the time to go through this, frankly, excellent post with me.
If people want to learn more about how you see things, and I guess how you view the world,
where's the best place to find you?
Well, I'm on Twitter, just Tim Bray, T-I-M-B-R-A-Y, and my blog's at tbray.org.
And that's where that piece you were just talking about is.
And that's kind of my online presence.
And we will, of course, put links to it in the show notes.
Thanks so much for being so generous with your time.
It's always a pleasure to talk to you.
Well, it's always fun to talk to somebody who has shared passions,
and we clearly do.
Indeed.
Tim Bray, Principal at Textuality Services.
I'm cloud economist Corey Quinn, and this is Screaming in the Cloud.
If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice.
Whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice,
along with an angry comment that you then need to take to all of the other podcast platforms out there,
purely for redundancy, so you don't get locked into one of them.
If your AWS bill keeps rising and your blood pressure is doing the same, then you need the
Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor
recommendations to your business, and we get to the point. Visit duckbillgroup.com to get started.
This has been a HumblePod production. Stay humble.