Screaming in the Cloud - The Multi-Cloud Counterculture with Tim Bray
Episode Date: April 5, 2022
About Tim
Timothy William Bray is a Canadian software developer, environmentalist, political activist, and one of the co-authors of the original XML specification. He worked for Amazon Web Services from December 2014 until May 2020, when he quit due to concerns over the termination of whistleblowers. Previously he has been employed by Google, Sun Microsystems, and Digital Equipment Corporation (DEC). Bray has also founded or co-founded several start-ups, such as Antarctica Systems.
Links Referenced:
Textuality Services: https://www.textuality.com/
Blog post: https://www.tbray.org/ongoing/When/202x/2022/01/30/Cloud-Lock-In
@timbray: https://twitter.com/timbray
tbray.org: https://tbray.org
duckbillgroup.com: https://duckbillgroup.com
Transcript
Hello, and welcome to Screaming in the Cloud, with your host, Chief Cloud Economist at the
Duckbill Group, Corey Quinn.
This weekly show features conversations with people doing interesting work in the world
of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles
for which Corey refuses to apologize.
This is Screaming in the Cloud.
This episode is sponsored in part by our friends at Vulture, spelled V-U-L-T-R,
because they're all about helping save money, including on things like, you know, vowels.
So what they do is they are a cloud provider that provides surprisingly high
performance cloud compute at a price that, well, sure, they claim it is better than AWS's pricing.
And when they say that, they mean that it's less money. Sure, I don't dispute that. But what I find
interesting is that it's predictable. They tell you in advance on a monthly basis what it's going
to cost. They have a bunch of advanced networking features.
They have 19 global locations and scale things elastically, not to be confused with openly,
which is apparently elastic and open.
They can mean the same thing sometimes.
They have had over a million users.
Deployments take less than 60 seconds across 12 pre-selected operating systems,
or if you're one of those nutters like me,
you can bring your own ISO and install basically any operating system you want.
Starting with pricing as low as $2.50 a month
for Vulture Cloud Compute,
they have plans for developers and businesses of all sizes,
except maybe Amazon,
who stubbornly insists on having something of that scale of their
own. Try Vulture today for free by visiting vulture.com slash screaming, and you'll receive
$100 in credit. That's v-u-l-t-r dot com slash screaming.
Couchbase Capella database as a service is flexible, full-featured, and fully managed
with built in access via key value, SQL and full text search.
Flexible JSON documents aligned to your applications and workloads.
Build faster with blazing fast in memory performance and automated replication and scaling while
reducing cost.
Capella has the best price performance of any fully managed document database.
Visit couchbase.com slash screaming in the cloud to try Capella today for free and be up and running in three minutes with no credit card required.
Couchbase Capella.
Make your data sing.
Welcome to Screaming in the Cloud.
I'm Corey Quinn.
My guest today has been on a year or two ago, but today we're going in a bit of a different
direction.
Tim Bray is a principal at Textuality Services.
Once upon a time, he was a distinguished engineer slash VP at AWS, but let's be clear,
he isn't solely focused on one company. He also used to work at Google. Also, there is scuttlebutt
that he might have had something to do at one point with the creation of God's true language,
XML. Tim, thank you for coming back on the show and suffering my slings and arrows.
You're just fine. Glad to be here.
So the impetus for having this conversation is you had a blog post somewhat recently,
by which I mean January of 2022, where you talked about lock-in and multi-cloud,
two subjects near and dear to my heart, mostly because I have what I thought was a
fairly countercultural opinion. You seem to have a very closely aligned perspective on this, but
let's not get too far ahead of ourselves. Where did this blog post come from?
Well, I advise a couple of companies, and one of them happens to be using GCP, and the other happens to be using AWS.
And I get involved in a lot of industry conversations.
And I noticed that multi-cloud is a buzzword.
If you go and type multi-cloud into Google, you get like a page of people saying, we will solve your multi-cloud problems.
Come to us, and you will be multi-cloud.
And I was not sure what to think. So I started
writing to find out what I would think. And I think it's not complicated anymore. I think that
multi-cloud is a reality in most companies. I think that many mainstream non-startup companies
are really worried about cloud lock-in, and that's not entirely unreasonable.
So it's a reasonable thing to think about, and it's a reasonable thing to try and find the right balance between avoiding lock-in and not slowing yourself down.
And the issues were interesting.
What was surprising is that I published that blog piece saying what I thought were some kind of controversial things,
and I got no pushback,
which is why I started talking to you and saying, Corey, you know, does nobody disagree with this?
Do you disagree with this? Maybe we should have a talk and see if this is just the new conventional wisdom. There's nothing worse than almost trying to pick a fight, but no one actually
winds up taking you up on the opportunity. That always feels a little off. Let's break it down into two issues,
because I would argue that they are intertwined, but not necessarily the same thing. Let's start
with multi-cloud, because it turns out that there's just enough nuance to, at least where I
sit on this position, that whenever I tweet about it, I wind up getting wildly misinterpreted.
Do you find that as well?
Not so much. It's not a subject I have really had much to say about, but it does mean lots of different things. And so it's not totally surprising that that happens. I mean, some people
think when you say multi-cloud, you mean, well, I'm going to take my strategic application and
I'm going to run it in parallel on AWS and GCP because that way I'll be more resilient and other
good things will happen.
And then there's another thing, which is that, well, you know, as my company grows, I am naturally going to be using lots of different technologies, and that might include more than
one cloud. So there's a whole spectrum of things that multi-cloud could mean. So I guess when we
talk about it, we probably owe it to our audiences to be clear what we're talking about.
Let's be clear. From my perspective, whatever the person talking is trying to sell you at that point in time is,
of course, what multi-cloud is. If it's a third-party dashboard, for example, oh yeah,
you want to be able to look at all of your cloud usage on a single pane of glass. If it's a certain,
well, I guess a certain other cloud provider, well, they understand that if you go all in on a cloud
provider, it's probably not going to be them.
So they're, of course, going to talk about multi-cloud.
And if it's AWS, where they are the 8,000-pound gorilla in the space,
oh yeah, multi-cloud's terrible.
Put everything on AWS at the end.
It seems that most people who talk about this
have a very self-serving motivation that they can't entirely escape. That bias does reflect itself.
That's true. When I joined AWS, which was around 2014, the PR line was a very hard line. Well,
multi-cloud, that's not something you should invest in. And I've noticed that the conversational
line has become much softer. And I think one reason for that is that going all in on a single
cloud is at least
possible when you're a startup. But if you're a big company, you know, insurance company,
a tire manufacturer, that kind of thing, you're going to be multi-cloud for the same reason that
they already have COBOL on the mainframe and Java on the old Sun boxes and Mongo running somewhere
else and five different programming languages. And that's just the way big companies are.
It's a consequence of M&A, it's a consequence of research projects that succeeded one kind or another.
I mean, lots of big companies have been trying to get rid of COBOL for decades, literally, and not succeeding in doing that.
It's legacy, which is, of course, the condescending engineering term for it makes money.
And works.
And so I don't think it's realistic to, as a matter of principle,
not be multi-cloud. Let's define our terms a little more closely, because very often people
like to pull strange gotchas out of the air. Because when I talk about this, I'm talking about,
like when I speak about it off the cuff, I'm thinking in terms of where do I run my containers?
Where do I run my virtual machines?
Where does my database live? But you can also move in a bunch of different directions. Where
do my Git repositories live? What office suite am I using? What am I using for my CRM, et cetera,
et cetera? Where do you draw the boundary lines? Because it's very easy to talk past each other
if we're not careful here.
Right. And, you know, let's grant that if you're a mainstream enterprise, you're running your office automation on Microsoft and they're twisting your arm to use the cloud version,
so you probably are. And if you have any sense at all, you're not running your own exchange server.
So let's assume that you're using Microsoft Azure for that and you're running Salesforce,
and that means you're on Salesforce's cloud. And a lot of other software as a service offerings might be on AWS or Azure or GCP. They
don't even tell you. So I think probably the crucial issue that we should focus our conversation
on is my own apps, my own software that is my core competence that I actually use to run the core of
my business. And typically, that's the only place where a company
would and should invest serious engineering resources to build software. And that's where
the question comes, where should that software that I'm going to build run? And should it run
on just one cloud? I found that when I gave a conference talk on this in the before times,
I had to have an ever-lengthier section about,
I'm speaking in the general sense, there are specific cases where it does make sense for you
to go in a multi-cloud direction. And when I'm talking about multi-cloud, I'm not necessarily
talking about workload A lives on Azure and workload B lives on AWS through mergers or
weird corporate approaches or shadow IT that,
surprise, that's now revenue-bearing. Well, I guess we have to live with it. There are a lot
of different divisions doing different things, and you're going to see that a fair bit. And I'm not
convinced that's a terrible idea as such. I'm talking about the single workload that we're
going to spread across two or more clouds intentionally?
That's probably not a good idea. I just can't see that being a good idea, simply because you get
into a problem of just terminology and semantics. You know, the different providers mean different
things by the word region and the word instance and things like that. And then there's the people
problem. I mean, I don't think I personally know anybody who would claim to be able to build and deploy an application on AWS
and also on GCP.
I'm sure such people exist, but I don't know any of them.
Well, Forrest Brazeal was deep in the AWS weeds,
and now he's the head of content at Google Cloud.
I will credit him that he probably has learned to smack an API around over there.
But, you know, you're going to have a hard time hiring a person like that.
Yeah, you can count these people almost as individuals.
And that's a big problem.
And, you know, in a lot of cases, it's clearly the case that our profession is talent-starved.
I mean, the whole world is talent-starved at the moment, but our profession in particular.
And a lot of the decisions about what you can build and what you can do are highly contingent
on who you can hire.
And if you can't hire a multi-cloud expert, well, you should not deploy a multi-cloud application. Having said that, I just want to
dot this i here and say that it can be made to kind of work. I've got this one company I advise;
I wrote about them in the blog piece. They used to be on AWS and switched over to GCP. I don't even
know why; it happened before I joined them. And they have a lot of applications. And then they
have some integrations with third-party partners, which they implemented with AWS Lambda functions.
So when they moved over to GCP, they didn't stop doing that. So this mission-critical,
latency-sensitive application of theirs runs on GCP and then calls out to AWS to make calls into
their partner's APIs and so on, and works fine. Solid as a rock, reliable, low latency.
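As an aside, the call-out pattern Tim describes, a service on one cloud invoking integrations hosted on another, might look something like this sketch. The transport is injected as a plain callable, so the function names and retry policy here are hypothetical illustrations, not the company's actual code:

```python
import time

def call_partner_integration(invoke, payload, retries=2, backoff=0.2):
    """Call the remote-cloud integration (e.g., an AWS Lambda behind an API)
    with bounded retries. `invoke` is whatever transport the service uses,
    injected here so the retry policy is testable without network access."""
    last_err = None
    for attempt in range(retries + 1):
        try:
            return invoke(payload)
        except TimeoutError as err:  # only retry transient, timeout-style failures
            last_err = err
            time.sleep(backoff * (2 ** attempt))  # exponential backoff between tries
    raise last_err
```

The point of the sketch is that cross-cloud calls are just network calls: the hard parts are latency budgets and failure handling, not anything exotic.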
And so I talked to a person I know, who knows, over on the AWS side,
and they said, oh yeah, sure, we talk to those guys.
Lots of people do that.
We make sure the connections are low latency and solid.
So technically speaking, it can be done.
But for a variety of business reasons,
maybe the most important one being expertise and who you can hire,
it's probably just not a good idea. One of the areas where I think is an exception case is if you are a SaaS
provider, let's pick a big, easy example, Snowflake, where they are a data warehouse.
They've got to run their data warehousing application in all of the major clouds because
that is where their customers are. And it turns out that
if you're going to send a few petabytes into a data warehouse, you really don't want to be paying
cloud egress rates to do it, because it turns out you can just bootstrap a second company for that
much money. Well, Zoom would be another example, obviously. Oh, yeah. Anything that's heavy on
data transfer is going to be a strange one. And being close to customers, gaming companies are another good example on this, where a lot of the game servers themselves will be spread across a bunch of different providers just purely based on latency metrics around what is close to certain customer clusters.
But that's a narrow segment; I think you're talking about core technology companies. Now,
of the potential customers of the cloud providers, how many of them are core technology companies
like the kind we're talking about who have such a need? And how many people are people who just
want to run their manufacturing and product design and stuff? And for those, buying into
a particular cloud is probably a perfectly sensible choice.
I've also seen regulatory stories about this.
I haven't been able to track them down specifically,
but there is a pervasive belief that one interpretation of UK banking regulations stipulates that you have to be able to get back up and running within 30 days
on a different cloud provider entirely.
And also they have the regulatory requirement
that I believe the data remain in-country.
So that's a little odd.
And honestly, when it comes to best practices
and how you should architect things,
I'm going to take a distinct backseat
to legal requirements imposed upon you by your regulator.
Let's be clear here.
I'm not advising people to go and tell their auditors
that they're wrong on these things.
I had not heard that story, but, you know, it sounds plausible.
So I wonder if that is actually in effect, which is to say, could a huge British banking company, in fact, do that?
Could they, in fact, decamp from Azure and move over to GCP or AWS in 30 days. Boy. That is what one bank I spoke to over there
was insistent on. A second bank I spoke to in that same jurisdiction had never heard of such a thing.
So I feel like a lot of this is subject to auditor interpretation. Again, I am not an expert in this
space. I do not pretend to be. I know I'm that rarest of all breeds, a white guy with a microphone in tech who admits he doesn't know something. But here we are. Yeah, I mean, I imagine it could be plausible
if you didn't use any higher level services and you just, you know, rented instances
and were careful about which version of Linux you ran and were just running a bunch of Java code,
which actually, you know, describes the workload of a lot of financial institutions.
So it would just be a matter of getting all the right instances configured and the JVM configured and launched.
I mean, there are no architecturally terrifying barriers to doing that.
Of course, to do that, it would mean you would have to avoid using any of the higher level
services that are particular to any cloud provider and basically just treat them as people you rent boxes from, which is probably not a good choice for other business reasons.
Which can also include things as seemingly low-level as load balancers.
Just based upon different provisioning modes, failure modes, and the rest, you're probably going to have a more consistent experience running HAProxy or Nginx yourself to do it.
But Tim, I have it on good authority
that this is the old way of thinking
and that Kubernetes solves all of it.
And through the power of containers
and powers combining and whatnot,
that frees us from being beholden to any given provider
and our workloads are now all free as birds.
Well, I will go as far as saying that
if you are in the position of trying to be portable,
probably using containers is a smart thing to do
because it's a more tractable level of abstraction
that does give you some insulation from, you know,
which version of Linux you're running and things like that.
The proposition that configuring and running
Kubernetes is easier than configuring and running JVM on Linux is unsupported by any evidence I've
seen. So, operating at the Kubernetes level rather than at the
instance level, you know, there's good reasons why some people want to do that. But I'm dubious
of the proposition that it really makes you more portable in any essential way.
Well, you're also not the target market for Kubernetes.
You have worked at multiple cloud providers, and I feel like the real advantage of Kubernetes is people who haven't who want to pretend that they do,
so they can act as a sort of a cosplay of being their own cloud provider by running all the intricacies of Kubernetes. I'm halfway kidding, but there is
an uncomfortable element of truth to that with some of the conversations I've had with some of
its more, shall we say, fanatical adherents. Well, I think you and I are neither of us huge
fans of Kubernetes, but my reasons are maybe a little different. Kubernetes does some really
useful things. It really, really does. It allows you to take N VMs and pack M different applications
onto them in a way that takes reasonably good advantage of the processing power they have.
And it allows you to have different things running in one place with different IP addresses.
It sounds straightforward, but that turns out to be really helpful in a lot of ways. So I'm
actually kind of sympathetic with what Kubernetes is trying to be.
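The packing problem Tim sketches, N VMs hosting M applications, can be illustrated with a toy greedy first-fit packer. This is purely an illustration of the idea; real schedulers like Kubernetes's weigh memory, affinity rules, disruption budgets, and much more:

```python
def first_fit_pack(app_demands, vm_capacity):
    """Greedy first-fit packing of per-app CPU demands onto identical VMs.
    app_demands: list of (app_name, demand) pairs; vm_capacity: units per VM.
    Returns a list of VMs, each with remaining capacity and placed apps."""
    vms = []  # each VM tracked as {"free": remaining_capacity, "apps": [...]}
    for app, demand in app_demands:
        for vm in vms:
            if vm["free"] >= demand:  # place on the first VM with room
                vm["free"] -= demand
                vm["apps"].append(app)
                break
        else:  # no existing VM fits; provision another one
            vms.append({"free": vm_capacity - demand, "apps": [app]})
    return vms
```

Even this toy shows the value proposition: four apps that would naively need four VMs can share three, and that utilization win is a large part of what people buy Kubernetes for.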
My big gripe with it is that I think that good technology should make easy things easy and difficult things possible. And I think Kubernetes fails the first test there. I think the complexity
that it involves is out of balance with the benefits you get. There's a lot of really,
really smart people who disagree with me. So this is not a hill I'm going to die on.
This is very much one of those areas where reasonable people can disagree. I find the
complexity to be overwhelming. It has to collapse. At this point, finding someone who can competently
run Kubernetes in production is a bit hard to do, and they tend to be extremely expensive.
You aren't going to find a team of those people at every company that wants
to do things like this, and they're certainly not going to be able to find it in their budget
in many cases. So it's a challenging thing to do. Well, that's true. And the other thing is that
once you step onto the Kubernetes slope, you start looking at Istio and Envoy and fabric
technology,
and we're talking about extreme complexity squared at that point.
But, you know, here's the thing.
Back in 2018, I think it was, at his keynote, Werner said that the big goal is
that all the code you ever write should be application logic that delivers business value.
Didn't CGI say the same thing?
Isn't there like a long history, dating back longer than I believe either of us have been alive,
of promising that all you're going to write is business logic?
That was the Java promise.
That was the Google App Engine promise.
Again and again, we've had that carrot dangled in front of us.
And it feels like the reality with Lambda is the only code you will write is not necessarily business logic.
It's getting the thing to speak to the other service you're trying to get it to talk to
because a lot of those integrations are super finicky, at least back when I started
learning how this stuff worked, they were. People understand where the pain points are
and are indeed working on them. But I think we can agree that if you believe in that as a goal,
which I still do, I mean, we may not have got there, but it's still a worthwhile goal to work
on. We can agree that wrangling Istio configurations is not such a
thing. It's not directly value-adding business logic. To the extent that you can do that,
I think serverless provides a plausible way forward. Now, you can be all cynical about,
well, I still have trouble making my Lambda talk to my other thing. But I've done that,
and I've also deployed JVM on bare metal kind of thing. You know, I'd rather do things at the Lambda level.
I really rather would.
Because capacity forecasting is a horribly difficult thing.
We're all terrible at it.
And the penalties for being wrong are really bad.
If you underspecify your capacity, your customers have a lousy experience.
And if you overspecify it, and you have an architecture that makes you configure for peak load,
you're going to spend bucket loads of money that you don't need to.
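Back-of-the-envelope, the trade-off Tim describes between provisioning for peak and paying per use can be sketched like this. All rates and numbers here are made up for illustration, not any provider's actual pricing:

```python
def peak_provisioned_cost(peak_rps, cost_per_unit_hour, hours, rps_per_unit):
    """Cost if you must keep enough capacity for peak load running 24/7."""
    units = -(-peak_rps // rps_per_unit)  # ceiling division: whole units only
    return units * cost_per_unit_hour * hours

def pay_per_use_cost(total_requests, cost_per_million):
    """Cost if you pay only for requests actually served (Lambda-style)."""
    return total_requests / 1_000_000 * cost_per_million
```

With a spiky workload, the peak-provisioned number is driven by the worst hour of the month while the pay-per-use number is driven by actual traffic, which is exactly why getting the forecast wrong is so expensive in one model and largely irrelevant in the other.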
But you're then putting your availability
in the cloud provider's hands.
Yeah, you already were.
Now we're just being explicit about acknowledging that.
Yeah, yeah, absolutely.
And that's highly relevant to the current discussion
because if you use the higher level serverless functions,
if you decide, okay, I'm going to go with Lambda and Dynamo
and EventBridge and that kind of thing, well, that's not portable at all. I mean, the APIs are totally idiosyncratic
for AWS and GCP's equivalent and Azure's, what do they call it, permanent functions or something
or other functions. So yeah, that's part of the trade-off you have to think about. If you're going
to do that, you're definitely not going to be multi-cloud in that application.
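To make the idiosyncratic-API point concrete, DynamoDB items travel in a typed wire format that no other provider's database shares. The marshalling helper below is illustrative, not part of any SDK:

```python
def to_dynamo_item(plain):
    """Marshal a plain dict into DynamoDB's typed AttributeValue format.
    This wire shape ({"S": ...}, {"N": ...}, {"BOOL": ...}) is specific to
    DynamoDB; GCP's and Azure's document stores each use their own shapes,
    which is one concrete face of the lock-in being discussed."""
    def wrap(v):
        if isinstance(v, bool):  # check bool before int: bool subclasses int
            return {"BOOL": v}
        if isinstance(v, (int, float)):
            return {"N": str(v)}  # DynamoDB transmits numbers as strings
        if isinstance(v, str):
            return {"S": v}
        if isinstance(v, list):
            return {"L": [wrap(x) for x in v]}
        raise TypeError(f"unsupported type: {type(v).__name__}")
    return {k: wrap(v) for k, v in plain.items()}
```

Code written against this format, plus Dynamo's key model, capacity model, and consistency options, doesn't port to another cloud's database without rework; that, rather than the instances underneath, is where the real lock-in lives.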
And in many cases, one of the stated goals for going multi-cloud is that you can avoid the
downtime of a single provider. People love to point at the big AWS outages where they were
down for half a day. And there is a societal question of what happens when everyone is down
for half a day at the same time. But in most cases, what I'm seeing is that instead of getting
rid of a single point of failure, you're introducing a second one. If either one of them is down, your application's down, so you've doubled your odds of an outage. And if you can't process payments, does it really matter that your website stays up?
It becomes an interesting question. And those are the ones that you know about, let alone the
third and fourth order dependencies that are almost impossible to map unless everyone is
as diligent as you are. It's a heavy, heavy lift. I'm going to push back a little bit. Now,
for example, this company I'm advising that's running GCP and calling out to Lambda
is in that position: it's down if either GCP or Lambda goes off the air. On the other hand, if you look at
somebody like Zoom, they're probably running parallel full stacks on the different cloud
providers. And if you're doing that, then you can at least plausibly claim that you're in a good
place, because if Dynamo has an outage and everything relies on Dynamo,
then you shift your load over to GCP or Oracle,
and you're still on the air.
Yeah, but what counts as up, as well?
Because Zoom loves to sign me out on my desktop
whenever I log into it on my laptop and vice versa,
and I wonder if that authentication and login system
is also replicated full stack to everywhere it goes and what the
fencing on that looks like and how the communication between all those things works. I wouldn't doubt
that it's possible that they've solved for this, but I also wonder how thoroughly they've really
tested all of it too. Not because I question them any, just because this stuff is super intricate
as you start tracing it down into the nitty gritty levels of the madness that consumes all
these abstractions.
Well, right. That's a conventional wisdom that is really wise and true, which is that if you have software that is alleged to do something like allow you to get going on another
cloud, unless you've tested it within the last three weeks, it's not going to work when you need
it. Oh, it's like a DR exercise. The next commit you make breaks it once you have the thing working
again. It sits around as a binder, and it's a best guess. And let's be serious, a lot of these DR exercises presume that you're
able to, for example, change DNS records on the fly or be able to get a virtual machine provision
in less than 45 minutes, because when there's an actual outage surprise, everyone's trying to do
the same things. There's a lot of stuff in there that gets really wonky at weird levels.
A related, similar exercise is people who want to be on AWS, but want to be multi-region.
It's actually a fairly similar kind of problem. If I need to be able to fail out of US East 1,
well, God help you, because if you need to, everybody else needs to as well. But would that
work? Before you go multi-cloud, go multi-region first. Tell me how easy it is, because then you
have full feature parity, presumably, between everything. It should just be a walk in the park. Send me a postcard
once you get that set up, and I'll eat a bunch of words. And it turns out basically no one does.
Another area of lock-in around a lot of this stuff, and a thing that makes it very hard to
go multi-cloud, is the security model of how does that interface with
various aspects. And in many cases, I'm seeing people doing full-on network overlays. They don't
have to worry about the different security group models and VPCs and all the rest. They can just
treat everything as a node sitting on the internet, and the only thing it talks to is an overlay
network, which is terrible. But that seems to be one of the only ways people are able to build
things that span multiple providers with any degree of success. That is painful because, much and all as we
like to scoff at the degree of complexity you get into there, it is the case
that your typical public cloud provider can do security better than you can. They just can.
It's a fact of life. And if you're using a public cloud provider and not taking advantage of their
security offerings and infrastructure, that's probably dumb.
But if you really want to be multi-cloud, you kind of have to, as you said.
In particular, this gets back to the problem of expertise, because it's hard enough to hire somebody who really understands IAM deeply and how to get that working properly.
Try and find somebody who can understand that level of thing on two different cloud providers at once.
Oh, gosh.
This episode is sponsored in part by LaunchDarkly. Take a look at what it takes to get your code into
production. I'm going to just guess that it's awful because it's always awful. No one loves
their deployment process. What if launching new features didn't require you to do a full-on code
and possibly infrastructure deploy? What if you could test on a small subset of users and then
roll it back immediately if results aren't what you expect? LaunchDarkly does exactly this. To
learn more, visit launchdarkly.com and tell them Corey sent you and watch for the wince.
Another point you made in your blog post was the idea of lock-in, of people being worried that going all-in on a provider was setting them up to
be, I think, oracled is the term that was tossed around, where once you're dependent on a provider,
what's to stop them from cranking the pricing knobs until you squeal?
Nothing. And I think that is a perfectly sane thing to worry about. Now, in the short term,
based on my personal experience working with AWS leadership, I think that it's probably not a big short-term risk. AWS is clearly aware that most of the growth
is still in front of them. The amount of all of IT that's on the cloud is still pretty small. And so
the thing to worry about right now is growth. And they are really, really genuinely, sincerely
focused on customer success and will bend over backwards to deal with the customer's problems as they are. And I've seen places where people have
negotiated a huge multi-year enterprise agreement based on reserved instances or something like
that and then realize, oh, wait, we need to switch our whole technology stack, but you've got us by
the RIs and AWS will say, no, no, it's okay. We'll tear that up and rewrite it and get you where you need to go. So in the short term, between now and 2025, would I worry about my cloud provider doing that?
Probably not so much. But let's go a little further out. Let's say it's, you know, 2030 or
something like that. And at that point, you know, Andy Jassy decided to be a full-time sports mogul
and Satya Nadella has gone off to be a recreational sailboat owner or something like that. And private equity operators
come in and take very significant stakes in the public cloud providers and get a lot of their guys
on the board. And you have a very different dynamic, and you have something that starts to
feel like Oracle, where their priority isn't, you know, optimizing for growth and customer success.
Their priority is optimizing for a quarterly bottom line.
Revenue extraction becomes the goal.
That's absolutely right.
And this is not a hypothetical scenario.
It's happened.
Most large companies do not control the amount of money they spend per year
to have desktop software that works.
They pay whatever Microsoft's going to say they pay
because they don't have a choice.
And a lot of companies are in the same situation with their database.
They don't get to budget their database budget.
Oracle comes in and says, here's what you're going to pay, and that's what you pay.
You really don't want to be in that situation with your cloud.
And that's why I think it's perfectly reasonable for somebody who is doing cloud transition
at a major financial or manufacturing or service provider company to have an eye to this.
You know, let's not completely ignore the lock-in issue.
There is a significant scale with enterprise deals and contracts.
There is almost always a contractual provision that says if you're going to raise a price with any cloud provider,
there's a fixed period of time of notice you must give before it happens.
I feel like the first mover there winds up getting soaked because everyone is going to panic and migrate in other directions.
I mean, Google tried it with Google Maps for their API and not quite Google Cloud, but also scared the bejesus out of a whole bunch of people who were, wait, is this a harbinger of things to come?
Well, not in the short term, I don't think.
And I think, you know, Google Maps is absurdly underpriced.
That's a hellishly expensive service.
And it's supposed to pay for itself by advertising on Maps.
I don't know about that.
I would see that as the exception rather than the rule.
I think that it's reasonable to expect cloud prices nominally at least to go on decreasing
for at least the short term, maybe even the medium term.
But that can't go on forever. It also feels to me like, having looked at an awful lot of AWS
environments, that if there were to be some sort of regulatory action or some really weird outage
for a year, that meant that AWS could not onboard a single new customer. Their revenue year over
year would continue to increase purely by organic growth
because there is no forcing function that turns a thing off when you're done using it.
In fact, they could migrate things around to hardware that works.
They can continue billing you for the thing that's sitting there idle.
And there is no governance path on that.
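That "no forcing function" observation can be sketched concretely: cost tooling typically flags instances whose CPU has sat near zero for weeks, since nothing turns them off automatically. A minimal illustration, with the CloudWatch fetch left as a comment and the 2% threshold an arbitrary assumption:

```python
# Sketch: flag an instance as idle from its average-CPU samples.
# In practice the samples would come from CloudWatch (CPUUtilization,
# daily averages over, say, two weeks); here the fetch is omitted and
# only the decision logic is shown.

def is_idle(cpu_samples, threshold_pct=2.0):
    """True if every daily average CPU sample is below the threshold.

    An empty sample list means "no data", which we treat as not idle
    rather than guessing.
    """
    if not cpu_samples:
        return False
    return max(cpu_samples) < threshold_pct

# An instance averaging well under 2% CPU for weeks keeps billing
# anyway; there is no forcing function that turns it off.
print(is_idle([0.4, 0.7, 0.3, 0.9]))  # idle by this definition
print(is_idle([0.4, 35.0, 0.3]))      # one busy day: not idle
```

The interesting part is not the function but the absence of anything on the provider side that runs it for you.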
So on some level, winding up doing a price increase is going to cause a
massive company focus on fixing a lot of that. It feels on some level like it is drawing attention
to a thing that they don't really want to draw attention to from a purely revenue extraction
story. When CentOS walked back their ten-year support commitment to two years, with the
idea that it would drive RHEL adoption,
well, suddenly a lot of people looked at their environment
and saw they had old CentOS instances they weren't using.
It was massively short-sighted; it massively irritated
a whole bunch of people who needed that support in the short term,
but who, by the renewal, were going to be onto Ubuntu
or something else.
It feels like it's going to backfire massively.
And I'd like to imagine the strategists
of whoever takes the reins of these companies
is going to be smarter than that. But here we are. Here we are. And it's interesting you should
mention regulatory action. At the moment, there are only three credible public cloud providers.
And it's not obvious that Google's really in it for the long haul. Last I checked, they were
claiming to maybe be breaking even on it. That's not a good number. You'd like there to be more
than that. And if it goes on like that, eventually some politician is going to say,
oh, maybe they should be regulated like public utilities because they kind of are, right?
And I would think that anybody who did get into Oracle-izing would, you know, accelerate that
happening. Having said that, we do live in the atmosphere of 21st century capitalism, and growth is the god that must be worshipped at all costs.
Who knows? It's a cloudy future. Hard to see.
It really is.
I also want to be clear on some level that with Google's current position, if they weren't taking a small loss, at least, on these things, I would worry.
Like, wait, you're trying to catch AWS and you don't have anything better to invest
that money into than just, well, time to start taking profits from it. So I can see both sides
of that one. Right. As I keep saying, I've already said once during this slot, you know, the total
cloud spend in the world is probably on the order of $100 or $200 billion per annum, and global IT
is in multiple trillions. So there's a lot more space for growth, years and years worth of it.
Now, the challenge too,
is that people are worried about this
from a long-term strategic point of view.
So one thing you talked about in your blog post
is the idea of using hosted open source solutions.
Like instead of using Kinesis,
you'd wind up using Kafka.
Or instead of using DynamoDB,
you'd use their managed Cassandra service, or as I think of it, Amazon Basics Cassandra.
And effectively going down the path of letting them manage this thing, but you then have a theoretical exodus path.
Where do you land on that?
I think that speaks to a lot of people's concerns, and I've had conversations with really smart people about that who like that idea. Now, to be realistic,
it doesn't make migration easy because you've still got all the CI and CD and monitoring and management and scaling and alarms and alerts and paging and et cetera, et cetera, et cetera,
wrapped around it. So it's not as though you could just pick up your managed Kafka off AWS and drop
a huge installation onto GCP easily.
But at least your data plane APIs are the same,
so a lot of your code would probably still run okay.
So it's a plausible path forward.
And when people say, I want to do that,
well, it does mean that you can't go all serverless,
but it's not a totally insane path forward.
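The "data plane APIs are the same" idea can be sketched: with managed Kafka, the producing and consuming code is identical across providers, and only connection details change. A toy sketch, where the broker hostnames are invented placeholders rather than real endpoints:

```python
# Sketch: the Kafka data-plane code stays the same across providers;
# only the bootstrap servers (and, in practice, credentials) change.
# The hostnames below are made-up placeholders, not real endpoints.

PROVIDER_BOOTSTRAP = {
    "aws-msk":     "b-1.example-msk.kafka.us-east-1.amazonaws.com:9096",
    "confluent":   "pkc-example.us-east1.gcp.confluent.cloud:9092",
    "self-hosted": "kafka-1.internal.example.com:9092",
}

def producer_config(provider):
    """Build a producer config; everything except the endpoint is shared."""
    return {
        "bootstrap.servers": PROVIDER_BOOTSTRAP[provider],
        "acks": "all",
        "compression.type": "zstd",
    }

aws = producer_config("aws-msk")
gcp = producer_config("confluent")
# Only the endpoint differs between the two configs; the application
# code that produces and consumes would not change when the provider
# does. The migration cost lives in CI/CD, monitoring, scaling, and
# alarms, as noted above.
print({k for k in aws if aws[k] != gcp[k]})
```

That single differing key is the optimistic version of portability; the wrapping operational tooling is where the real work hides.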
So one last point in your blog post that I think a lot of people think about only after they get
bitten by it is the idea of data gravity. I alluded earlier in our conversation to data egress charges,
but my experience has been that where your data lives is effectively where the rest of your cloud usage tends to aggregate. How do you see it?
Well, it's a real issue, but I think it might perhaps be a little overblown. People throw the
term petabytes around, and people don't realize how big a petabyte is. A petabyte is just an
insanely huge amount of data, and the notion of transmitting one over the internet is terrifying.
And there are lots of
enterprises that have multiple petabytes around. And so they think, well, you know, it would take
me 26 years to transmit that, so I can't. And they might be wrong. The internet's getting faster
all the time. Did you notice? I've been able to move, in some purely personal projects,
insane amounts of data, and it gets there a lot faster than you'd think. Secondly, in the case
of AWS Snowmobile, we have an existence proof that you can do exabyte-ish scale data transfers in
time it takes to drive a truck across the country. Inbound only, though; Snowmobiles are not, at least according
to all public examples, valid for exodus. But you know, this is kind of a place where
regulatory action might come into play
if what the people were doing was seen to be abusive.
I mean, there's an existence proof you can do this thing.
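The "26 years" intuition above is worth checking with arithmetic: a petabyte at 10 Mbps really does take about 25 years, but at modern link speeds the number collapses. A back-of-the-envelope calculation, using decimal units and ignoring protocol overhead:

```python
# Back-of-the-envelope: time to move data over a sustained link,
# using decimal units (1 PB = 1e15 bytes) and ignoring overhead.

def transfer_days(petabytes, link_mbps):
    bits = petabytes * 1e15 * 8
    seconds = bits / (link_mbps * 1e6)
    return seconds / 86400

# 1 PB at 10 Mbps: roughly 25 years -- the "I can't" number.
print(round(transfer_days(1, 10) / 365, 1))   # ~25.4 (years)
# 1 PB at a sustained 10 Gbps: about nine days.
print(round(transfer_days(1, 10_000), 1))     # ~9.3 (days)
# At 100 Gbps: under a day.
print(round(transfer_days(1, 100_000), 2))    # ~0.93 (days)
```

The terror is a function of the link speed assumed, which is exactly the point about the internet getting faster.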
But here's another point.
So suppose you have like 15 petabytes.
That's an insane amount of data
deployed in your key corporate application.
So are you actually using that
to run the application?
Or is a huge proportion of that stuff
just logs and data gathered
of various kinds that's being used
in analytics applications
and AI models and so on?
Do you actually need all that data
to actually run your app?
And could you, in fact,
just pick up the stuff you need for your app,
move it to a different cloud provider,
run there,
and leave your analytics on the first one?
Not a totally insane idea.
It's not a terrible idea at all.
It comes down to the idea as well of when you're trying to run a query against a bunch of that data,
do you need all the data to transit or just the results of that query as well?
It's a question of can you move the compute closer to the data
as opposed to move the data to where the compute lives?
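That compute-to-data idea is what services like S3 Select and Athena offer: a query runs next to the data and only the matching rows cross the wire. A hedged sketch of the shape of it, where the bucket and key names are invented and only the expression-building is exercised:

```python
# Sketch: push the filter to where the data lives and ship back only
# the result. Bucket/key names below are invented placeholders.

def select_expression(column, predicate):
    """Build a scan-side SQL expression in the S3 Select style: the
    service scans the object and returns matching rows only."""
    return f"SELECT s.{column} FROM s3object s WHERE {predicate}"

expr = select_expression("order_id", "s.region = 'eu-west-1'")
print(expr)

# With boto3 this would run next to the data, along the lines of:
#   s3.select_object_content(Bucket="example-logs",
#                            Key="2022/01/orders.csv",
#                            Expression=expr, ExpressionType="SQL", ...)
# Only the matching rows transit the network; the petabytes stay put.
```

Whether the numbers work depends on how selective the query is, but the asymmetry between data scanned and data returned is the whole argument.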
Well, you know, and a lot of those people who have those huge data pools
have it sitting on S3, and a lot of it migrated off into Glacier,
so it's not as if you could get at it in milliseconds anyhow. I just ask myself,
how much data can anybody actually use in a day in the course of satisfying some transaction
request from a customer? And I think it's not petabytes. It just isn't. Now, okay, there are
exceptions. There's the intelligence community, there's the oil drilling community, there are some communities who genuinely will use insanely huge seas of data
on a routine basis. But I think that's kind of a corner case. So before you shake your head and
say, ah, they'll never move because of data gravity, you need to prove that to me. And I
might be a little bit skeptical. And I think that that is probably a very fair request.
Just tell me what it is you're going to be doing here to validate the idea that is in your head.
Because the most interesting lies I've found customers tell aren't told intentionally to me or anyone else.
They're told to themselves.
The narrative of what they think they're doing from the early days takes root. And never mind the fact that, yeah, it turns out that now that you've scaled
out, maybe development isn't 80% of your cloud bill anymore. You learn things and your understanding
of what you're doing has to evolve with the evolution of the applications.
Yep. It's a fun time to be around. I mean, it's so great. Right at the moment,
lock-in isn't that big an issue.
And let's be clear. I'm sure you agree with me on this, Corey: if you're a startup and you're trying to grow and scale and prove you've got a viable business and show that you have exponential growth and so on, don't think about lock-in.
Just don't go near it. Pick a cloud provider. Pick whichever cloud provider your CTO already knows how to use and just go all in on them and use all their most advanced features and be serverless if you can.
It's the only sane way forward.
You're short of time.
You're short of money.
You need growth.
Well, what if you need to move strategically in five years?
You should be so lucky.
Great.
Deal with it then.
Or, well, what if we want to sell to retail as our primary market and they hate AWS?
Well, go all in on a provider.
Probably not that one.
Pick a different provider and go all in.
I do not care which cloud any given company picks.
Go with what's right for you.
But then go all in.
Because until you have a compelling reason to do otherwise, you're going to spend more time solving global problems locally.
That's right. And we've never actually said this because probably because it's something that both
you and I know at the core of our being, but it probably needs to be said that being multi-cloud
is expensive, right? Because the nouns and verbs that describe what clouds do are different in
Google land and AWS land. They're just different. And it's hard to think about those things.
And you lose the capability
of using the advanced serverless stuff.
There are a whole bunch of costs
to being multi-cloud.
Now, maybe if you're existentially
afraid of lock-in, you don't care.
But for, I think, most normal people,
it's expensive.
Pay now, pay later, you will pay.
Wouldn't you ideally like to see
that dollar go as far as possible?
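The "different nouns and verbs" cost shows up even in miniature: the same concepts carry different names on each provider, so any cross-cloud tool or runbook carries a translation layer. A toy illustration with a small, non-exhaustive sample of names:

```python
# Toy illustration: the same concepts have different nouns per cloud,
# so every cross-cloud tool carries a translation layer. This sample
# is deliberately small and not exhaustive.

SERVICE_NAMES = {
    "object storage": {"aws": "S3",       "gcp": "Cloud Storage",
                       "azure": "Blob Storage"},
    "functions":      {"aws": "Lambda",   "gcp": "Cloud Functions",
                       "azure": "Functions"},
    "managed k8s":    {"aws": "EKS",      "gcp": "GKE",
                       "azure": "AKS"},
    "nosql database": {"aws": "DynamoDB", "gcp": "Firestore",
                       "azure": "Cosmos DB"},
}

def translate(concept, cloud):
    return SERVICE_NAMES[concept][cloud]

print(translate("object storage", "gcp"))  # Cloud Storage
# And that's just the nouns; IAM models, failure modes, and billing
# dimensions diverge even more, which is the real cognitive cost.
```

The table itself is cheap to write; keeping staff fluent in every column of it is the expense being described here.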
I'm right there with you. Because it's not just the actual infrastructure cost that's expensive. It costs
something far more dear and expensive. And that is the cognitive expense of having to think about
both of these things. Not just how each cloud provider works, but how each one breaks. You've
done this stuff longer than I have. I don't think that either of us trust a system that we don't understand the failure cases for
and how it's going to degrade.
It's, oh, great, you built something new and awesome.
Awesome.
How does it fall over?
What direction is it going to hit?
So what side should I not stand on?
It's based on understanding
of what you're about to blow holes in.
That's right.
And, you know, I think particularly
if you're using AWS heavily,
you know that there are some things that you might as well bet your business on because, you know, if they're down, so is the rest of the world and who cares.
And other things, maybe a little chancier.
So understanding failure modes, understanding your stuff, you know, the cost sharp edges, understanding manageability issues, it's not obvious.
It's really not.
Tim, I want to thank you for taking the time to go through this, frankly, excellent post with me.
If people want to learn more about how you see things, and I guess how you view the world,
where's the best place to find you?
Well, I'm on Twitter, just Tim Bray, T-I-M-B-R-A-Y, and my blog's at tbray.org.
And that's where that piece you were just talking about is.
And that's kind of my online presence.
And we will, of course, put links to it in the show notes.
Thanks so much for being so generous with your time.
It's always a pleasure to talk to you.
Well, it's always fun to talk to somebody who has shared passions,
and we clearly do.
Indeed.
Tim Bray, Principal at Textuality Services.
I'm cloud economist Corey Quinn, and this is Screaming in the Cloud.
If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice.
Whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice,
along with an angry comment that you then need to take to all of the other podcast platforms out there,
purely for redundancy, so you don't get locked into one of them.
If your AWS bill keeps rising and your blood pressure is doing the same, then you need the
Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor
recommendations to your business, and we get to the point. Visit duckbillgroup.com to get started.
This has been a HumblePod production. Stay humble.