Screaming in the Cloud - A Cloud Economist is Born - The AlterNAT Origin Story
Episode Date: November 9, 2022
About Ben: Ben Whaley is a staff software engineer at Chime. Ben is co-author of the UNIX and Linux System Administration Handbook, the de facto standard text on Linux administration, and is the author of two educational videos: Linux Web Operations and Linux System Administration. He has been an AWS Community Hero since 2014. Ben has held Red Hat Certified Engineer (RHCE) and Certified Information Systems Security Professional (CISSP) certifications. He earned a B.S. in Computer Science from the University of Colorado, Boulder.
Links Referenced:
Chime Financial: https://www.chime.com/
alternat.cloud: https://alternat.cloud
Twitter: https://twitter.com/iamthewhaley
LinkedIn: https://www.linkedin.com/in/benwhaley/
Transcript
Hello, and welcome to Screaming in the Cloud, with your host, Chief Cloud Economist at the
Duckbill Group, Corey Quinn.
This weekly show features conversations with people doing interesting work in the world
of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles
for which Corey refuses to apologize.
This is Screaming in the Cloud.
Forget everything you know about SSH and try Tailscale.
Imagine if you didn't need to manage PKI or rotate SSH keys every time someone leaves.
That'd be pretty sweet, wouldn't it?
With Tailscale SSH, you can do exactly that.
Tailscale gives each server and user device a node key to connect to its VPN,
and it uses the same node key to authorize and authenticate SSH.
Basically, you're SSHing the same way you manage access to your app.
What's the benefit here?
Built-in key rotation.
Permissions as code.
Connectivity. Between any two devices.
Reduced latency.
And there's a lot more, but there's a time limit here.
You can also ask users to re-authenticate for that extra bit of security.
Sounds expensive?
Nope.
I wish it were.
Tailscale is completely free for personal use on up to 20 devices.
To learn more, visit snark.cloud slash tailscale. Again,
that's snark.cloud slash tailscale. Welcome to Screaming in the Cloud. I'm Corey Quinn,
and this is an episode unlike any other that has yet been released on this august podcast.
Let's begin by introducing my first-time guest somehow, because apparently
an invitation got lost in the mail somewhere. Ben Whaley is a staff software engineer at Chime
Financial and has been an AWS Community Hero since Andy Jassy was basically in diapers,
to my level of understanding. Ben, welcome to the show.
Corey, so good to be here. Thanks for having me on.
I'm embarrassed that you haven't been on the show before. You're one of those people that
slipped through the cracks, and somehow I was very bad at following up slash hounding you
into finally agreeing to be here. But you certainly waited until you had something
auspicious to talk about.
Well, you know, I'm the one that really should be embarrassed here.
You did extend the invitation, and I guess I just didn't feel like I had something to drop.
But I think today we have something that will interest most of the listeners without a doubt.
So folks who have listened to this podcast before, or read my newsletter, or follow me on Twitter,
or have shared an elevator with me, or at any point have passed me on the street have heard me complain about the managed
NAT gateway and its egregious data processing fee of four and a half cents per gigabyte. And I have
complained about this for small customers because they're in the free tier. Why is this thing
charging them 32 bucks a month? And I have complained about this
on behalf of large customers
who are paying the GDP of the nation of Belize
in data processing fees
as they wind up shoving very large workloads to and fro,
which is, I think, part of the prerequisite requirements
for having a data warehouse.
And you are no different than the rest of these people
who have those challenges, with the singular exception that you have done something about it.
And what you have done is so, in retrospect, blindingly obvious that I am embarrassed
the rest of us never thought of it.
It's interesting because when you are doing engineering, it's often the simplest solution that is the best.
I've seen this repeatedly.
And it's a little surprising that it didn't come up before,
but I think it's in some way just a matter of timing.
But what we came up with,
and is this the right time to get into it?
Do you want to just kind of name the solution here? Oh, by all means.
I'm not going to steal your thunder.
Please tell us what you have wrought.
We're calling it Alternat.
And it's an alternative
to a high availability NAT solution. As everybody knows, NAT Gateway is sort of the default choice. It certainly is what AWS pushes everybody towards. But there is in fact a legacy solution, NAT instances. These were around long before NAT Gateway made an appearance. And like I said, they're considered legacy. But with the help
of lots of modern AWS innovations and technologies like Lambdas and auto scaling groups with max
instance lifetimes and the latest generation of enhanced networking instances,
it turns out that we can maybe not quite get as effective as a NAT gateway, but we can save a lot of money and skip those data processing charges entirely by having a NAT instance solution with a failover NAT gateway, which I think is kind of the key point behind this solution.
So are you interested in diving into the technical details?
That is very much the missing piece right there.
You're right.
What we used to use was NAT instances.
That was the thing that we used because we didn't really have another option. And they had an
interface in the public subnet where they lived and an interface hanging out in the private subnet,
and they had to be configured to wind up passing traffic to and fro. Well, okay, that's great and
all, but isn't that kind of brittle and dangerous? I basically have a single instance
as a single point of failure,
and these are the days early on
when individual instances did not have
the level of availability and durability they do now.
Yeah, it's kind of awful, but here you go.
I mean, the most galling part
of the managed NAT gateway service
is not that it's expensive.
It's that it's expensive,
but also incredibly good at what it does.
You don't have to think about this whole problem anymore.
And as of recently,
it also supports IPv6 to IPv4 translation as well.
It's not that the service is bad.
It's that the service is stonkingly expensive,
particularly at scale.
And everything that we've seen before is either, oh, run your
own NAT instances or bend your knee and pay your money. And a number of folks have come up with
different options where this is ridiculous. Just go ahead and run your own NAT instances.
Yeah, but what happens when I have to take it down for maintenance or replace it? It's like,
well, I guess you're not going to the internet today. This has the, in hindsight, obvious solution.
Well, we run the managed NAT gateway
because the 32 bucks a month in hourly charges
don't actually matter at any point of scale
when you're doing this,
but you wind up using the NAT instance for day in, day out traffic.
And the failover mode is simply,
you'll use the expensive managed NAT gateway
until the instance is healthy again, and then automatically change the route table back and forth.
Yep, that's exactly it.
So the auto-scaling NAT instance solution has been around for a long time, well before
even NAT gateway was released.
You could have NAT instances in an auto-scaling group where the size of the group was one,
and if the NAT instance
failed, it would just replace itself. But this left a period in which you'd have no internet
connectivity during that, you know, when the NAT instance was swapped out. So the solution here is
that when auto-scaling terminates an instance, it fails over the route table to a standby NAT gateway,
rerouting the traffic.
So there's never a point at which there's no internet connectivity, right?
The NAT instance is running, processing traffic, gets terminated after a certain period of time, configurable 14 days, 30 days, whatever makes sense for your security strategy.
Could be never, right?
You could choose that you want to have your own maintenance window in which to do it.
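To make the failover mechanism concrete, here is a minimal sketch of the kind of route-table swap being described, written with boto3. The resource IDs are hypothetical placeholders, and the actual AlterNAT Lambda and Terraform code may differ in its details.

```python
# A minimal sketch of the route-table swap described above. All IDs are
# hypothetical placeholders; the real AlterNAT project implements this logic
# in its own Lambda functions and Terraform, which may differ in detail.
import boto3

ec2 = boto3.client("ec2")

ROUTE_TABLE_ID = "rtb-0123456789abcdef0"    # private subnet route table (placeholder)
NAT_INSTANCE_ENI = "eni-0123456789abcdef0"  # ENI of the NAT instance (placeholder)
STANDBY_NAT_GW = "nat-0123456789abcdef0"    # standby NAT gateway (placeholder)

def fail_over_to_nat_gateway():
    """Point the default route at the standby NAT gateway while the instance is replaced."""
    ec2.replace_route(
        RouteTableId=ROUTE_TABLE_ID,
        DestinationCidrBlock="0.0.0.0/0",
        NatGatewayId=STANDBY_NAT_GW,
    )

def fail_back_to_nat_instance():
    """Once the replacement NAT instance is healthy, send traffic back through it."""
    ec2.replace_route(
        RouteTableId=ROUTE_TABLE_ID,
        DestinationCidrBlock="0.0.0.0/0",
        NetworkInterfaceId=NAT_INSTANCE_ENI,
    )
```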
Let's face it, this thing is more or less sitting there as a network traffic
router, for lack of a better term. There is no need to ever log into the thing and make changes
to it until and unless there's a vulnerability that you can exploit via somehow just talking
to the TCP stack when nothing's actually listening on the host. You know, you can run your own AMI that has been pared down to almost nothing.
And that instance doesn't do much.
It's using just the Linux kernel to sit on two networks and pass traffic back and forth.
It has a translation table that kind of keeps track of the state of connections.
And so you don't need to have any service running.
To manage the system,
we have SSM. So you can use Session Manager to log in. But frankly, you can just disable that.
You almost never even need to get a shell. And that's, in fact, an option we have in the solution
is to disable SSM entirely. One of the things I love about this approach is that it is turnkey.
You throw this thing in there, and it's good to go.
And in the event that the instance becomes unhealthy, great, it fails traffic over to
the managed NAT gateway while it terminates the old node and replaces it with a healthy one,
and then fails traffic back. Now, I do need to ask, what is the story of network connections
during that failover and failback scenario? Right. That's the primary drawback,
I would say, of the solution is that any established TCP connections that are on the
NAT instance at the time of a route change will be lost. So say you have... TCP now terminates on
the floor. Pretty much. The connections are dropped. If you have an open SSH connection from
a host in the private network to a host on the internet and the instance fails over
to the NAT gateway, the NAT gateway doesn't have the translation table that the NAT instance had.
And not to mention the public IP address also changes because you have an Elastic IP assigned
to the NAT instance, a different Elastic IP assigned to the NAT gateway. And so because
that upstream IP is different, the remote host is tracking the wrong IP.
So those connections, they're going to be lost.
So there are some use cases where this may not be suitable.
We do have some ideas on how you might mitigate that, for example, with the use of a maintenance window to schedule the replacement.
Replace less often so it doesn't have to affect your workflow as much.
But frankly, for many use cases, my belief is that it's actually fine. In our use case at Chime,
we found that it's completely fine and we didn't actually experience any errors or failures. But
there might be some use cases that are more sensitive or less resilient to failure in the
first place. I would also point out that a lot of how software
is going to behave
is going to be a reflection
of the era in which
it was moved to cloud.
Back in the early days of EC2,
you had no real sense of reliability
around any individual instance.
So everything was written
in a very defensive manner.
These days, with instances
automatically being able to flow
among different hardware, we don't get
instance interrupt notifications the way we once did on a semi-constant basis, so it more or less has
become what presents as bulletproof. So a lot of people are writing software that's a bit more
brittle. But it's always been a best practice that when a connection fails, okay, what happens at
failure? Do you just give up and throw your hands in the air and shriek for help? Or do you attempt to retry a few times,
ideally backing off exponentially?
In this scenario, those retries will work.
So it's a question of how well have you built your software?
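As an aside, the retry pattern Corey is describing looks roughly like this; a generic sketch, not code from AlterNAT, with an illustrative function name standing in for whatever network call your workload makes.

```python
# A generic sketch of retry with exponential backoff and jitter, not code from
# AlterNAT itself. The function names and parameters are illustrative.
import random
import time

def call_with_backoff(operation, max_attempts=5, base_delay=0.5):
    """Retry a flaky network call, backing off exponentially with jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except ConnectionError:
            if attempt == max_attempts:
                raise  # give up after the final attempt
            # Sleep 0.5s, 1s, 2s, 4s... plus a little jitter to avoid thundering herds.
            time.sleep(base_delay * 2 ** (attempt - 1) + random.uniform(0, 0.1))

# Usage: call_with_backoff(lambda: fetch_from_partner_api())  # fetch_from_partner_api is hypothetical
```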
Okay, let's say that you've made
the worst decisions imaginable.
And okay, if that connection dies,
the entire workload dies.
Okay, you have the option to refactor it to be a little bit
better behaved, or alternately, you can keep paying the managed NAT gateway tax of four and
a half cents per gigabyte in perpetuity forever. I'm not going to tell you what decision to make,
but I know which one I'm making. Yeah, exactly. The cost savings potential of it far outweighs
the potential maintenance troubles, I guess,
that you could encounter.
But the fact is, if you're relying on Managed NAT Gateway and paying the price for doing
so, it's not as if there's no chance for connection failure.
NAT Gateway could also fail.
I will admit that I think it's an extremely robust and resilient solution.
I've been really impressed with it, especially so after having worked on this project. But it doesn't mean
it can't fail. And beyond that, upstream of the NAT gateway, something could in fact go wrong.
Internet connections are unreliable, kind of by design. So if your system is not resilient to
connection failures, there's a problem to solve there anyway.
You're kind of relying on hope.
So it's a kind of a forcing function in some ways
to build architectural best practices, in my view.
I can't stress enough that I have zero problem
with the capabilities and the stability
of the managed NAT gateway solution.
My complaints about it start and stop entirely
with the price. Back when you first showed me the blog post that is releasing at the same time as
this podcast, and you can visit that at alternat.cloud, you sent me an early draft of this.
And what I loved the most was that your math was off because of an incomplete understanding of the gloriousness that is just how egregious the NAT gateway charges are.
Your initial analysis said, all right, if you're throwing half a petabyte out to the internet, this has the potential of cutting the bill by, I think it was $10,000 or something like that. It's, oh no, no.
It has the potential to cut the bill by an entire $22,500
because this processing fee
does not replace any egress fees whatsoever.
It's purely additive.
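For reference, the arithmetic behind that correction, using the published 4.5 cents per gigabyte processing fee and the standard first-tier egress rate as assumptions:

```python
# Back-of-the-envelope math for the figure above. The $0.045/GB processing fee
# is the published NAT gateway rate; the egress rate shown is the standard
# on-demand internet tier and will vary with volume and discounts.
gb_per_month = 500_000          # half a petabyte, expressed in GB
nat_processing_per_gb = 0.045   # NAT gateway data processing, $/GB
egress_per_gb = 0.09            # internet egress at the first on-demand tier, $/GB

processing_fee = gb_per_month * nat_processing_per_gb   # $22,500 -- purely additive
egress_fee = gb_per_month * egress_per_gb               # paid either way, gateway or instance

print(f"NAT gateway processing: ${processing_fee:,.0f}/month on top of ${egress_fee:,.0f} egress")
```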
If you forget to have a free S3 gateway endpoint
in a private subnet,
every time you put something into
or take something out of S3,
you're paying four and a half cents per gigabyte on that. Despite the fact that there's no internet
transit involved, it's not crossing availability zones, it is simply a four and a half cent fee
to retrieve something that only costs you, at most, 2.3 cents per gigabyte per month to store in the first
place. Flip that switch, that becomes completely free.
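Here is what "flipping that switch" might look like with boto3: a gateway VPC endpoint for S3 attached to your private route tables. The IDs are placeholders, and in practice this belongs in Terraform or CloudFormation rather than an ad hoc script.

```python
# A sketch of adding a gateway VPC endpoint for S3 so that S3 traffic from
# private subnets bypasses the NAT gateway entirely. The VPC and route table
# IDs are placeholders.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

ec2.create_vpc_endpoint(
    VpcEndpointType="Gateway",
    VpcId="vpc-0123456789abcdef0",              # placeholder
    ServiceName="com.amazonaws.us-east-1.s3",   # gateway endpoints exist for S3 and DynamoDB
    RouteTableIds=["rtb-0123456789abcdef0"],    # private route tables that should use it
)
```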
Yeah, I'm not embarrassed at all to talk about the lack of education I had around this topic.
The fact is, I'm an engineer primarily, and I came across the cost stuff because it kind of
seemed like a problem that needed to be solved within my organization. And if you don't mind,
I might just linger on this point and kind of think back a few months. I looked at the AWS bill and I saw
this egregious EC2 other category. It was taking up the majority of our bill. Like the single
biggest line item was EC2 other. And I was like, what could this be? I want to wind up flagging
that just because that bears repeating, because I often get
people pushing back of, well, how bad can it be? It's one managed NAT gateway.
How much could it possibly cost?
Ten dollars?
No, it is the majority of your monthly bill.
I cannot stress that enough.
And that's not because the people who work there are doing anything that they should
not be doing or didn't understand all
the nuances of this. It's because for the security posture that is required for what you do,
you are at Chime Financial, let's be clear here, putting everything in public subnets was not
really a possibility for you folks. Yeah, not only that, but there are plenty of services that
have to be on private subnets. For example, AWS Glue services must run in private VPC subnets if you want them to be able to talk to other
systems in your VPC. They cannot live in public subnets. So you're essentially,
if you want to talk to the internet from those jobs, you're forced into some kind of NAT solution.
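One hedged way to start that dig is to break the EC2-Other category down by usage type with the Cost Explorer API; the exact dimension values and usage type names vary by account and region, so treat this as a sketch rather than the method used at Chime.

```python
# A hedged sketch of one way to see what is hiding inside "EC2-Other": break the
# category down by usage type with the Cost Explorer API. The exact dimension
# values (e.g., "EC2 - Other", usage types containing "NatGateway-Bytes") can
# vary, so treat this as a starting point.
import boto3

ce = boto3.client("ce")

resp = ce.get_cost_and_usage(
    TimePeriod={"Start": "2022-10-01", "End": "2022-11-01"},
    Granularity="MONTHLY",
    Metrics=["UnblendedCost"],
    Filter={"Dimensions": {"Key": "SERVICE", "Values": ["EC2 - Other"]}},
    GroupBy=[{"Type": "DIMENSION", "Key": "USAGE_TYPE"}],
)

for group in resp["ResultsByTime"][0]["Groups"]:
    usage_type = group["Keys"][0]
    cost = float(group["Metrics"]["UnblendedCost"]["Amount"])
    if cost > 100:  # surface the big line items; NAT gateway bytes often dominate
        print(f"{usage_type}: ${cost:,.2f}")
```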
So I dug into the EC2 other category and I started trying to figure out what was going on there.
There's no way, natively, to look at what traffic is transiting the NAT gateway.
There's not an interface that shows you what's going on, who the biggest talkers are over that network.
Instead, you have to have flow logs enabled, and you have to parse those flow logs.
So I dug into that.
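A simplified sketch of that flow-log spelunking, assuming logs in the default space-separated format exported to a local file; at real volumes you would reach for Athena or CloudWatch Logs Insights instead.

```python
# Sum bytes by destination address to find the top talkers behind a NAT gateway.
# Assumes VPC flow logs in the default space-separated format, exported to a
# hypothetical local file.
from collections import Counter

bytes_by_dest = Counter()

with open("flow-logs.txt") as fh:   # hypothetical local export of VPC flow logs
    for line in fh:
        fields = line.split()
        if len(fields) < 14 or fields[0] == "version":
            continue  # skip header rows and malformed lines
        dstaddr, byte_count = fields[4], fields[9]
        if byte_count.isdigit():
            bytes_by_dest[dstaddr] += int(byte_count)

for addr, total in bytes_by_dest.most_common(10):
    print(f"{addr}: {total / 1e9:.2f} GB")
```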
Well, you're missing a step first, because in a lot of environments,
people have more than one of these things.
So you get to first do the scavenger hunt of,
okay, I have a whole bunch of managed NAT gateways.
And first I need to go diving into CloudWatch metrics
and figure out which are the heavy talkers.
It's usually one or two,
followed by a whole bunch of small stuff,
but not always.
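That scavenger-hunt step might look something like this: querying the per-gateway BytesOutToDestination metric to see which NAT gateway is actually moving the bytes. The gateway IDs here are placeholders.

```python
# Compare traffic across NAT gateways using the per-gateway CloudWatch metrics
# published under the AWS/NATGateway namespace. Gateway IDs are placeholders.
from datetime import datetime, timedelta, timezone
import boto3

cloudwatch = boto3.client("cloudwatch")
end = datetime.now(timezone.utc)
start = end - timedelta(days=7)

for nat_gw_id in ["nat-0aaaaaaaaaaaaaaaa", "nat-0bbbbbbbbbbbbbbbb"]:   # placeholders
    stats = cloudwatch.get_metric_statistics(
        Namespace="AWS/NATGateway",
        MetricName="BytesOutToDestination",
        Dimensions=[{"Name": "NatGatewayId", "Value": nat_gw_id}],
        StartTime=start,
        EndTime=end,
        Period=86400,
        Statistics=["Sum"],
    )
    total_gb = sum(dp["Sum"] for dp in stats["Datapoints"]) / 1e9
    print(f"{nat_gw_id}: {total_gb:.1f} GB out over the last week")
```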
So figuring out which VPC
you're even talking about is a necessary prerequisite. Yeah, exactly. The data around
it is almost missing entirely. Once you come to the conclusion that it is a particular
NAT gateway, that's a set of problems to solve on its own. But first, you have to
go to the flow logs. You have to figure out what are the biggest
upstream IPs that it's talking to. Once you have the IP, it still isn't apparent what that host is.
In our case, we had all sorts of outside parties that we were talking to a lot. And it's a matter
of sorting by volume and figuring out, well, this IP, what is the reverse IP? Who is potentially the host there? I actually had some
wrong answers at first. I set up VPC endpoints to S3 and DynamoDB and SQS because those were some
top talkers. And that was a nice way to gain some security and some resilience and save some money.
And then I found, well, Datadog, that's another top talker for us. So I ended up creating a nice PrivateLink connection to Datadog,
which they offer for free, by the way,
which is more than I can say for some other vendors.
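For completeness, here is a sketch of standing up an interface endpoint (PrivateLink). SQS is used as the example service name; a vendor such as Datadog publishes its own endpoint service name, which you would substitute. All IDs are placeholders.

```python
# A sketch of creating an interface VPC endpoint (PrivateLink) so a chatty
# dependency stops transiting the NAT gateway. Interface endpoints do carry
# their own hourly and per-GB charges, just far smaller than NAT gateway
# data processing. All IDs are placeholders.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

ec2.create_vpc_endpoint(
    VpcEndpointType="Interface",
    VpcId="vpc-0123456789abcdef0",                # placeholder
    ServiceName="com.amazonaws.us-east-1.sqs",    # or a vendor's PrivateLink service name
    SubnetIds=["subnet-0123456789abcdef0"],       # one subnet per AZ in practice
    SecurityGroupIds=["sg-0123456789abcdef0"],    # must allow HTTPS from your workloads
    PrivateDnsEnabled=True,                       # so the normal service hostname resolves to the endpoint
)
```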
But then I found some outside parties
where there wasn't a nice PrivateLink solution available to us.
And yet it was by far the largest volume.
So that's what kind of started me down this track
is analyzing the NAT gateway myself
by looking at VPC flow logs.
Like it's shocking that there isn't a better way
to find that traffic.
It's worse than that because VPC flow logs
tell you where the traffic is going
and in what volumes, sure,
on an IP address and port basis.
But okay, now you have a Kubernetes cluster
that spans two availability zones.
Okay, great. What is actually passing through that? So you have one big application that just
seems awfully chatty. You have multiple workloads running on the thing. What's the expensive thing
talking back and forth? The only way that you can reliably get the answer to that, that I found,
is to talk to people about what those workloads are actually doing and failing
that you're going code spelunking. Yep, you're exactly right about that. In our case, it ended
up being apparent because we have a set of subnets where only one particular project runs. And when I
saw the source IP, I could immediately figure that part out. But if it's a Kubernetes cluster in the private subnets,
yeah, how are you going to find it out? You're going to have to ask everybody that has workloads
running there. And we're talking about, in some cases, millions of dollars a month. Yeah,
it starts to feel a little bit predatory as far as how it's priced and the amount of work you have
to put in to track this stuff down. I've done this a handful of times myself, and it's always painful unless you discover something pretty early on, like, oh, it's
talking to S3, because that's pretty obvious when you see that. It's, yeah, flip this switch,
and this entire engagement just paid for itself a hundred times over. Now, let's see what else
we can discover. That is always one of those fun moments, because first, customers are super
grateful to learn that.
Oh, my God, I flipped that switch and I'm saving a whole bunch of money because it starts with gratitude.
Thank you so much.
This is great. And it doesn't take a whole lot of time for that to alchemize into anger of, wait, you mean I've been ridden like a pony for this long and no one bothered to mention that if I click a button, this whole thing just goes away? And when you mention this to your AWS account team, they're solicitous, but they either have to present as, I didn't know that existed either, which is not a good look.
Or, yeah, you caught us, which is worse.
There's no positive story on this.
It just feels like a tax on not knowing trivia about AWS.
I think that's what really
winds me up about it so much. Yeah, I think you're right on about that as well.
My misunderstanding about the NAT pricing was that data processing is additive to data transfer.
I expected when I replaced NAT gateway with NAT instance that I would be substituting data transfer costs
for NAT gateway costs, NAT gateway data processing costs. But in fact, NAT gateway incurs both data
processing and data transfer. NAT instances only incur data transfer costs. And so this is a big
difference between the two solutions.
Not only that, but if you're in the same region, if you're egressing out of your, say, US East 1 region and talking to another hosted service also within US East 1, never leaving the AWS network, you don't actually even incur data transfer costs.
So if you're using a NAT gateway, you're paying data processing.
To be clear, you do, but it is cross AZ in most cases,
billed at one penny egressing.
And on the other side, that hosted service
generally pays one penny ingressing as well.
Don't feel bad about that one.
That was extraordinarily unclear.
And the only reason I know the answer to that
is that I got tired of getting stonewalled
by people who, it later turned out, didn't know the answer. So I ran a series of experiments and you're paying between 9.5 and 13.5 cents for every
gigabyte egressed. And this is a phenomenal cost. And at any kind of volume, if you're doing
terabytes to petabytes, this becomes a significant portion of your bill. And this is why people
hate the NAT gateway so much. I am going to short circuit an angry comment I can
already see coming on this, where people are going to say, well, yes, but at the multi-petabyte scale,
nobody's paying on-demand retail price. And they're right. Most people who are transiting
that kind of data have a specific discount rate applied to what they're doing; it varies depending upon usage
and use case. Sure, great. But I'm more concerned with the people who are sitting around dreaming
up ideas for a company where, I want to wind up doing some sort of streaming service. I talked to
one of those companies very early on in my tenure as a consultant around the billing piece, and they
wanted me to check their napkin
math because they thought that at their numbers, when they wound up scaling up, if their projections
were right, that they were going to be spending $65,000 a minute. And what did they not understand?
And the answer was, well, you didn't understand this other thing, so it's going to be more than
that. But no, you're directionally correct. So that idea that started off on a napkin,
of course they didn't build it on top of AWS.
They went elsewhere.
And last time I checked,
they'd raised well over a quarter billion dollars in funding.
So that's a business that AWS would love to have
on a variety of different levels,
but they're never going to even be considered
because by the time someone is at scale,
they either have built this somewhere else
or they went broke trying.
Yep, absolutely.
And we might just make the point there
that while you can get discounts on data transfer,
you really can't, or it's very rare
to get discounts on data processing for the NAT gateway.
So any kind of savings you can get on data transfer
would apply to a NAT instance solution,
saving you four and a half cents per gigabyte
inbound and outbound
over the NAT gateway equivalent solution.
So you're paying a lot for the benefit
of a fully managed service there.
Very robust, nicely engineered,
fully managed service, as we've already acknowledged,
but an extremely expensive solution
for what it is,
which is really just a proxy in the end.
It doesn't add any value to you.
The only way to make that more expensive
would be to route it through something
like Splunk or whatnot.
And Splunk does an awful lot
for what they charge per gigabyte,
but it just feels like it's rent-seeking
in some of the worst ways possible.
And what I love about this is that you've solved the problem
in a way that is open source.
You have already released it in Terraform code.
I think one of the first to-dos on this for someone
is going to be, okay, now also make it CloudFormation
and also make it CDK so you can drop it in however you want.
And anyone can use this.
I think the biggest mistake people might make
in glancing at this is,
well, I'm looking at the hourly charge
for the NAT gateways,
and that's 32 and a half bucks a month.
And the instances that you recommend
are hundreds of dollars a month
for the big network optimized stuff.
Yeah, if you care about the hourly rate
of either of those two things,
this is not for you.
That is not the problem that it solves.
If you're an independent learner annoyed about the $30 charge you got for a managed NAT gateway,
don't do this.
This will only add to your billing concerns.
Where it really shines is once you're at, I would say, probably about 10 terabytes a
month, give or take, in managed NAT
gateway data processing is where it starts to make sense to consider this. The breakeven is around six terabytes or so,
but there is value to not having to think about things. Once you get to that level of spend,
though, it's worth devoting a little bit of infrastructure time to something like this.
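A rough breakeven sketch under stated assumptions: the fixed monthly cost of a NAT instance plus an idle standby gateway, divided by the 4.5 cents per gigabyte it avoids. The instance price is illustrative, and running a pair per availability zone scales the fixed cost accordingly.

```python
# Rough breakeven math under assumed prices: a network-optimized NAT instance
# plus an idle standby NAT gateway, versus the data processing fee avoided.
# The instance price is illustrative, not Chime's actual number.
nat_instance_monthly = 235.0        # assumed price for a network-optimized instance
standby_gateway_monthly = 32.85     # NAT gateway hourly charge over a month
data_processing_per_gb = 0.045      # the fee the NAT instance path avoids

fixed_monthly = nat_instance_monthly + standby_gateway_monthly
breakeven_gb = fixed_monthly / data_processing_per_gb   # roughly 6 TB with these numbers

print(f"Breakeven at about {breakeven_gb / 1000:.1f} TB/month of NAT traffic")
```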
Yeah, that's effectively correct.
The total cost of running the solution,
like all in, there's eight elastic IPs,
four NAT gateways, if you're, say you're in four zones,
could be less if you're in fewer zones,
like N NAT gateways, N NAT instances,
depending on how many zones you're in.
And I think that's about it.
And I said right in the documentation,
if any of those baseline fees
are a material number for your use case,
then this is probably not the right solution.
Because we're talking about saving thousands of dollars.
Any of these small numbers for NAT gateway hourly costs,
NAT instance hourly costs,
that shouldn't be a factor,
basically. Yeah, it's like when I used to worry about costing my customers a few tens of dollars in Cost Explorer or CloudWatch or request fees against S3 for their cost and usage reports,
it's, yeah, that does actually have a cost. There's no real way around it. But look at the
savings they're realizing by going through that. Yeah, they're not going to come back complaining about their five-figure
consulting engagement costing an additional $25 in AWS charges when it lowered their bill by a third.
So there's definitely a difference as far as how those things tend to be perceived. But it's easy
to miss the big stuff when chasing after the little stuff like that. This is part of the problem I have
with an awful lot of cost tooling out there.
They completely ignore cost components like this
and focus only on the things
that are easy to query via API.
Of, oh, we're going to cost optimize
your Kubernetes cluster
when they think about compute and RAM.
And okay, that's great,
but you're completely ignoring all of data transfer
because there's still no great way to get at that programmatically. And it really is missing the forest for the trees.
I think this is key to any cost reduction project or program that you're undertaking.
When you look at a bill, look for the biggest spend items first and work your way down from
there just because of the impact you can have. And that's exactly what I did in this project.
I saw that EC2 other slash NAT gateway was the big item and I started brainstorming ways
that we could go about addressing that.
Now I have my next targets in mind.
Now that we've reduced this cost to effectively nothing,
extremely low compared to what it was,
we have other new line items on our bill
that we can start optimizing.
But in any cost project, start with the big things.
You have come the long way around to answer a question I get
asked a lot, which is, how do I become a cloud economist? And my answer is, you don't. It's
something that happens to you. And it appears to be happening to you, too. My favorite part about
this solution that you built, incidentally, is that it is being released under the auspices of your employer, Chime Financial, which is immune to being acquired by Amazon just to kill this thing and shut it up because Amazon already has something shitty called Chime.
They don't need to wind up launching something else or acquiring something else and ruining it because they have a Slack competitor of sorts called Amazon Chime.
There's no way they
could acquire you. Everyone would get lost in the hallways. Well, I have confidence that Chime will
be a good steward of the project. Chime's goal and mission as a company is to help everyone
achieve financial peace of mind. And we take that really seriously. We even apply it to ourselves.
And that was kind of the impetus behind developing this in the first place.
You mentioned earlier we have Terraform support already.
And you're exactly right.
I'd love to have CDK, CloudFormation, and Pulumi support, and other kinds of contributions are more than welcome from the community.
So if anybody feels like participating, if they see a feature that's missing, let's make this project the best that it can be. I suspect we can save many companies hundreds of thousands or millions
of dollars. And this really feels like the right direction to go. And this is easily a multi-billion
dollar savings opportunity globally. That's huge. I would be flabbergasted if that was the outcome
of this. The hardest part is reaching these people and
getting them on board with the idea of handling this. And again, I think there's a lot of
opportunity for the project to evolve in the sense of different settings depending upon risk
tolerance. I can easily see a scenario where in the event of a disruption to the NAT instance,
it fails over to the managed NAT gateway, but fail back becomes manual. So you don't have a
flapping route table back and forth or a hold down timer or something
like that.
Because again, in that scenario, the failure mode is just, well, you're paying four and
a half cents per gigabyte for a while until you wind up figuring out what's going on,
as opposed to the failure mode of you wind up disrupting connections on an ongoing basis.
And for some workloads, that's not tenable.
This is absolutely, for the common case, the right path forward.
Absolutely.
I think it's an enterprise-grade solution, and the more knobs and dials that we add to
tweak to make it more robust or adaptable to different kinds of use cases, the better. But the best
outcome here would actually be that the entire solution becomes irrelevant because AWS fixes
the NAT gateway
pricing. If that happens, I will consider the project a great success. I will be doing backflips
like you wouldn't believe. I would sing their praises day in, day out. I'm not saying reduce
it to nothing even. I'm not saying it adds no value. I would change the way that it's priced
because honestly, the fact that I can run an EC2 instance and be charged $0 on a per gigabyte basis,
yeah, I would pay a premium on an hourly charge
based upon traffic volumes,
but don't meter it per gigabyte.
That's where it breaks down.
Absolutely.
And why is it additive to data transfer also?
Like, I remember first starting to use VPC
when it was launched
and reading about the NAT instance requirement and thinking, wait a minute, I have to pay this extra management and hourly fee just so my private host could reach the internet?
That seems kind of janky.
And Amazon established a norm here because Azure and GCP both have their own equivalent of this now.
This is a business choice.
This is not a technical choice. They could just run this under the hood and not charge anybody for it or build in
the cost. And it wouldn't be this thing we have to think about. I almost hate to say it, but Oracle
Cloud does this for free. Do they? It can be done. This is a business decision. It is not a technical
capability issue where, well, it does
incur costs to run these things. I understand that. And I'm not asking for things for free.
I very rarely say that this is overpriced when I'm talking about AWS billing issues. I'm talking
about it being unpredictable. I'm talking about it being impossible to see in advance. But the
fact that it costs too much money is rarely my complaint. In this case, it costs too much money.
Make it cost less.
If I'm not mistaken, GCP's equivalent solution is the exact same price.
It's also four and a half cents per gigabyte.
So that shows you that there's business games being played here.
Like, Amazon could get ahead and do right by the customer
by dropping this to a much more reasonable
price.
I really want to thank you for both taking the time to speak with me and building this
glorious, glorious thing.
Where can we find it?
And where can we find you?
Alternat.cloud is going to be the place to visit.
It's on Chime's GitHub, which will be released by the time this podcast comes out.
As for me, if you want to connect, I'm on Twitter.
iamthewhaley is my handle.
And of course, I'm on LinkedIn.
Links to all of that will be in the podcast notes.
Ben, thank you so much for your time and your hard work.
This was fun.
Thanks, Corey. Ben Whaley, staff software engineer at Chime Financial and AWS community hero.
I'm cloud economist Corey Quinn, and this is Screaming in the Cloud.
If you've enjoyed this podcast, please leave a five-star review on your podcast platform
of choice.
Whereas if you've hated this podcast, please leave a five-star review on your podcast platform
of choice, along with an angry rant of a comment that I will charge you not only four and a half cents per word to read,
but four and a half cents to reply, because I am experimenting with myself with being a rent-seeking
schmuck. If your AWS bill keeps rising and your blood pressure is doing the same, then you need the Duckbill Group.
We help companies fix their AWS bill by making it smaller and less horrifying.
The Duckbill Group works for you, not AWS.
We tailor recommendations to your business and we get to the point.
Visit duckbillgroup.com to get started.
This has been a HumblePod production.
Stay humble.