Screaming in the Cloud - A Conversation on Cloud WAN with Kris Gillespie
Episode Date: February 15, 2024

Kris Gillespie, principal platform engineer for Silverflow, joins Corey Quinn on "Screaming in the Cloud" to talk about Cloud WAN's exciting new role in cloud networking. Kris explains Silverflow's journey, from the original problems with network scalability and the resolution of IP conflicts, to fully utilizing Cloud WAN for global connectivity and easier network management. Kris, who enjoys simplifying complex network architectures, discusses how Cloud WAN has enabled Silverflow to seamlessly integrate between regions and cloud providers, meeting their mission-critical needs for low latency and reliable transaction processing. Listen in to see how Cloud WAN has transformed the approach to solving fundamental network problems, demonstrating the importance for companies and engineers of knowing how to navigate the constantly evolving cloud landscape.

Show Highlights:
(00:00) Introduction to the show
(01:57) Kris recounts the initial challenges at Silverflow and the discovery of Cloud WAN
(04:15) The advantages of Cloud WAN over traditional transit gateways
(08:35) Infrastructure management with OrgFormation
(12:15) Insights into the use of historical and current networking technologies
(21:13) Challenges and implications of transitioning to IPv6
(33:10) Kris highlights the real need for Cloud WAN
(37:50) Closing remarks

About Kris
Kris is a 28-year industry veteran. He started in '95 back in Australia on the help desk for the first ISP in the country. Since then he has moved to the Netherlands, switching roles between network, systems, and storage engineering. During this time he has been involved in developing certifications for both IBM and (the now defunct) EMC, among others, and has worked heavily in the finance/banking sector. For the last 10 years he has been keenly focused on the cloud space and, as is the term these days, has combined these skills into what's popularly coined a "Platform Engineer". He currently works for a payments processing startup, Silverflow, as their Principal Platform Engineer, leading their Platform team and ensuring the platform can scale globally.

Links Referenced:
LinkedIn: https://www.linkedin.com/in/krisgillespie/
Blog: https://blog.viking-ops.io/
Transcript
We kind of had to wrap our head around the cost and how it would justify it.
Now we're at the point where any feature team can just decide to roll out a new VPC and
it automatically goes, it grabs the allocation and it's done.
And there's no conflict.
Everything works.
Welcome to Screaming in the Cloud.
I'm Corey Quinn.
One of the fundamental things about cloud is something that's largely
been abstracted away almost to the point where it's become something of a lie. Whereas the
network that really exists in cloud is not necessarily what is presented to us as customers.
It's a polite fiction, which is, I guess, a different way of saying virtualization.
But what's happening there is often not understood at almost any level, just because that skill set is not as front and center
as it once needed to be. I had a conversation with today's guest about this exact thing at
reInvent, and I figured we'd have this conversation in a longer format where my voice wasn't basically
down to a croak. Kris Gillespie is a principal platform engineer at
Silverflow. Kris, thank you for agreeing to do this in a more public format.
Oh, you're welcome. I actually really enjoyed our conversation at reInvent. And even though
that was quite a public conversation, let's see if this can be more useful to your listeners out
there as well. What I love about reInvent and honestly, all conferences is the hallway track. You get to have conversations and things just sort of
pop up out of nowhere because you didn't really know that they were coming.
As I recall, what really got us started talking was that you were the first and so far only
Cloud WAN customer that I have found in the wild. Cloud WAN being yet another networking service that is poorly described by AWS,
and even more surprising on some level, you were brimming with praise about it. So terrific.
Someone will talk to me about it. Does it actually work on the product at Amazon? What's the deal?
The journey to Cloud WAN came completely by accident. When I started working at Silverflow, they're a payments processing company.
So the idea is basically to have this one API
which we can do transactions on,
but then scale it globally, right?
So have it in the US, Europe, Asia Pacific, wherever.
So we had everything running in one region
because we're based in Europe, we're in the Netherlands.
But we were talking to some possible customers in North America,
and then we start to think like, okay, how do we actually scale this properly?
Because everything has been designed, you know, following best practices.
So we have VPCs with slash 16s everywhere.
Or when I say we, I mean, when I walked in the door, they were there.
And I'm like, okay, slash 16s feels a bit wasteful, but okay.
And then we got to the point where we're like, okay, how do we scale this?
How do we actually take these 30, 40, 50 accounts with VPCs everywhere, slash 16s everywhere,
and now replicate that two, three, four, five, six times.
Without having IP conflicts up the wazoo, which is always fun when you start pairing
things together that were not designed to be paired.
Tell me about it.
We already had IP conflicts and we were already at the point where it's like,
okay, somebody mentioned somewhere that it's possible to,
well, that of course you can do transit gateway peering,
but that dynamic routing will come eventually.
So my mission, not last year,
but the year before at reInvent
was to find somebody who works at AWS
that could tell me about it.
So I just went hunting for that person.
And yeah, eventually they told me about Cloud WAN.
And that was like the aha moment.
I keep wanting to play with it.
One of the problems I've had with it historically
has been that at its minimum scale,
I think it's something like $500 a month
to run the smallest possible expression of it,
which isn't particularly useful
because you want to have multiple things talking to it. So at that point, we're talking thousands of dollars a month for
me to build out anything that remotely resembles a reasonable test lab. And sorry, Amazon, I'm not
quite at a point where I'm just willing to throw that kind of R&D budget at explaining your own
services for you because you are bad at it. So I just wait until I encounter people in the wild.
And then like any good consultant, I turn other people's production into my test accounts. So I'm glad to hear that it solved
some of the painful parts for you. What exactly was it that things like Transit Gateway and
variety of convoluted peering setups weren't getting done for you?
I mean, Transit Gateways work fine in a regional context. So yeah, you can actually share them across accounts,
but they're a regional service.
Once you start wanting to connect multiple regions together,
the connection between them is, well, the routing is static.
So you need to have some sort of Lambda updating routing tables
on your transit gateways to maintain any kind of, I don't
know, let's say accurate view of your network, especially if you have things like, you know,
VPN connections or, you know, customer gateways or anything that's kind of dynamic. So yeah,
you're kind of mixing dynamic with static and then you need to add some extra, yeah,
Lambdas and stuff that you have to write yourself
to manage this.
And I'm thinking, why?
Why should I write some TypeScript code
or Python or whatever to manage networking
that you should do?
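For a sense of what that self-managed glue looks like, here is a minimal sketch of the kind of route-updating Lambda being described, using boto3 to pin a static route on a transit gateway route table at an inter-region peering attachment. The IDs and CIDR are placeholders, not anyone's real setup.

    import boto3

    ec2 = boto3.client("ec2", region_name="eu-west-1")

    def handler(event, context):
        # Keep traffic for the remote region's CIDR pointed at the peering
        # attachment. All identifiers below are illustrative placeholders.
        route_table_id = "tgw-rtb-0123456789abcdef0"
        peering_attachment_id = "tgw-attach-0123456789abcdef0"
        remote_cidr = "10.64.0.0/16"

        try:
            # Update the route if it already exists...
            ec2.replace_transit_gateway_route(
                DestinationCidrBlock=remote_cidr,
                TransitGatewayRouteTableId=route_table_id,
                TransitGatewayAttachmentId=peering_attachment_id,
            )
        except ec2.exceptions.ClientError:
            # ...and create it the first time through.
            ec2.create_transit_gateway_route(
                DestinationCidrBlock=remote_cidr,
                TransitGatewayRouteTableId=route_table_id,
                TransitGatewayAttachmentId=peering_attachment_id,
            )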
Right, I was under the impression
that router was a piece of hardware,
not a job description for someone.
So you're a half step above
passing packets by hand at that point.
Basically, yes.
And if you're talking about a VPC peering,
then that's just like everything to everything.
And you sort things out through NACLs.
And I never, ever want to deal with NACLs.
That's like, I just want to kill myself then.
Even AWS's policies around network ACLs have been,
don't use them unless you have a hard bound requirement
where you must use them.
Use security groups
because it's one of those hidden things
you don't think about or see.
They're not very granular.
You're limited in how many you can have applied.
And when something isn't working that should,
you will drive yourself up a wall
before you remember that they're there.
Exactly.
You need to have logging everywhere.
Know that you need to turn it on here, here, here, here, and here,
and then try to correlate it all the way through.
So as far as visibility on the network layer, it's also quite a challenge.
That's also something that we're trying very hard to solve.
But again, then it leads you down the path of more AWS services.
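As one concrete example of the "turn it on here, here, and here" problem, here is a minimal sketch of enabling just one of those layers, VPC Flow Logs to CloudWatch Logs, with boto3. The VPC ID, log group, and IAM role ARN are placeholders.

    import boto3

    ec2 = boto3.client("ec2", region_name="eu-west-1")

    # VPC Flow Logs are only one of the layers you have to remember to enable;
    # load balancer, firewall, and DNS logging are all separate switches.
    ec2.create_flow_logs(
        ResourceIds=["vpc-0123456789abcdef0"],            # placeholder VPC
        ResourceType="VPC",
        TrafficType="ALL",
        LogDestinationType="cloud-watch-logs",
        LogGroupName="/network/flow-logs/example",        # placeholder log group
        DeliverLogsPermissionArn="arn:aws:iam::123456789012:role/flow-logs-example",
    )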
Oso makes it easy for developers to build authorization into their applications. With Oso, you can model, extend, and enforce your authorization as your applications scale.
Organizations like Intercom, Headway, Productboard, and PagerDuty have migrated to Oso to build fine-grained authorization backed by a highly available and performant service.
Check out Oso today at osohq.com.
That's O-S-O-H-Q dot com.
I was a grumpy Unix sysadmin who then learned Linux for a job. And one year in, we had the 2008 financial crisis. Suddenly, no one's hiring and salary freezes across the board,
and I was fairly bored at work. So I wound up spending that year getting my CCNA because it was, okay, what area am I hand-waving over
in my day job that I feel like
I really should know more about?
So networking was an easy answer.
I'm not saying I'm any great shakes at it,
but I definitely understand it a lot better than I did.
And it made me a better systems person as a direct result.
And over the, dear Lord, almost 20 years since,
I'm realizing just how rare I am, just by having
that surface-level understanding of a lot of networking concepts, because in cloud, you don't
actually have to think about the network at all until suddenly one day you very much do. And by
that point, you don't even know where to begin. It's an entire area that has sort of slipped below
the surface level of awareness that is still critically important because a computer without a network is basically
an expensive space heater. It's actually really, I would say, ironic as well. So my background is
initially on, let's say, help desk, systems. Then I went into networking, storage engineering,
back to systems, back to networking. And now all that kind of wraps
together and you call it, let's say, platform engineering. I mean, I don't even know what to
call what I do anymore. But when I was hired for my job currently, they had no idea that I had any
networking experience. It was just because my last, like, say, eight, 10 years has been largely
focused on, let's say, cloud. And so that in most people's minds is infrastructure as code, CICD, and that's about it. And people instead wind up getting judgy and
annoying because in job interviews, you're bad at doing things like implementing quick sort on a
whiteboard. I can't do any of that. I am not a developer, right? See, I used to say the same
thing. And then I realized that I was writing an awful lot of configuration as code and a lot of
scripts that were getting fairly up there. And Python just started to intrinsically make sense. And now, of course, I write the most common
programming language, which is YAML. And now I have this
amazing alchemical ability to turn YAML files into AWS bills. It's kind of amazing. It's a horrible
party trick and you're never invited back to that party. No. Well, I mean, yeah, so we actually use a hybrid or I would say a Frankenstein's monster
version of CloudFormation called OrgFormation. Let's say it's organizationally aware
CloudFormation. So you can very easily deploy a stack, so all this YAML, across 20 accounts, 30 accounts.
Yeah. If you make one little boo-boo and all of a sudden you've deployed
core network edges in 20 regions,
then your bill goes through the roof.
So yes, we can very much scale our costs
much to the delight of our TAMs.
What's fun about a lot of that too
is that it's this whole world of,
okay, what am I going to do?
I'm going to teach an AWS service
about multiple accounts
and/or multiple regions, on fun days possibly both. And you're spackling over things that AWS really should be doing for
us as customers, but haven't gotten around to yet. And let's be clear, I know that these are
hard problems to solve. They're also a division of a $1.5 trillion company. And I don't think
that asking the rest of us to do volunteer work to spackle over their faults is necessarily fair either. Like at some point of scale, the burden shifts.
Yeah. I mean, we're at the point now where, and this will sound actually crazy, but we're building
a development environment specifically around Cloud WAN because now it's getting to the point
where we can't reason about changes anymore. They're becoming too, well, too impactful. So we need an environment where we can actually test theories and, you know,
create test cases and make sure that what we do is correct. But I'm at the point where I'm like,
I also want to ask our TAMs to, let's say, contribute to this environment because it's
really expensive for us just to test things that we don't have any other way of testing.
The networking blast radius is always somewhat terrifying to me. I don't know about you,
but when I was working on networks once upon a time, one of the first things you learn to do
is you set up a cron job or an at job on whatever it is you're working on, usually a firewall.
And after some elapsed period of time, it automatically reverts to the previous config or you can run the command and put a sleep in there.
Because if the change doesn't work, suddenly you've locked yourself out and you now get to either drive across town to the data center or open a remote hands ticket or have all kinds of fun things that happen as a result.
And it's kind of scary.
I've been doing a lot of home networking lately.
And even now it's like, crap, I have to go downstairs again. My first job in the Netherlands, that was
almost the first thing that I did was I spaced out for a second. This is like 2001. So I'm
working on a Cisco router, get the config. And of course, it didn't go into my mind that as soon as
you add the first firewall rule, it adds an implicit deny after it.
So I didn't put the office in there.
So I'm running out the door to the car as everyone goes, hey, Kris.
And I'm already gone.
I'm like going to the data center to go and fix that.
It's like I am 99% sure I know exactly what's about to come out of your mouth.
Yeah. And I also think that's also why people are shy of it as well,
because once you really start to get your hands dirty
in this domain, the impact of any, you know,
boo-boo can be immense.
It also seems to me that networking has been very slow
to embrace change.
And again, I can say this as a systems person,
that's not necessarily intended
to be pejorative. Again, the scale is massive, mistakes matter, and it can be very convoluted.
But it seems that it's still in its infancy when it comes to the idea of programmatic control.
Everything seems to be done by hand and then applied as weird one-offs. There's no testing
facility to speak of.
I still remember in my very early days having to patch a monstrous Perl script called RANCID
so that it could speak to Radware load balancers.
And all this monstrosity did
was it logged into a variety of network equipment
that you told it to,
grabbed the copy of the config on the thing,
and then committed it to Subversion,
which was a Git precursor.
And then it would either email diffs out, or then you had an entire history of the change
on these things.
But the fact that that was an industry standard for as long as it was, and I really hope it's
not now, but it probably still is, is wild.
Oh, I'm sure it's sitting somewhere in a dark corner of a data center still humming along.
I would not be surprised.
But even the tech, right?
I mean, BGP as a protocol still runs the internet.
It's, what, 30 plus years old now.
And even in AWS, under the hood,
the transit gateways and Cloud WAN itself,
so the core network edges and everything is just talking BGP.
In fact, one of the things that I find most funny
is the Connect attachments.
So on Transit Gateways and on Cloud WAN, you have what's called a Connect attachment,
which is a way you can connect a third-party device.
So like maybe a software-defined networking device from someone like, I don't know,
Fortinet or Aviatrix or whoever, even Cisco.
But it's actually over a GRE tunnel.
And I don't know how old you are,
but a GRE tunnel is like ancient tech.
Eucalyptus, as I recall, or if not OpenStack,
used to do their entire fake presented network layer
by having everything running through GRE tunnels
between the physical hosts
and it would just build abstraction layers within them.
So yeah, I'm old is the short answer to that.
Been there, done that, have the battle scars
from the rack nuts that I was working on that week.
I've gone gray from all this networking
in the last few years.
Oh, everyone working in networking is old.
It was great.
It's like, oh, you have gray hair.
It's like, wow, what was it like back in your day, Grandpa?
It's like, I'm 24 years old.
What are you asking here?
Yeah, it ages you.
It really does.
Yeah, that's what we're knee deep in at the moment.
So we're busy.
We had to reprovision most of what we already had
because we had IP conflicts everywhere.
We actually used another service,
which the pricing for it drives me crazy.
It's IPAM.
That's amazing.
You pay for the IPs that are under management.
Think of that.
I love that.
It's basically the world's most expensive version
of Microsoft Excel.
I understand the value of it because I look at this, it's like,
okay, here's a list of IP addresses that have been allocated across your entire AWS estate.
Great. And it charges you per IP address per month in that thing. And it sounds like this is a job for a spreadsheet. Where it starts to add value is, okay, pretend it's not just you
or a team of three people. Imagine now that you're a giant multinational and you have
an entire number of divisions that are all contributing to this IP scheme and the rest.
How do you wind up tracking all of that in one central place? It quickly becomes worth its weight
in gold. Before that, when I was in my on-prem days, the gold standard for this, because it,
surprise surprise, turns out that spreadsheets don't scale super well, was a company called Device42,
which was a great way of having even rack level inventory.
And of course, my Route 53 as a database joke
started by annotating VMs with text records
saying what physical hosts they were on.
Yeah, that'll work.
Tracking this stuff's hard.
Yeah, absolutely.
But yeah, I mean, for us, with IPAM,
we kind of had to wrap our head around the cost
and how it would justify it.
Now we're at the point where any feature team can just decide to roll out a new VPC and it automatically goes, grabs the allocation and it's done.
And there's no conflict. Everything works. It attaches to Cloud WAN, routes are there.
And then from our perspective now, the management is like next to zero. It just, everything is automated. So from that perspective, we're very, very happy.
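The "grabs the allocation and it's done" flow roughly corresponds to creating a VPC against an IPAM pool rather than hard-coding a CIDR. A minimal boto3 sketch, with a placeholder pool ID and an assumed /21 allocation size:

    import boto3

    ec2 = boto3.client("ec2", region_name="eu-west-1")

    # Ask IPAM for the next free block of the requested size instead of
    # typing in another /16 and hoping it doesn't collide with anything.
    vpc = ec2.create_vpc(
        Ipv4IpamPoolId="ipam-pool-0123456789abcdef0",  # placeholder pool
        Ipv4NetmaskLength=21,                          # assumed allocation size
    )
    print(vpc["Vpc"]["VpcId"], vpc["Vpc"]["CidrBlock"])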
Yeah. When I see an AWS service and the pricing just strikes me as Looney Tunes,
my default assumption, especially these days is, okay, that probably means that I'm not thinking
about it in the right way and, or I'm not the target market. Even at small scale, things like
the managed NAT gateway at, you know, 30 bucks a month for the instance hours. It offends independent learners,
but at small scale, okay, it starts to make sense.
But it really gets problematic
when you're paying $30,000 a day
on just the data you're shoving through the things.
What on earth is going on over there?
It's that the pricing becomes architecture.
And I think that that is a big problem right now
is that networking in a cloud context is so radically different economically from anything you wind up doing on-prem.
Where ports of bandwidth to the Internet, for example, are generally charged at the 95th percentile.
So you basically wind up taking every five-minute sample over the course of the month, sorting them from largest to smallest, chopping off the top 5%, and whatever the next one is, that's how much it costs you for the month. So yeah, get that
wrong and you'll wind up having significant overage charges. But once you pay for the size of the pipe,
it can be bored or it can be saturated and it doesn't matter at all. Now, suddenly every byte
passing through is metered and charged for, combined with instance hours, which in a home lab
has never really been a thing. Increasingly, it means there is no home lab story for an awful lot
of AWS's increasingly impressive networking options. You almost have to find a company
that's using these things, because they can justify the developer environment,
but as independent learners, we just can't. No, I mean, so even for myself, you know,
personally, I kind of have the ambition to
write about this as well.
So I'm halfway through it, but I sometimes toy with the idea of, you know, spinning some
of this stuff up to, you know, create some nice, you know, pictures or whatever.
But I'm just like, no, I am not going to pay for this.
Not even close to it.
It's way too expensive.
And then, yeah, I don't even know how even smaller companies could
even experiment with it because, you know, if you forget to turn it off for a month, you've all
of a sudden got a two, three K bill that you never expected, which yeah, can hurt quite a lot.
It's one of those big challenges. Back when I was learning this stuff myself, Cisco had a reasonably
decent program that I'm sure still exists because it's written in Java and that stuff lives forever
called Packet Tracer. And you could wind up building fake networks because you
didn't have a spare quarter million dollars to buy one of their Catalyst switches that did this
stuff at scale. So you could set these things up and learn how to do the configuration and the rest,
and it mostly worked. Where's that equivalent in AWS land? It's a problem from a home lab
perspective. I remember getting old Cisco gear
off of eBay or from employers that were decommissioning stuff just to build out a
somewhat reasonable home lab. But that is such a different scale even now compared to what the
actual networking concerns of big companies tend to be. Just because small networks are mostly,
and I'm going to get yelled at for this, but mostly a solved
problem.
Oh yeah, absolutely.
I mean, I consider myself a bit of a networking guy, but at home I just use, for example,
and I'm not pimping or pushing anything at all, but I love the UniFi gear
because 99% of the work is done for me.
I just put it in, set up some networks and it just works.
So I don't want to think about it.
I'm still running that for Wi-Fi here.
I didn't love their AWS security breach
that they didn't fully disclose
when the indictment came out.
They had been hiding it.
They sued Krebs for reporting on it
in ways they didn't like.
But it's still, it is,
everything else is either way more expensive
or back in the days of flashing Linksys
all-in-one Wi-Fi nonsense thing
with OpenWrt or DD-WRT or something,
just so you could actually get more capabilities.
Exactly that.
So home networking is solved.
And I mean, even, you know, in my experience with most engineers,
and I'm not talking down to anybody at all,
because I know that, you know, there's a thousand topics out there,
but I would say as a good systems engineer,
which is what I really think I am, well, good is debatable,
but I think the more exposure you have to all the aspects, right? Because a system doesn't sit there
in isolation, right? It's connected to a network. It has storage. It has all these things. The more
you can dive into any of these topics, the better you will be at your job, the better you'll be able
to help the developers or the feature teams or, you know, explain things to management so that
they can make better decisions
with the budget or whatever, right?
So I don't understand why people avoid these topics.
And so much of it comes from getting it wrong
the first time.
I got yelled at about this years ago by a boss
who didn't understand this concept.
And even now at home, my network here is 192.168.1.0/24.
In other words, there are 254 usable IP addresses that I can have on the network.
When it came time to build a separate IoT network, the common thing, oh, okay, so put the next block up.
So 192.168.2.0/24.
And down that path lies madness.
I picked 192.168.128.0/24 instead.
And because if I need to expand either network,
there's massive amounts of headroom.
Like, do you really see a scenario
when you're ever going to have
more than 254 IP addresses on a home network?
Have you met Kubernetes?
It eats IP addresses like it's nobody's freaking business.
My oven has an IP address.
I mean, everything now is connected, right?
So it will only get worse.
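To put the earlier headroom argument in concrete terms, a quick sketch with Python's ipaddress module, using the 192.168.1.0/24 and 192.168.128.0/24 networks mentioned above:

    import ipaddress

    iot = ipaddress.ip_network("192.168.128.0/24")

    # With the IoT network parked at 192.168.128.0/24, the primary network can
    # grow from a /24 all the way to a /17 before the two ranges collide.
    print(ipaddress.ip_network("192.168.0.0/17").overlaps(iot))   # False

    # Had the IoT network been the "next block up" at 192.168.2.0/24, the
    # primary network would collide as soon as it grew to a /22.
    print(ipaddress.ip_network("192.168.0.0/22").overlaps(
        ipaddress.ip_network("192.168.2.0/24")))                  # True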
And you saw the AWS change.
As we record this on January 31st,
it takes effect tomorrow,
which is specifically that every public IPv4 address
will cost roughly $3 to $4 per month,
regardless of whether it's attached or not. That means that that's going
to cost about $43 every year for every IPv4 address. And when this was first launched,
I was pretty enthusiastic about it because that's great. It's driving IPv6 adoption.
The problem is that people are slow to adopt IPv6 when they're at AWS working on service teams.
There's a laundry list of AWS services that flat out don't work with it, that still require these things.
So the price of a whole bunch of things has just gone up
and people are about to be surprised and then some
when they see what happens to their bill
at the end of February.
I'm expecting my phone to basically explode.
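The arithmetic behind those numbers, assuming the announced rate of $0.005 per public IPv4 address per hour:

    # Back-of-the-envelope math for the public IPv4 charge.
    rate_per_hour = 0.005                 # USD per public IPv4 address per hour
    per_year = rate_per_hour * 24 * 365   # ~$43.80 per address per year
    per_month = per_year / 12             # ~$3.65 per address per month

    print(f"${per_month:.2f}/month, ${per_year:.2f}/year per address")
    # A fleet of 10,000 public IPv4 addresses lands around $438,000 per year.
    print(f"${10_000 * per_year:,.0f}/year for 10,000 addresses")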
Yes, I mean, I guess they need to raise their invoices a little bit.
I have a customer who will be charged many millions of dollars a year for this change.
One, I say like there's only one of them. And the official response is great. Well,
what about bring your own IP? We don't charge for that. You can bring your own IP allocation
and use that. Oh, great. So I'm just going to re-IP every device I have that's public facing and talks to customers.
Who's going to do all that work for communication
and networking and avoiding outages?
Jack, hold you.
I didn't think so.
So I guess I'm going to take it on the chin
and pay the millions of dollars.
And it's just a disaster.
I mean, even if you want to bring your own IPs,
then you need to go to, well, ARIN or APNIC or RIPE
or one of those organizations to even beg.
Oh, these days you're going to the secondary markets. They're not passing out the full
stuff anymore. It's gone now. And the stuff that they occasionally do is coming out of
Bogon space and whatnot. It's like, hey, do you want some IP addresses that a good third of the
internet's devices refuse to acknowledge as valid and will just drop your packets on the
floor? Wow. The Bogon network. It's like there are actually some people whose sites I think should live in that IP space,
but that's just me being small and petty.
It's fine.
I mean, everybody has their way.
Exactly.
No one is excited by the prospect of building permissions
except for the people at Oso.
With Oso's authorization as a service,
you have building blocks for basic permissions patterns
like RBAC, REBAC, ABAC,
and the ability to extend
to more fine-grained authorization
as your applications evolve.
Build a centralized authorization service
that helps your developers build and deploy
new features quickly. Check out
Oso today at
osohq.com. Again, that's
O-S-O-H-Q
dot com.
It's been a really interesting ride
just watching the evolution of AWS networking
as it's gone from,
like originally with EC2 Classic,
which was just called EC2 back in those days,
everything is a big flat network
and you better be good with security policies.
And then they build abstractions on top of it
and abstractions on top of that.
Easy example, a public versus private subnet is simply a human convenience.
There is no declaration, public or private, in their APIs.
It's simply a question of does this get public IP addresses assigned to it, yay or nay?
And, oh, is there a NAT gateway to let it speak to other things?
Yeah, I mean, these are, as you say, just for human convenience,
but they don't actually do anything.
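In other words, the "public" versus "private" distinction mostly lives in the route tables. A minimal boto3 sketch with placeholder IDs:

    import boto3

    ec2 = boto3.client("ec2", region_name="eu-west-1")

    # "Public" subnet: its route table sends the default route to an internet gateway.
    ec2.create_route(
        RouteTableId="rtb-0123456789abcdef0",   # the "public" subnet's route table (placeholder)
        DestinationCidrBlock="0.0.0.0/0",
        GatewayId="igw-0123456789abcdef0",
    )

    # "Private" subnet: same default route, but through a NAT gateway instead,
    # plus (usually) no public IPs handed out on launch.
    ec2.create_route(
        RouteTableId="rtb-0fedcba9876543210",   # the "private" subnet's route table (placeholder)
        DestinationCidrBlock="0.0.0.0/0",
        NatGatewayId="nat-0123456789abcdef0",
    )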
And you could argue that IPv6,
when it actually eventually comes through,
will be kind of an interesting thing as well, right?
Because there's no concept of NATing with IPv6.
So a lot of ways how people consider security
will be, well, changed
because you can't hide behind
a NAT gateway. So now every device that you have, theoretically, if you don't do your policies
correctly, so your security groups or whatever, will be accessible, including in your home, right?
So that's a very interesting thought that probably also people haven't really considered.
My ISP doesn't support IPv6 natively, unfortunately.
So I set up a tunnel for a bit and that was great.
And the firewall rules are super important.
But what I found that was disheartening was how many things still broke.
Logging into some services would have some sort of application firewall that just hung my connection.
It would never load.
And it took me a bit to figure out
what it was until I forcibly disabled IPv6 on that node. And suddenly I could log into web pages.
I found that I have a few IoT stuff that are now leaking Matter IPv6 addresses onto my actual home
network. And it's, okay, so why does my computer have seven different IPv6 addresses here with one interface? That seems
a little off. And it's, oh, dear Lord, we're in no way ready for this.
So, and it's funny because like, for as long as I've even been in the Netherlands, you know,
going to my first RIPE meeting, they were talking about, you know, the IPv6, you know, uptake. And
it's like, there was this sad little graph that had, like, barely moved up. Right.
And I think even now it's probably only slightly up. But I am curious whether, before I retire, IPv6 will be
anywhere further along. I think that we're going to see a lot more interest in it just as soon as
people start realizing just how much this is costing them. Yeah. I mean, cost is usually a
driver to almost any kind of change like this. Other cloud providers have been charging the same
effective fee for many years. The difference is that they adopted this either from the outset
or when they were a lot smaller. They didn't wait until 2024 when a decent percentage of the
internet was going through them as the world's largest cloud provider. I think that it is going
to be wild. It's going to make AWS billions of dollars a year,
which, okay, good for them. Watch them all attribute it to how good they are at generative AI.
But okay, it just feels like on the one hand, it's rent seeking. But on the other,
I do understand it. These are a scarce and diminishing resource. You need to manage it
well. You're not allowed to get any more of them. And it costs them a giant pile of money to acquire these things.
How do the economics balance? From that perspective, I do understand as well. And
at least from my organization, we are extremely lean on the, let's say, externally facing IPs
that we even have. I think maybe we have a couple of hands worth of external IPs. So,
well, at least for us, we're not particularly worried. Yeah, we're in the same boat, but our
company account is relatively small, like 500 bucks a month, and it's going to go up by about
10% based on this charge. And I'm not going to sit there and try and hunt down the, what is that,
10 or 11 IP addresses across the entire estate. I'm not going to hunt that down with prejudice; it's not worth my time, but it is a noticeable bump in what I'm paying AWS.
And that's not because I'm being irresponsible. Well, of course, it's what others have been doing
for the longest period and it is a form of rent seeking as well. So yeah, maybe to shift the topic
a bit back to Cloud WAN, if you don't mind. By all means, please. One other little
thing, which is also kind of interesting, is that the industry that we're in, or at least my company
is in, payments processing, you can imagine that we're going to work with, hopefully if we are
successful, which of course, my dear overlords will ensure, we'll work with a lot of larger
companies that probably don't like Amazon,
right? They don't want their things, their credit card transactions processed on Amazon.
I have worked with a number of those companies myself.
So you have data in transit and you have data at rest. Almost everybody cares about data at rest.
Data in transit, people care, but I mean, there's more of a gray area there,
I would say. You can play with it. One of the aspects that we're also looking into,
and the rest of my engineering team will beat me up when they hear this, but I mean,
everybody knows that this is going to happen eventually. But how we built our platform is
to separate, let's say, the backend connectivity. So how we connect all the different card schemes,
so all the credit card
schemes. We separated that. Actually, you can imagine it like a kind of stack: where
we do the actual workload, so all the workload processing, all of that is up above. We have
Cloud WAN in the middle. It's like this nice glue to kind of connect everything together and to do
the separation. And at the bottom side, we do all the connectivity. So the really expensive stuff at the bottom and all the
stuff that we can push out everywhere at the top. Sorry, I don't want to do too much of a
big monologue here. No, no, please. This is fascinating. Tell me more.
The reason why we did it this way is because the bottom part is very expensive, right? So
we're talking data centers, we're talking physical connectivity, we're talking all these kinds of things in Europe, North America, Asia, everywhere. These are
expensive. But the ones at the top are purely AWS, right? So we do everything as much as possible in
AWS leading on their services as well. This means that we can also deploy in any region
very quickly and hook it up via Cloud WAN down into the various card schemes
very quickly. So if a customer calls up and says, hey, I'm in Tokyo, but we don't have anything
there, but we can hook it up via, well, even a local zone, which I learned a lot about at
reInvent, back through Cloud WAN and then back to the processing in Europe or North America, depending which one makes sense,
then we can be live within days, weeks.
Of course, customer integration time takes a long time,
but we can be ready for them to start integrating
and testing within a day in logical terms.
It's extremely quick.
But what if we need to go to other cloud vendors, right?
So say someone says,
I'm not going to touch Amazon at all.
No, I don't want that.
So this is where those connect attachments come in
because then we can do an SDN device
or we could even be in a different cloud, right?
Because it's just a GRE tunnel.
So then we're talking like, okay,
we have a Connect attachment to an SDN device and we connect that across to Azure.
Now we've bridged that gap.
So all of our expensive stuff can stay in one place, but now we can expand very easily into other cloud vendors.
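A rough sketch of what that Connect-attachment-over-GRE pattern looks like through boto3's networkmanager client, with placeholder IDs, addresses, and ASN; in practice the tunnel terminates on whatever SDN appliance or remote cloud endpoint you run.

    import boto3

    # The Network Manager / Cloud WAN APIs are served out of us-west-2.
    nm = boto3.client("networkmanager", region_name="us-west-2")

    # Layer a Connect attachment (GRE) on top of an existing transport VPC
    # attachment to the core network. All IDs below are placeholders.
    connect = nm.create_connect_attachment(
        CoreNetworkId="core-network-0123456789abcdef0",
        EdgeLocation="eu-west-1",
        TransportAttachmentId="attachment-0123456789abcdef0",
        Options={"Protocol": "GRE"},
    )

    # Then define the GRE/BGP peer on the far side, e.g. an SDN appliance that
    # in turn bridges across to another cloud.
    nm.create_connect_peer(
        ConnectAttachmentId=connect["ConnectAttachment"]["Attachment"]["AttachmentId"],
        PeerAddress="10.1.0.10",                 # tunnel endpoint on the appliance (placeholder)
        BgpOptions={"PeerAsn": 64512},           # placeholder private ASN
        InsideCidrBlocks=["169.254.200.0/29"],   # link-local range for the tunnel/BGP session
    )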
Without all the trouble of trying to get incompatible interpretations of IPsec working between providers and getting the security groups working,
the routing and the rest.
I talked to a company that spent four months on that
before giving up completely
and deciding to take a different approach.
Yeah, I'm looking forward to seeing
how that winds up branching out.
I think we need to see more customers using it that way
and building tooling around it.
I mean, historically, things like Terraform arose
because everyone is trying to solve the exact same problems.
This feels like it's a lot more rarefied as far as who is going to experience these particular
requirements. That number only grows with time, but I think it's just going to take a while for
us to start seeing that awareness trickling into the mainstream. For us, the real need comes from
low latency, right? So if you're at a restaurant and you have your card on your phone, you want to tap it down,
you don't want to sit there for 30 seconds to a minute waiting for it to come on, come
on, come on, come on.
You want to tap it and go, right?
You just want it to work now.
So the number one thing that we, oh, actually we have two things, latency and never drop
an authorization, right? You don't want to
be double charged. You don't want to have it go beep and then nothing, right? That's impossible,
right? So can't lose anything and it must be fast. So those two requirements give us quite
some budget in the networking space, right? So I can understand that it's also not something that
a lot of companies would use either, right? Because it is quite a niche problem to have. But I mean, even if you have, you know, I don't know, if you're
in multi-regional setup and you have any kind of external connectivity, then this is where it starts
to really make sense. Yeah, it's definitely something that is clearly solving problems
for folks. I have to confess, when they first told me about Cloud WAN, I was skeptical because I was trying to map it to the problem of the week that
I was tackling at the time. And like, this is useless. I could barely use this as a database.
What's going on? Not for lack of trying, but it was a, okay, all I have to do whenever I think
I've gotten a lock on something or written something off is talk to a customer who's
using it.
And I learn an awful lot, not always for the better in some cases. And there's occasionally
times where I cannot find a single customer for the life of me. And that does inform some
educated guesses as far as just how many people are using this thing. But I'm glad to see that
you folks are out there. Did you get to catch up with any other Cloud WAN customers or is it
possible you're the only one these days?
So on that, more customers are onboarding.
They don't like to call us the biggest user anymore.
Maybe it feels wrong for them.
Oh, they love doing that as part of a sales process too.
Like, do you have any idea how many biggest S3 customers
I've encountered just this past year alone?
Exactly.
We're the biggest S3.
Sure you are.
Well, the only difference,
at least how I feel,
is that at least we are talking directly
to the service team
to actually also give them ideas
and feature requests.
Because one of the biggest problems that we have,
and I can mention this,
is the fact that when you build out your network,
you have these core network edges.
These are those 500 euro a month devices.
They're basically transit gateways,
but they're called CNEs.
These things cost you 500 a month.
When you add more than one,
so you have two,
they connect to each other.
If you have four,
then each one connects to each other.
So you have a full mesh.
The problem then starts with the routing
because what we try to do,
we try to be smart.
We're like, okay, let's do a centralized egress, right?
Because now we have all these accounts
and normally every account had NAT gateways,
internet gateways, and that was costing us money, right?
So we're like, okay, now we have Cloud WAN.
We can centralize this.
So we have like 50 accounts,
we have a central egress,
and you're going to go through that.
Perfect.
So now we only have one NAT gateway,
one internet gateway.
We put a network firewall in there. All that nice stuff. Perfect.
Better centralization, better story around it, better cost economics, better cost efficiency.
All of that is true. It's great. Except, so say, for example, we have an egress in EU-West 1,
but we have a workload in EU-Central 1. We also have workloads in US East 1 and US West 1. And we have
an egress in US East one. The problem is that in the other regions that do not have an egress,
you have no way to direct the traffic to a preferred egress, right? So we had the
funny situation that in EU Central, it was going out the US egress. And US might go to EU,
or it might go to the US. And it differed segment to segment. So we had like devs going this way
in this segment, production going that way. And we're like, what the hell?
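For context, the segments and egress steering live in the core network policy document. A pared-down sketch of that shape through boto3, with placeholder IDs, ASN ranges, and segment names; field names follow the published policy format, but treat it as an illustration rather than a working policy.

    import json
    import boto3

    nm = boto3.client("networkmanager", region_name="us-west-2")

    # Two edge locations (one CNE each, which is where the per-CNE monthly
    # charge comes from), two segments, and a default route in one segment
    # pointed at a centralized egress attachment. Where that egress physically
    # sits is what can end up differing segment to segment.
    policy = {
        "version": "2021.12",
        "core-network-configuration": {
            "asn-ranges": ["64512-64555"],
            "edge-locations": [{"location": "eu-west-1"}, {"location": "us-east-1"}],
        },
        "segments": [
            {"name": "production", "require-attachment-acceptance": False},
            {"name": "development", "require-attachment-acceptance": False},
        ],
        "segment-actions": [
            {
                "action": "create-route",
                "segment": "production",
                "destination-cidr-blocks": ["0.0.0.0/0"],
                "destinations": ["attachment-0123456789abcdef0"],  # egress VPC attachment (placeholder)
            }
        ],
    }

    nm.put_core_network_policy(
        CoreNetworkId="core-network-0123456789abcdef0",   # placeholder
        PolicyDocument=json.dumps(policy),
    )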
And that can have consequences for a number of things.
Sure. I mean, we have customers that like to whitelist IP addresses. We tell them not to,
but they'll do it anyway.
And then suddenly things break. If you can get big companies to stop doing that game,
oh my God, that's the biggest thing that's going to make it impossible for some companies to get
off of their allocated IPv4 stuff because it takes an act of God to get companies to update firewalls.
We now have a slash 64 of IPv6. Please enter that into your firewall.
IP by IP.
Thank you.
Off you go.
Yeah, by hand.
It's the worst internship ever.
I swear we have customers
that I believe truly do that.
But OK, I didn't say that.
Of course not.
I really want to thank you
for taking the time
to talk to me about this.
If people want to learn more
about how you see these things,
where's the best place
for them to find you?
At the moment, find me on LinkedIn.
I believe I sent it across to you and I am working on a blog. So I will send that out. Well, I will
put it on my LinkedIn profile very soon. I have four posts so far and I'm going to keep working
on it because yeah, I think that this is an interesting story and yeah, that's the easiest
way to find me. And otherwise, I'll be at the AWS
Summit in Amsterdam in a couple of months and probably reInvent again this year. So I'll be
around. I look forward to seeing you at at least one of those things. Thanks so much for taking the
time to speak with me. I appreciate it. Thank you very much for having me. I really enjoyed it.
Kris Gillespie, Principal Platform Engineer at Silverflow. I'm cloud economist Corey Quinn,
and this is Screaming in the Cloud.
If you enjoyed this podcast,
please leave a five-star review
on your podcast platform of choice.
Whereas if you hated this podcast,
please leave a five-star review
on your podcast platform of choice,
along with an angry comment
that will fail to save properly
because that podcast platform
just implemented IPv6 this morning badly.