Screaming in the Cloud - A Conversation on Cloud WAN with Kris Gillespie

Episode Date: February 15, 2024

Kris Gillespie, lead platform engineer for Silverflow, joins Corey Quinn on "Screaming in the Cloud" to talk about Cloud WAN's exciting new role in cloud networking. Kris explains Silverflow's journey, from the original problems with network scalability and the resolution of IP conflicts, to fully utilizing Cloud WAN for global connectivity and easier network management. Kris, who enjoys simplifying complex network architectures, discusses how Cloud WAN has enabled Silverflow to integrate seamlessly between regions and cloud providers, meeting their mission-critical needs for low latency and reliable transaction processing. Listen in to see how Cloud WAN has transformed their approach to solving fundamental network problems, demonstrating the importance for companies and engineers of knowing how to navigate the constantly evolving cloud landscape.

Show Highlights:
(00:00) Introduction to the show
(01:57) Kris recounts the initial challenges Silverflow faced and the discovery of Cloud WAN
(04:15) The advantages of Cloud WAN over traditional transit gateways
(08:35) Infrastructure management with OrgFormation
(12:15) Insights into the use of historical and current networking technologies
(21:13) Challenges and implications of transitioning to IPv6
(33:10) Kris highlights the real need for Cloud WAN
(37:50) Closing remarks

About Kris
Kris is a 28-year industry veteran. He started in '95 back in Australia on the help desk for the first ISP in the country. Since then he has moved to the Netherlands, switching roles between network, systems, and storage engineering. During this time he has been involved in developing certifications for both IBM and the now-defunct EMC, among others, and has worked heavily in the finance/banking sector. For the last 10 years he has been keenly focused on the cloud space and, as is the term these days, has combined these skills into what's popularly coined a "Platform Engineer." He currently works for a payments processing startup, Silverflow, as their Principal Platform Engineer, leading their Platform team and ensuring the platform can scale globally.

Links Referenced:
LinkedIn: https://www.linkedin.com/in/krisgillespie/
Blog: https://blog.viking-ops.io/

Transcript
Starting point is 00:00:00 We kind of had to wrap our head around the cost and how it would justify it. Now we're at the point where any feature team can just decide to roll out a new VPC and it automatically goes, it grabs the allocation and it's done. And there's no conflict. Everything works. Welcome to Screaming in the Cloud. I'm Corey Quinn. One of the fundamental things about cloud is something that's largely
Starting point is 00:00:27 been abstracted away almost to the point where it's become something of a lie. Whereas the network that really exists in cloud is not necessarily what is presented to us as customers. It's a polite fiction, which is, I guess, a different way of saying virtualization. But what's happening there is often not understood at almost any level, just because that skill set is not as front and center as it once needed to be. I had a conversation with today's guest about this exact thing at reInvent, and I figured we'd have this conversation in a longer format where my voice wasn't basically down to a croak. Kris Gillespie is a principal platform engineer at Silverflow. Kris, thank you for agreeing to do this in a more public format.
Starting point is 00:01:11 Oh, you're welcome. I actually really enjoyed our conversation at a reInvent. And even though that was quite a public conversation, let's see if this can be more useful to your listeners out there as well. What I love about reInvent and honestly, all conferences is the hallway track. You get to have conversations and things just sort of pop up out of nowhere because you didn't really know that they were coming. As I recall, what really got us started talking was that you were the first and so far only CloudWan customer that I have found in the wild. CloudWand being yet another networking service that is poorly described by AWS, and even more surprising on some level, you were brimming with praise about it. So terrific. Someone will talk to me about it. Does it actually work on the product at Amazon? What's the deal?
Starting point is 00:01:58 The journey to CloudWand came completely by accident. When I started working at Silverflow, they're a payments processing company. So the idea is basically to have this one API which we can do transactions on, but then scale it globally, right? So have it in the US, Europe, Asia Pacific, wherever. So we had everything running in one region because we're based in Europe, we're in the Netherlands. But we were talking to some possible customers in North America,
Starting point is 00:02:27 and then we start to think like, okay, how do we actually scale this properly? Because everything has been designed, you know, following best practices. So we have VPCs with slash 16s everywhere. Or when I say we, I mean, when I walked in the door, they were there. And I'm like, okay, slash 16s feels a bit wasteful, but okay. And then we got to the point where we're like, okay, how do we scale this? How do we actually take these 30, 40, 50 accounts with VPCs everywhere, slash 16s everywhere, and now replicate that two, three, four, five, six times.
Starting point is 00:02:55 Without having IP conflicts up the wazoo, which is always fun when you start pairing things together that were not designed to be paired. Tell me about it. We already had IP conflicts and we were already at the point where it's like, okay, somebody mentioned somewhere that it's possible to, well, that of course you can do transit gateway peering, but that dynamic routing will come eventually. So my mission, not last year,
Starting point is 00:03:19 but the year before at reInvent was to find somebody who works at AWS that could tell me about it. So I just went hunting for that person. And yeah, eventually they told me about CloudWare. And that was like the hot moment. I keep wanting to play with it. One of the problems I've had with it historically
Starting point is 00:03:34 has been that at its minimum scale, I think it's something like $500 a month to run the smallest possible expression of it, which isn't particularly useful because you want to have multiple things talking to it. So at that point, we're talking thousands of dollars a month for me to build out anything that remotely resembles a reasonable test lab. And sorry, Amazon, I'm not quite at a point where I'm just willing to throw that kind of R&D budget at explaining your own services for you because you are bad at it. So I just wait until I encounter people in the wild.
Starting point is 00:04:02 And then like any good consultant, I turn other people's production into my test accounts. So I'm glad to hear that it solved some of the painful parts for you. What exactly was it that things like Transit Gateway and variety of convoluted peering setups weren't getting done for you? I mean, Transit Gateways work fine in a regional context. So yeah, you can actually share them across accounts, but they're a regional service. Once you start wanting to connect multiple regions together, the connection between them is, well, the routing is static. So you need to have some sort of Lambda updating routing tables
Starting point is 00:04:42 on your transit gateways to maintain any kind of, I don't know, let's say accurate view of your network, especially if you have things like, you know, VPN connections or, you know, customer gateways or anything that's kind of dynamic. So yeah, you're kind of mixing dynamic with static and then you need to add some extra, yeah, Lambdas and stuff that you have to write yourself to manage this. And I'm thinking, why? Why should I write some TypeScript code
Starting point is 00:05:10 or Python or whatever to manage networking that you should do? Right, I was under the impression that router was a piece of hardware, not a job description for someone. So there's a, you're half step above passing packets by hand at that point. Basically, yes.
Starting point is 00:05:24 And if you're talking about a VPC peering, then that's just like everything there, everything there. And you sort things out through knuckles. And I never, ever want to deal with the knuckles. That's like, I just want to kill myself then. Even AWS's policies around network ACLs has been, don't use them unless you have a hard bound requirement where you must use them.
Starting point is 00:05:45 Use security groups because it's one of those hidden things you don't think about or see. They're not very granular. You're limited in how many you can have applied. And when something isn't working that should, you will drive yourself up a wall before you remember that they're there.
Starting point is 00:05:59 Exactly. You need to have logging everywhere. Know that you need to turn it on here, here, here, here, and here, and then try to correlate it all the way through. So as far as visibility on the network layer, it's also quite a challenge. That's also something that we're very hard trying to solve. But again, then it leads you down the path of more AWS services. Oso makes it easy for developers to build authorization into their applications. With Oso, you can model, extend, and enforce your authorization as your applications scale.
Starting point is 00:06:30 Organizations like Intercom, Headway Product Board, and PagerDuty have migrated to Oso to build fine-grained authorization backed by a highly available and performance service. Check out Oso today at osohq.com. That's O-S-O-H-Q dot com. I was a grumpy Unix sysadmin who then learned Linux for a job. And one year in, we had the 2008 financial crisis. Suddenly, no one's hiring and salary freezes across the board, and I was fairly bored at work. So I wound up spending that year getting my CCNA because it was, okay, what area am I hand-waving over in my day job that I feel like I really should know more about? So networking was an easy answer.
Starting point is 00:07:13 I'm not saying I'm any great shakes at it, but I definitely understand it a lot better than I did. And it made me a better systems person as a direct result. And over the, dear Lord, almost 20 years since, I'm realizing just how rare I am just by with that surface level understanding of a lot of networking concepts, because in cloud, you don't actually have to think about the network at all until suddenly one day you very much do. And by that point, you don't even know where to begin. It's an entire area that is sort of slipped below
Starting point is 00:07:43 the surface level of awareness that is still critically important because a computer without a network is basically an expensive space heater. It's actually really, I would say, ironic as well. So my background is initially on, let's say, help desk, systems. Then I went into networking, storage engineering, back to systems, back to networking. And now all that kind of wraps together and you call it, let's say, platform engineering. I mean, I don't even know what to call what I do anymore. But when I was hired for my job currently, they had no idea that I had any networking experience. It was just because my last, like, say, eight, 10 years has been largely focused on, let's say, cloud. And so that in most people's minds is infrastructure as code, CICD, and that's about it. And people instead wind up getting judgy and
Starting point is 00:08:29 annoying because in job interviews, you're bad at doing things like implementing quick sort on a whiteboard. I can't do any of that. I am not a developer, right? See, I used to say the same thing. And then I realized that I was writing an awful lot of configuration as code and a lot of scripts that were getting fairly up there. And Python just started to intrinsically make sense. And now, of course, I write the most common programming language, which is YAML. And now I have this amazing alchemical app. I have this amazing alchemical ability to turn YAML files into AWS bills. It's kind of amazing. It's a horrible party trick and you're never invited back to that party. No. Well, I mean, yeah, so we actually use a hybrid or I would say a Frankenstein's monster version of a cloud formation called org formation. Let's say it's organizationally aware cloud
Starting point is 00:09:14 formation. So you can very easily deploy a stack. So all this YAML across 20 accounts, 30 accounts. Yeah. If you make one little boo-boo and all of a sudden you've deployed cloud network edges in 20 regions, then your bill goes. So yes, we can very much scale our costs much to the delight of our TAMs. What's fun about a lot of that too is that it's this whole world of,
Starting point is 00:09:39 okay, what am I going to do? I'm going to teach an AWS service about multiple accounts and or multiple regions on fun days, possibly both. And you're sparkling over things that AWS really should be doing for us as customers, but haven't gotten around to yet. And let's be clear, I know that these are hard problems to solve. They're also a division of a $1.5 trillion company. And I don't think that asking the rest of us to do volunteer work to spackle over their faults is necessarily fair either. Like at some point of scale, the burden shifts. Yeah. I mean, we're at the point now where, and this will sound actually crazy, but we're building
Starting point is 00:10:15 a development environment specifically around CloudWan because now it's getting to the point where we can't reason changes anymore. They're becoming too, well, too impactful. So we need to, we need an environment where we can actually test theories and, you know, create test cases and make sure that what we do is correct. But I'm at the point where I'm like, I also want to ask our TAMs to, let's say, contribute to this environment because it's really expensive for us just to test things that we don't have any other way of testing. It's the networking blast radius is always somewhat terrifying to me. I don't know about you, but when I was working on networks once upon a time, one of the first things you learn to do is you set up a cron job or an at job on whatever it is you're working on, usually a firewall.
Starting point is 00:11:00 And after some elapsed period of time, it automatically reverts to the previous config or you can run the command and put a sleep in there. Because if the change doesn't work, suddenly you've locked yourself out and you now get to either drive across town to the data center or open a remote hands ticket or have all kinds of fun things that happen as a result. And it's it's kind of scary. I've been doing a lot of home networking lately. And even now it's like, crap, I have to go downstairs again. My first job in the Netherlands, that was almost the first thing that I did was I spaced out in that second. This is like 2001. So I'm working on a Cisco router, get the config. And of course, it didn't go into my mind that as soon as you add the first firewall rule, it adds an implicit deny after it.
Starting point is 00:11:47 So I didn't put the office in there. So I'm running out the door to the car as everyone goes, hey, Chris. And I'm already gone. I'm like going to the data center to go and fix that. It's like I am 99% sure I know exactly what's about to come out of your mouth. Yeah. And I also think that's also why people are shy of it as well, because once you really start to get your hands dirty in this domain, the impact of any, you know,
Starting point is 00:12:12 boo-boo can be immense. It also seems to me that networking has been very slow to embrace change. And again, I can say this as a systems person, that's not necessarily intended to be pejorative. Again, the scale is massive, mistakes matter, and it can be very convoluted. But it seems that it's still in its infancy when it comes to the idea of programmatic control. Everything seems to be done by hand and then applied as weird one-offs. There's no testing
Starting point is 00:12:43 facility to speak of. I still remember in my very early days having to patch a monstrous Perl script called Rancid so that it could speak to Radware load balancers. And all this monstrosity did was it logged into a variety of network equipment that you told it to, grabbed the copy of the config on the thing, and then committed it to Subversion,
Starting point is 00:13:03 which was a Git precursor. And then it would either email diffs out, or then you had an entire history of the change on these things. But the fact that that was an industry standard for as long as it was, and I really hope it's not now, but it probably still is, is wild. Oh, I'm sure it's sitting somewhere in a dark corner of a data center still humming along. I would not be surprised. But even the tech, right?
Starting point is 00:13:23 I mean, BGP as a protocol still runs the internet. It's, what, 30 plus years old now. And even in AWS, under the hood, the transit gateways and CloudWan itself, so the core network edges and everything is just talking BGP. In fact, one of the things that I find most funny is the connectinect attachments. So on Transit Gateways and on the CloudWan, you have what's called a Kinect attachment,
Starting point is 00:13:51 which is a way how you can connect a third-party device. So like maybe a software-defined networking device from someone like, I don't know, Fortinet or Aviatrix or whoever, even Cisco. But it's actually over a GRE tunnel. And I don't know how old you are, but a GRE tunnel is like ancient tech. Eucalyptus, as I recall, or if not OpenStack, used to do their entire fake presented network layer
Starting point is 00:14:13 by having everything running through GRE tunnels between the physical hosts and it would just build abstraction layers within them. So yeah, I'm old is the short answer to that. Been there, done that, have the battle scars from the rack nuts that I was working on that week. I've gone gray from all this networking in the last few years.
Starting point is 00:14:28 Oh, everyone working in networking is old. It was great. It's like, oh, you have gray hair. It's like, wow, what was it like back in your day, Grandpa? It's like, I'm 24 years old. What are you asking here? Yeah, it ages you. It really does.
Starting point is 00:14:39 Yeah, that's what we're knee deep in at the moment. So we're busy. We had to reprovision most of what we already had because we had IP conflicts everywhere. We actually used another service, which the pricing for it drives me crazy. It's a IPAM. That's amazing.
Starting point is 00:14:55 You pay for the IPs that are under management. Think of that. I love that. It's basically the world's most expensive version of Microsoft Excel. I understand the value of it because I look at this, it's like, okay, here's a list of IP addresses that have understand the value of it because I look at this, it's like, okay, here's a list of IP addresses that have been allocated across your entire AWS estate.
Starting point is 00:15:14 Great. And it charges you per IP address per month in that thing. And it sounds like this is a job for a spreadsheet where it starts to add value is, okay, pretend it's not just you or a team of three people. Imagine now that you're a giant multinational and you have a entire number of divisions that are all contributing to this IP scheme and the rest. How do you wind up tracking all of that in one central place? It quickly becomes worth its weight in gold. Before that, when I was in my on-prem days, the gold standard for this, because it doesn't surprise surprise, turns out that spreadsheets don't scale super well. It was a company called Device 42, which was a great way of having even rack level inventory. And of course, my Route 53 as a database joke
Starting point is 00:15:51 started by annotating VMs with text records saying what physical hosts they were on. Yeah, that'll work. Tracking this stuff's hard. Yeah, absolutely. But yeah, I mean, for us at IPAM, we kind of had to wrap our head around the cost and how it would justify it.
Starting point is 00:16:06 Now we're at the point where any feature team can just decide to roll out a new VPC and it automatically goes, grabs the allocation and it's done. And there's no conflict. Everything works. It attaches to Cloud when routes are there. And then from our perspective now, the management is like next to zero. It just, everything is automated. So from that perspective, we're very, very happy. Yeah. When I see an AWS service and the pricing just strikes me as Looney Tunes, my default assumption, especially these days is, okay, that probably means that I'm not thinking about it in the right way and, or I'm not the target market. Even at small scale, things like the managed NAT gateway at, you know, 30 bucks a month for the instance hours. So it offends independent learners, but at small scale, okay, it starts to make sense.
Starting point is 00:16:48 But it really gets problematic and you're costing $30,000 a day on just the data you're shoving through the things. What on earth is going on over there? It's a, the pricing becomes architecture. And I think that that is a big problem right now is that networking in a cloud context is so radically different economically from anything you wind up doing on-prem. Where ports of bandwidth to the Internet, for example, is generally charged at 95th percentile.
Starting point is 00:17:17 So you basically wind up having every five-minute sample the course of the month, sorting them from largest to smallest, chop off the top 5% and whatever the next one is, that's how much it costs you for the month. So yeah, get that wrong and you'll wind up having significant overage charges. But once you pay for the size of the pipe, it can be bored or it can be saturated and it doesn't matter at all. Now, suddenly every bite passing through is metered and charged for combined with instance hours, which in a home lab has never really been a thing. Increasingly, it means there is no home lab story for an awful lot of AWS's increasingly impressive networking options. You have to almost find a company that's using these things, and then, because they can justify the developer environment, but as independent learners, we just can't. No, I mean, so even for myself, you know,
Starting point is 00:18:04 personally, I kind of have the ambition to write about this as well. So I'm halfway through it, but I sometimes toy with the idea of, you know, spinning some of this stuff up to, you know, create some nice, you know, pictures or whatever. But I'm just like, no, I am not going to pay for this. Not even close to it. It's way too expensive. And then, yeah, I don't even know how even smaller companies could
Starting point is 00:18:25 even experiment with it because, you know, if you forget to turn it off for a month, you've all of a sudden got a two, three K bill that you never expected, which yeah, can hurt quite a lot. It's one of those big challenges. Back when I was learning this stuff myself, Cisco had a reasonably decent program that I'm sure still exists because it's written in Java and that stuff lives forever called Packet Tracer. And you could wind up building fake networks because you didn't have a spare quarter million dollars to buy one of their catalyst switches that did this stuff at scale. So you could set these things up and learn how to do the configuration and the rest that it mostly worked. Where's that equivalent in AWS land? It's a problem from a home lab
Starting point is 00:19:03 perspective. I remember getting old Cisco gear off of eBay or from employers that were decommissioning stuff just to build out a somewhat reasonable Homelab. But that is such a different scale even now compared to what the actual networking concerns of big companies tend to be. Just because small networks are mostly, and I'm going to get yelled at for this, but mostly a solved problem. Oh yeah, absolutely. I mean, I consider myself a bit of a networking guy, but at home I just use, for example,
Starting point is 00:19:31 and I'm not pimping any or I'm not pushing anything at all, but I love the Unify gear because 99% of the work is done for me. I just put it in, set up some networks and it just works. So I don't want to think about it. I'm still running that for Wi-Fi here. I didn't love their AWS security breach that they didn't fully disclose when the indictment came out.
Starting point is 00:19:50 They had been hiding it. They sued Krebs for reporting on it in ways they didn't like. But it's still, it is, everything else is either way more expensive or back in the days of flashing Linksys all-in-one Wi-Fi nonsense thing with OpenWard or GDWard or something,
Starting point is 00:20:05 just you could actually get more capabilities. Exactly that. So home networking is solved. And I mean, even, you know, in my experience with most engineers, and I'm not talking down to anybody at all, because I know that, you know, there's a thousand topics out there, but I would say as a good systems engineer, which is what I really think I am, well, good is debatable,
Starting point is 00:20:24 but I think the more exposure you have to all the aspects, right? Because a system doesn't sit there in isolation, right? It's connected to a network. It has storage. It has all these things. The more you can dive into any of these topics, the better you will be at your job, the better you'll be able to help the developers or the feature teams or, you know, explain things to management so that they can make better decisions with the budget or whatever, right? So I don't understand why people avoid these topics. And so much of it comes from getting it wrong
Starting point is 00:20:53 the first time. I got yelled at this years ago by a boss who didn't understand this concept. And even now at home, my network here is 192.168.1.0.24. In other words, there are 254 usable IP addresses that I can have on the network. When it came time to build a separate IoT network, the common thing, oh, okay, so put the next block up. So 192.168.2. And down that path lies madness.
Starting point is 00:21:21 I picked the 192.168.128. And because if I need to expand either network, there's massive amounts of headroom. Like, do you really see a scenario when you're ever going to have more than 254 IP addresses on a home network? Have you met Kubernetes? It eats IP addresses like it's nobody's freaking business.
Starting point is 00:21:42 My oven has an IP address. I mean, everything now is connected, right? So it will only get worse. And you saw the AWS change. As we record this on January 31st, it takes effect tomorrow, which is specifically that every IPv4 address will cost roughly $3 to $4 per month,
Starting point is 00:22:02 regardless of whether it's attached or not. That means that that's going to cost about $43 every year for every IPv4 address. And when this was first launched, I was pretty enthusiastic about it because that's great. It's driving IPv6 adoption. The problem is, is that people are slow to adopt IPv6 when they're at AWS working on service teams. There are a laundry list of AWS services that flat out don't work, that require these things. So the price of a whole bunch of things has just gone up and people are about to be surprised and then some when they see what happens to their bill
Starting point is 00:22:36 at the end of February. I'm expecting my phone to basically explode. Yes, I mean, I guess they need to raise their invoices a little bit. I have a customer who will be charged many millions of dollars a year for this change. One, I say like there's only one of them. And the official response is great. Well, what about bring your own IP? We don't charge for that. We can bring your own IP allocation and use that. Oh, great. So I'm just going to re-IP every device I have that's public facing and talks to customers. Who's going to do all that work for communication
Starting point is 00:23:07 and networking and avoiding outages? Jack, hold you. I didn't think so. So I guess I'm going to take it on the chin and pay the millions of dollars. And it's just a disaster. I mean, even if you want to bring your own IPs, then you need to go to, well, Aaron or APNIC or RIPE
Starting point is 00:23:23 or one of those organizations to even beg. Oh, these days you're going to the secondary markets. They're not passing out the full stuff anymore. It's gone now. And the stuff that they occasionally do is coming out of Bogon space and whatnot. It's like, hey, do you want some IP addresses that a good third of the internet's rarest devices refuse to acknowledge is valid and will just drop your packets on the floor? Wow. The Bogon network. It's like there are actually some people whose sites I think should live in that IP space, but that's just me being small and petty. It's fine.
Starting point is 00:23:49 I mean, everybody has their way. Exactly. No one is excited by the prospect of building permissions except for the people at Oso. With Oso's authorization as a service, you have building blocks for basic permissions patterns like RBAC, REBAC, ABAC, and the ability to extend
Starting point is 00:24:08 to more fine-grained authorization as your applications evolve. Build a centralized authorization service that helps your developers build and deploy new features quickly. Check out Oso today at osohq.com. Again, that's O-S-O-H-Q
Starting point is 00:24:24 dot com. It's been a really interesting ride just watching the evolution OSOHQ.com. Again, that's OSOHQ.com. It's been a really interesting ride just watching the evolution of AWS networking as it's gone from, like originally with EC2 Classic, which was just called EC2 back in those days, everything is a big flat network and you better be good with security policies.
Starting point is 00:24:40 And then they build abstractions on top of it and abstractions on top of that. Easy example, a public versus private subnet is simply a human convenience. There is no declaration, public or private, in their APIs. It's simply a question of does this get IP addresses assigned to it, yay or nay? And, oh, is there a NAT gateway to let it speak to other things? Yeah, I mean, these are, as you say, just for human convenience, but they don't actually do anything.
Starting point is 00:25:07 And you could argue that IPv6, when it actually eventually comes through, will be kind of an interesting thing as well, right? Because there's no concept of NATing with IPv6. So a lot of ways how people consider security will be, well, changed because you can't hide behind a NAT gateway. So now every device that you have, theoretically, if you don't do your policies
Starting point is 00:25:33 correctly, so your security groups or whatever, will be accessible, including in your home, right? So that's a very interesting thought that probably also people haven't really considered. My ISP doesn't support IPv6 natively, unfortunately. So I set up a tunnel for a bit and that was great. And the firewall rules are super important. But what I found that was disheartening was how many things still broke. Logging into some services would have some sort of application firewall that just hung my connection. It would never load.
Starting point is 00:26:04 And it took me a bit to figure out what it was until I forcibly disabled IPv6 on that node. And suddenly I could log into web pages. I found that I have a few IoT stuff that are now leaking Matter IPv6 addresses onto my actual home network. And it's, okay, so why does my computer have seven different IPv6 addresses here with one interface? That seems a little off. And it's, oh, dear Lord, we're in no way ready for this. So, and it's funny because like, for as long as I've even been in the Netherlands, you know, going to my first RIPE meeting, they were talking about, you know, the IPv6, you know, uptake. And it's like, there was this sad little graph with like barely moved up. Right.
Starting point is 00:26:45 And I think even now it's probably slightly up. But I am curious if before I retire, if IPv6 is still anywhere further. I think that we're going to see a lot more interest in it just as soon as people start realizing just how much this is costing them. Yeah. I mean, cost is usually a driver to almost any kind of change like this. Other cloud providers have been charging the same effective fee for many years. The difference is, is that they adopted this either from the outset or when they were a lot smaller. They didn't wait until 2024 when a decent percentage of the internet was going through them as the world's largest cloud provider. I think that it is going to be wild. It's going to make AWS billions of dollars a year,
Starting point is 00:27:26 which, okay, good for them. Watch them all attribute it to how good they are at generative AI. But okay, it just feels like on the one hand, it's rent seeking. But on the other, I do understand it. These are a scarce and diminishing resource. You need to manage it well. You're not allowed to get any more of them. And it costs them a giant pile of money to acquire these things. How do the economics balance? From that perspective, I do understand as well. And at least from my organization, we are extremely lean on the, let's say, externally facing IPs that we even have. I think maybe we have a couple of hands worth of external IPs. So, well, at least for us, we're not particularly worried. Yeah, we're in the same boat, but our
Starting point is 00:28:10 company account is relatively small, like 500 bucks a month, and it's going to go up by about 10% based on this charge. And I'm not going to sit there and try and hunt down the, what is that, four or what is that, 10 or 11 IP addresses across the entire estate. I'm not going to hunt that down and with prejudice, it's not worth my time, but it is a noticeable bump on what I'm paying AWS. And that's not because I'm being irresponsible. Well, of course, it's what others have been doing for the longest period and it is a form of rent seeking as well. So yeah, maybe to shift the topic a bit back to CloudWare, if you don't mind. By all means, please. One other little thing, which is also kind of interesting, is that the industry that we're in, or at least my company is in, payments processing, you can imagine that we're going to work with, hopefully if we are
Starting point is 00:28:57 successful, which of course, my dear overlords will ensure, we'll work with a lot of larger companies that probably don't like Amazon, right? They don't want their things, their credit card transactions processed on Amazon. I have worked with a number of those companies myself. So you have data in transit and you have data at rest. Almost everybody cares about data at rest. Data in transit, people care, but I mean, there's more of a gray area there, I would say. You can play with it. One of the aspects that we're also looking into, and the rest of my engineering team will beat me up when they hear this, but I mean,
Starting point is 00:29:34 everybody knows that this is going to happen eventually. But how we built our platform is to separate, let's say, the backend connectivity. So how we connect all the different card schemes, so all the credit card schemes. We separated that. Actually, you can imagine it like a kind of, we've stacked where we do the actual workload. So all the workload processing, all of that is up above. We have CloudWan in the middle. It's like this nice glue to kind of connect everything together and to do the separation. And at the bottom side, we do all the connectivity. So the really expensive stuff at the bottom and all the stuff that we can push out everywhere at the top. Sorry, I don't want to do too much of a
Starting point is 00:30:14 big monologue here. No, no, please. This is fascinating. Tell me more. The reason why we did it this way is because the bottom part is very expensive, right? So we're talking data centers, we're talking physical connectivity, we're talking all these kinds of things in Europe, North America, Asia, everywhere. These are expensive. But the ones at the top are purely AWS, right? So we do everything as much as possible in AWS leading on their services as well. This means that we can also deploy in any region very quickly and hook it up via CloudWare down into the various card schemes very quickly. So if a customer calls up and says, hey, I'm in Tokyo, but we don't have anything there, but we can hook it up via, well, even a local zone, which I learned a lot about at
Starting point is 00:30:57 reInvent, back through CloudWare and then back to the processing in Europe or North America, depends which one makes sense, then we can be live within days, weeks. Of course, customer integration time takes a long time, but we can be ready for them to start integrating and testing within a day in logical terms. It's extremely quick. But what if we need to go to other cloud vendors, right? So say someone says,
Starting point is 00:31:25 I'm not going to touch Amazon at all. No, I don't want that. So this is where those connect attachments come in because then we can do an SDN device or we could even be in a different cloud, right? Because it's just a GRE tunnel. So then we're talking like, okay, we have a connect attachment to a SDN device and we connect that across to Azure.
Starting point is 00:31:48 Now we've bridged that gap. So all of our expensive stuff can stay in one place, but now we can expand very easily into other cloud vendors. Without all the hot trouble of trying to get incompatible interpretations of IPSec working between providers and getting the security groups working, the routing and the rest. I talked to a company that spent four months on that before giving up completely and deciding to take a different approach. Yeah, I'm looking forward to seeing
Starting point is 00:32:12 how that winds up branching out. I think we need to see more customers using it that way and building tooling around it. I mean, historically, things like Terraform arose because everyone is trying to solve the exact same problems. This feels like it's a lot more rarefied as far as who is going to experience these particular requirements. That number only grows with time, but I think it's just going to take a while for us to start seeing that awareness trickling into the mainstream. For us, the real need comes from
Starting point is 00:32:38 low latency, right? So if you're at a restaurant and you have your card on your phone, you want to tap it down, you don't want to sit there for 30 seconds to a minute waiting for it to come on, come on, come on, come on. You want to tap it and go, right? You just want it to work now. So the number one thing that we, oh, actually we have two things, latency and never drop an authorization, right? You don't want to be double charged. You don't want to have it go beep and then nothing, right? That's impossible,
Starting point is 00:33:11 right? So can't lose anything and it must be fast. So those two requirements give us quite some budget in the networking space, right? So I can understand that it's also not something that a lot of companies would use either, right? Because it is quite a niche problem to have. But I mean, even if you have, you know, I don't know, if you're in multi-regional setup and you have any kind of external connectivity, then this is where it starts to really make sense. Yeah, it's definitely something that is clearly solving problems for folks. I have to confess, when they first told me about CloudWan, I was skeptical because I was trying to map it to the problem of the week that I was tackling at the time. And like, this is useless. I could barely use this as a database. What's going on? Not for lack of trying, but it was a, okay, all I have to do whenever I think
Starting point is 00:34:00 I've gotten a lock on something or written something off is talk to a customer who's using it. And I learn an awful lot, not always for the better in some cases. And there's occasionally times where I cannot find a single customer for the life of me. And that does inform some educated guesses as far as just how many people are using this thing. But I'm glad to see that you folks are out there. Did you get to catch up with any other Cloudland customers or is it possible you're the only one these days? So on that, more customers are onboarding.
Starting point is 00:34:29 They don't like to call us the biggest user anymore. Maybe it feels wrong for them. Oh, they love doing that as part of a sales process too. Like, do you have any idea how many biggest S3 customers I've encountered just this past year alone? Exactly. We're the biggest S3. Sure you are.
Starting point is 00:34:43 Well, the only difference, at least how I feel, is that at least we are talking directly to the service team to actually also give them ideas and feature requests. Because one of the biggest problems that we have, and I can mention this,
Starting point is 00:34:57 is the fact that when you build out your network, you have these core network edges. These are those 500 euro a month devices. They're basically transit gateways, but they're called CNEs. These things cost you 500 a month. When you add more than one, so you have two,
Starting point is 00:35:13 they connect to each other. If you have four, then each one connects to each other. So you have a full mesh. The problem then starts to go with routing because what we try to do, we try to be smart. We're like, okay, let's do a centralized egress, right?
Starting point is 00:35:28 Because now we have all these accounts and normally every account had NAT gateways, internet gateways, and that was costing us money, right? So we're like, okay, now we have CloudWare. We can centralize this. So we have like 50 accounts, we have a central egress, and you're going to go through that.
Starting point is 00:35:41 Perfect. So now we only have one NAT gateway, one internet gateway. We put a network firewall in there. All that nice stuff. Perfect. Better centralization, better story around it, better cost economics, better cost efficiency. All of that is true. It's great. Except, so say, for example, we have an egress in EU-West 1, but we have a workload in EU-Central 1. We also have one in EU- one and EU East one. Sorry, US East one, US West one. And we have an egress in US East one. The problem is that in the other regions that do not have an egress,
Starting point is 00:36:14 you have no way to direct the traffic to a preferred egress, right? So we had the funny situation that in EU Central, it was going out the US egress. And US might go to EU, or it might go to the US. And it differed segment to segment. So we had like devs going this way in this segment, production going that way. And we're like, what the hell? And that can have consequences for a number of things. Sure. I mean, we have customers that like to whitelist IP addresses. We tell them not to, but they'll do it anyway. And then suddenly things break. If you can get big companies to stop doing that game,
Starting point is 00:36:49 oh my God, that's the biggest thing that's going to make it impossible for some companies to get off of their allocated IPv4 stuff because it takes an act of God to get companies to update firewalls. We now have a slash 64 of IPv6. Please enter that into your firewall. IP by IP. Thank you. Off you go. Yeah, by hand. Please enter that into your firewall. IP by IP. Thank you. Off you go. Yeah, by hand. It's the worst internship ever.
Starting point is 00:37:08 I swear we have customers that I believe truly do that. But OK, I didn't say that. Of course not. I really want to thank you for taking the time to talk to me about this. If people want to learn more
Starting point is 00:37:18 about how you see these things, where's the best place for them to find you? At the moment, find me on LinkedIn. I believe I sent it across to you and I am working on a blog. So I will send that out. Well, I will put it on my LinkedIn profile very soon. I have four posts so far and I'm going to keep working on it because yeah, I think that this is an interesting story and yeah, that's the easiest way to find me. And otherwise, I'll be at the AWS
Starting point is 00:37:45 Summit in Amsterdam in a couple of months and probably reinvent again this year. So I'll be around. I look forward to seeing you at least one of those things. Thanks so much for taking the time to speak with me. I appreciate it. Thank you very much for having me. I really enjoyed it. Chris Gillespie, Principal Platform Engineer at Silverflow. I'm cloud economist Corey Quinn, and this is Screaming in the Cloud. If you enjoyed this podcast, please leave a five-star review on your podcast platform of choice.
Starting point is 00:38:11 Whereas if you hated this podcast, please leave a five-star review on your podcast platform of choice, along with an angry comment that will fail to save properly because that podcast platform just implemented IPv6 this morning badly.
