Screaming in the Cloud - The Magic of Tailscale with Avery Pennarun

Starting point is 00:00:00 Hello and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at the Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud. This episode is sponsored in part by LaunchDarkly. Take a look at what it takes to get your code into production.

Starting point is 00:00:35 I'm going to just guess that it's awful because it's always awful. No one loves their deployment process. What if launching new features didn't require you to do a full-on code and possibly infrastructure deploy? What if you could test on a small subset of users and then roll it back immediately if results aren't what you expect? LaunchDarkly does exactly this. To learn more, visit launchdarkly.com and tell them Corey sent you and watch for the wince. This episode is sponsored by our friends at Revelo. Revelo is the Spanish word of the day, and it's spelled R-E-V-E-L-O. It means I reveal. Now, have you tried to hire an engineer lately? I assure you it is significantly harder than it sounds. One of the

Starting point is 00:01:21 things that Revelo has recognized is something I've been talking about for a while, specifically that while talent is evenly distributed, opportunity is absolutely not. They're exposing a new talent pool to basically those of us without a presence in Latin America via their platform. It's the largest tech talent marketplace in Latin America with over a million engineers in their network, which includes but isn't limited to talent in Mexico, Costa Rica, Brazil, and Argentina. Now, not only do they wind up vetting all of their talent on English ability as well as, you know, their engineering skills, but they go significantly beyond that. Some of the folks on their platform are hands down the most talented engineers that I've ever spoken to. Let's also not forget that Latin America has high time zone overlap with what we have here in the United States. So you can hire full-time remote engineers

Starting point is 00:02:16 who share most of the workday as your team. It's an end-to-end talent service. So you can find and hire engineers in Central and South America without having to worry about, frankly, the colossal pain of cross-border payroll and benefits and compliance, because Revelo handles all of it. If you're hiring engineers, check out revelo.io slash screaming to get 20% off your first three months. That's R-E-V-E-L-O dot I-O slash screaming. Welcome to Screaming in the Cloud. I'm Corey Quinn. Generally, at the start of these shows, I mention something about money. When I have a promoted guest, which means that they are sponsoring this episode, I talk about that. This is not that moment. There's no money changing

Starting point is 00:03:02 hands here. And in fact, I'm about to talk about a product that I am a huge fan of, but I'm also, as of this recording, not paying for. So one might think I'm the product, but no. Let's actually start by talking about money. My guest today is Avery Pennarun, the CEO of Tailscale. And as of today, being the day that this goes out, you folks have just raised $100 million in a Series B. First, thank you for joining me, followed immediately by congratulations. It's great to be here, and thank you. It's an exciting announcement that I hope we don't end up spending too much time talking about because money is a lot more boring than technology.

Starting point is 00:03:43 But yeah, we are very happy both to be here and to be making the announcement. Yeah, CRV and Insight Partners are the lead investors on the round. And it's great to see because I've been using Tailscale for a while now. And it is a transformative experience for the way that I think about these things. A while back, I wrote a Lambda layer that lets Lambda functions take advantage of it. But in fairness, I did write it. So anyone looking at that should, that's why you're not a developer full time. You're bad at it. Yes, I am. But I just, I can't stop raving about how useful Tailscale is with the counterpoint that it's also very difficult to explain to people who are not, at least in my experience, broken in a very

Starting point is 00:04:25 particular way, as I am. What is Tailscale? And what does it do? Right? Well, I mean, first of all, it's one of the things I really like about Tailscale and what we built is that, you know, even if you're not a super great developer, like you just described yourself, you can get excited about it, you can use it for things, you can build on top of it and you contribute back without having to understand every single little detail of what it does, right? Tailscale is something that a lot of people get excited about without having to know how it works. They just know what it gives them, right? The answer to what tailscale is, is sort of, it can be hard to explain to people who don't know about the kinds of problems that it solves. But the super short answer is it connects all of your devices and virtual machines and containers

Starting point is 00:05:11 to each other wherever they are without going through an intermediary, right? So it minimizes latency and it maximizes throughput and it minimizes pain. And it sounds like that should be hard, but you can get it all done in like five minutes. I have been using it for a while now. Originally, I was using it and federating through it, I believe, via Google. I rebuilt and tore down the entire network in about five minutes, instead started federating through GitHub. Nowadays, you have apparently changed your position on that identity and you use third

Starting point is 00:05:41 party SSO sources as well as retaining user information and login stuff yourselves, which is just, it's almost spoiled for choice on some level. But I am such a fan of the product that I hope you'll forgive me if I talk for about a minute or so on how I use it and my experience of it. Go for it. So I wind up firing up Tailscale, and I have a network that from any of my devices, I can talk to any other. I have a couple of EC2 machines hanging out in AWS. I have a Raspberry Pi that I use as a DNS server sitting in the other room. I have my iPad. I have my iPhone. I have my laptop. I have my desktop. I have a VM sitting over in Google Cloud. I have a different VM sitting over in Oracle Cloud. And all of these things can talk to each other directly over a secured network. I can override DNS and talk to these things just by the machine name. I can talk to them via the address that winds up being passed out to them through this. It is transformative.
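To put a number on "any of my devices can talk to any other", here is a small sketch that counts the direct links in a full mesh of the devices just listed. It is arithmetic only and does not talk to Tailscale. The device labels are placeholders invented for illustration (in particular, "a couple of EC2 machines" is assumed to mean two), and the function names are hypothetical.

```python
from itertools import combinations

# Hypothetical labels for the devices described above; the names are
# illustrative only, and "a couple of EC2 machines" is assumed to mean two.
DEVICES = [
    "ec2-a", "ec2-b", "raspberry-pi", "ipad", "iphone",
    "laptop", "desktop", "gcp-vm", "oracle-vm",
]


def mesh_pairs(devices):
    """Return every unordered pair of devices, i.e. each direct link in a full mesh."""
    return list(combinations(devices, 2))


def mesh_link_count(n):
    """Number of direct links in a full mesh of n devices: n * (n - 1) / 2."""
    return n * (n - 1) // 2


if __name__ == "__main__":
    pairs = mesh_pairs(DEVICES)
    print(f"{len(DEVICES)} devices, {len(pairs)} direct device-to-device links")
```

With nine devices that comes to 36 device-to-device paths, none of which needs to be set up by hand in the workflow described here.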

Starting point is 00:06:42 It works on IPv4, IPv6. If I'm on a network without IPv6 access using Tailscale, suddenly I can. I can egress from almost any other node on this network. And adding a new device to this is effectively opening a link in a browser on either that device or a different one, clicking approve once I log in, and it's done. That is my experience of it so far. Is that directionally correct as far as how you think about the product? Because again, I use DNS TXT records as a database, for God's sake. I am probably not the world's foremost technical authority on the proper use of things. Right. Yeah. I mean, that's a good description

Starting point is 00:07:21 of what it does. I think it actually, it's weird, right? It's hard to get across in words just how simple it is, right? That one minute description used a bunch of technical sounding terminology that probably the listeners to your podcast will understand. But like the average tech person doesn't need to know any of those things in order to use Tailscale, right? You download it from the app store on your phone and your laptop, and you install Tailscale on both from the app store. You log into your Google account or your GitHub account, and that's it. Those two devices are tied together in time and space. They can see each other. You can access a web server that you're running on your laptop from your phone

Starting point is 00:07:57 without doing anything else, right? And then you can start a VM in AWS, and you load Tailscale in there, and now that's part of your network. And so you don't need to know what IPv4 and IPv6 even are. You don't need to know what DNS even is. It just, you know, the magic sort of comes together. We do a ton of stuff behind the scenes to make that magic work. But it's this one thing

Starting point is 00:08:19 that one customer said to us one time is like, it makes the internet work the way you thought the internet worked until you learned how the internet worked, if that makes sense. Right. It's basically, it works on duct tape and toothpicks all stuck together. And it's amazing that it works at all. I mean, this is going to sound relatively banal, but the way that I've used Tailscale the most is on my phone or on my iPad or on my Mac, I will connect to the Tailscale network by default. And when that is done, it passes out my Pi-hole's IP address as the custom DNS server for the entire network.

Starting point is 00:08:52 So I don't see a whole bunch of ads, not just in browser, but in apps and the rest. And every once in a while when something is broken because an ad server is apparently critical to something, great. I turn off the VPN on that device, use the natural stuff. My experience of the internet gets worse as a result, and the thing starts working again. Then I turn it back on. It is more or less the thing that I use as a very strange looking ad blocker in some respects that I can toggle on and off with the click of a button. But it's magic. It is effectively magic. From the

Starting point is 00:09:25 device side, it's open up an app and toggle a switch, or it is grab from the menu bar on a Mac. There's an application that runs and just click the connect button or the disconnect button. There is no MFA every time you connect. There is no type in a username and password. There is no lengthy handshake. I hit connect and it is connected by the time I have moved the mouse back from the menu bar to the application I was working in. Whenever I show this to someone who uses a corporate VPN, they don't believe me. Right. Yeah, exactly. It's hard to believe. It's like, hey, did anything actually happen here? Because we removed, you know, for example, it doesn't by default catch all your traffic. It

Starting point is 00:10:02 only catches the traffic to your private network. So it's safe to leave it on all the time because it's not interfering with what you're doing. And you're describing using Pi-hole, which is a Raspberry Pi-based DNS server that is an ad blocker. Most people using Pi-hole have one at home. So when they're at home, they get ads blocked. But when they leave home, they don't get their ads blocked. If you add Tailscale to that, you can use your Pi-hole even when you're not at home, and it sort of makes it that much more useful. But I think an important difference from, say, other services that you can use as an ad blocker or a privacy VPN is that we never see your traffic, right? Tailscale creates a private network between

Starting point is 00:10:39 you and all your personal devices, and that private network is private even from us, right? We help you connect the devices to each other. But when your traffic goes to Pi-hole, it's your Pi-hole. It's not our ad blocker, it's your ad blocker, right? So we never see what traffic you're going to. We never see what DNS names you're looking up because that was just never made available to us, right? Right. The level of visibility you have into my network is fascinating in a variety of different ways, and one of those ways is how limited it is. You know what devices I have, the last time they've connected, the version of Tailscale they're running, an IP address on it. And you also wind up seeing what services are

Starting point is 00:11:23 advertised and available on those networks if I decide to enable that, which is great for things like development. I'm going to be doing development in a local dev sense on an EC2 instance somewhere. And I don't want to set up a tunnel with SSH to wind up having to proxy traffic over there just so I can wind up hitting some high port that I bound to. And I certainly don't want to expose that to the general internet. That is a worst practice for all these things. And Tailscale magically makes this go away. I haven't done this in much depth yet with a variety of my team members. But when you start working on this with

Starting point is 00:12:00 teams who are doing development work, someone can have something running on their laptop and just seamlessly share it with their colleagues. It's transformative, especially in an area where very often that colleague is not sitting in the same room, getting the greasy fingerprints on your laptop screen. Yeah, exactly. So you mentioned the services list, which you have to specifically opt into. And the reason we did that is that, you know, the list of devices and host names and IP addresses, we have to collect because that's how the service works, right? You send us the information about your devices, and then we send the public keys for those devices to the other devices. We can't get out of collecting that. Whereas the services list is purely an interesting

Starting point is 00:12:38 add-on feature. And we decided that we didn't want to collect that by default because it would make people nervous about their privacy. So if you want that feature, you click it on. If you don't want it, you don't turn it on. You can still share services with people inside your network. They just need to know that those services exist. You send them the URL or whatever, and it'll work. But it doesn't show up as a list of things that we can see in that case. But yeah, sharing stuff between your coworkers is definitely a major use case for Tailscale and dev and infrastructure teams in particular.

Starting point is 00:13:07 Like you can, designers, for example, run a test version of the website on their laptop. And then they say, hey, visit this URL on my laptop. And you don't have to be in the same office. You can both be sitting in different cafes in different cities. Tailscale will make it so that the connection between those two computers still works, even if they're both behind firewalls, even if they're both behind different NATs and so on. One of the things that astounded me the most,

Starting point is 00:13:30 because I am reluctant to completely trust things that are new, that touch the network. Early on in my career, I made network engineering mistake 101, which is making a change to the firewall in your data center without having another way in. And the drive across town or calling remote hands to get them to let you back in when you lock things out, because you folks are building these things on a pretty consistent clip. There are a lot of updates and releases across all of the platforms. And

Starting point is 00:14:04 invariably, I find myself on some devices a version behind or so just because of the pace of innovation. Oh, great. We're updating the VPN client. Cool. So I'm going to expect this thing to drop and I'm going to have to go in and jigger it to get it working again. That has never happened. I have finally given in to, I guess, the iron test of this. And I have closed SSH from the internet to most of these nodes. In fact, some of them sit like the Pi-hole sitting at home. If you're not on my home network, there is no outside way in without breaking in. It is absolutely one of those things that disappears into the background in a way that I was extraordinarily surprised to find.
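A quick way to check the "closed SSH from the internet" claim for a machine of your own is to attempt a plain TCP connection to port 22 from a network outside the private one. The sketch below uses only the Python standard library. The function name is hypothetical and the address is a placeholder from the documentation range, not one of the hosts discussed here. A failed connection only shows that this path is blocked from where you ran the check.

```python
import socket


def port_is_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable networks.
        return False


if __name__ == "__main__":
    # Placeholder address from the RFC 5737 documentation range; substitute your own.
    host = "203.0.113.10"
    state = "reachable" if port_is_reachable(host, 22) else "not reachable"
    print(f"SSH on {host} is {state} from this network")
```

Run it once from an outside network and once while connected to the private network (using the machine's private address) to compare the two views.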

Starting point is 00:14:42 Right. Well, that is something, I mean, I'm old and grumpy, I guess, is sort of the beginning part of all this, right? I've seen all this annoying stuff that happens with software. And many of us, in fact, at Tailscale are old and grumpy. And we just didn't want to repeat those same things. So first of all, network stuff, to an even stronger degree than virtually any other kind of product, if your network stops working, everything stops working, right? So it's number one priority that Tailscale has to not mess up your network,

Starting point is 00:15:12 because if it does, you instantly lose faith. There's kind of like, Tailscale gives you this magical feeling when you first install it, but that feeling of magic goes away very quickly the first time it screws something up, and you can't connect when you really need to. So we put a huge amount of work into making sure that you can connect when you really need to. We have a lot of automated tests. One of our policies that I think is almost unheard of is that we intend to never deprecate support for older versions of the Tailscale client. And to this day, we're about three years into Tailscale.

Starting point is 00:15:42 We've never deprecated an old client that anybody is using. So eventually, and it's in fact hard to believe, people do stop using some old versions. So those ones don't work anymore necessarily. But any version of Tailscale that is in use today is going to keep working as long as anybody is using it. We have a very, very, very strong backwards compatibility policy because the worst thing that I can imagine is having some Raspberry Pi sitting out in the void somewhere that I haven't looked at for two years, that, whoops, Tailscale broke it,

Starting point is 00:16:11 and now I can't connect to it, and now I have to go drive down there and fix it, right? It would be just insultingly terrible for that to happen. And we just make sure that doesn't happen. Another thing that people get excited about is like on a Debian system or whatever, if you've got the Debian package installed, you can do an apt-get upgrade, Tailscale upgrades, and even your SSH

Starting point is 00:16:29 session doesn't drop. Every now and then people comment. That was the weirdest part. I was expecting it to go away or hang for a long period of time. And sure, I guess it might drop a packet or so. I've never bothered to look because it is so seamless. Right. Yeah, exactly. It's just like, wait, did anything even happen? And it's like, yes. Right. My next thing is that's from underneath you. Yeah. I grep tailscale on the process table. Like, okay, is this just a stale thing that's existing? I'm going to bounce it. No, it has just been started. It was so seamless under the hood that it was amazing. A lot of things have been done very deeply right on this. Something else that I think is worth pointing out

Starting point is 00:17:05 is that if any company had the brainpower there to roll their own crypto, it would be you folks. But you don't. You're riding on top of WireGuard, an open source project that does full mesh VPNs with terrible user interfaces. Yep. So, you know, I guess disclosure back in 1997 when I started my first startup, I was not smart enough to not roll my own crypto. And therefore the VPN I wrote at the time definitely had giant security holes.

Starting point is 00:17:35 It was also not that popular, so nobody found them. But I, you know, eventually... Except the bank, which I really shouldn't disclose. Kidding, I'm kidding. No, no, no. The bank never used this software.

Starting point is 00:17:44 But yeah, nowadays, I've been through a lot. And I would not describe myself as a security expert, although people often describe me as a security expert. I don't know what that means. But I am enough of an expert to know that I should not be rolling my own crypto. And the people who invented WireGuard, it's one of the... I feel like I'm overstating things, but I'm not. It's one of the biggest leaps forward in cryptography in probably the history of computing.

Starting point is 00:18:12 Now, it builds on a series of things that are part of the same leap forward, right? It's built on the protocol that Signal uses called the Noise Protocol, right? Signal and Noise are built on the Ed25519 curve popularized by Dan Bernstein, who's a major cryptographer in this area. Sometimes popular, sometimes not popular. Yeah, exactly. He also, near and dear to my heart, wrote djbdns, which was a well-known, widely deployed DNS server, by which I, of

Starting point is 00:18:46 course, mean database. Please continue. Yep. I've been a huge fan of basically everything DJB has ever made in the history of... Oh, you're a qmail person. I am on the Postfix side of that divide. Well, my first startup back in 1997, we made Linux-based server appliances for small businesses. And we used qmail, we used djbdns. We used a couple of other DJB products.

Starting point is 00:19:06 And for the history of that product, leaving aside my VPN, that was a security hole, the DJB stuff never had a single problem. That company was eventually acquired by IBM. One of the first things IBM did is like, whoa, DJB has a super weird software license. We can't be doing this. Let's replace it with software that has a decent license. So they dropped djbdns and started using BIND. Within a week, there was a security hole in BIND that affected all of these appliances that they now controlled, right? So DJB is a very big brained, super genius in security, whatever you might think of his personality. And it sort of like was the basis for this revolution in cryptography that WireGuard has sort of brought to the networking world.

Starting point is 00:19:52 And it's hard to overstate just like the number of lines of code. There's something like a hundred times less code to implement WireGuard than to implement IPsec. Like that is very hard to believe, but it is actually the case. And that made it something really powerful to build on top of. Like it's super hard for somebody like me to screw up the security of a WireGuard deployment, where it's very easy to screw up the security of an IPsec deployment. This episode is sponsored by our friends at Oracle Cloud. Counting the pennies, but still dreaming of deploying apps instead of hello world demos? Allow me to introduce you to Oracle's Always Free tier.

Starting point is 00:20:29 It provides over 20 free services in infrastructure, networking, databases, observability, management, and security. And let me be clear here, it's actually free. There's no surprise billing until you intentionally and proactively upgrade your account. This means you can provision a virtual machine instance or spin up an autonomous database that manages itself, all while gaining the networking, load balancing, and storage resources that somehow never quite make it into most free tiers needed to support the application that you want to build. With Always Free, you can do things like run small scale applications or do proof of concept testing without spending a dime. You know that I always like to put asterisk next to the word free. This is actually free, no asterisk. Start now.

Starting point is 00:21:16 Visit snark.cloud slash oci-free. That's snark.cloud slash oci-free. I just want to call something out as well, that when I say that you folks definitely have the intellectual firepower to roll your own crypto, should you choose to do so, but you chose not to. If anything, I'm understating it. To be clear, one of the blog posts you had somewhat recently out was how you are maintaining what is effectively your own fork of the Go programming language, which is one of those things when someone hears that, it's like, I'm sorry, could you say that again? Because I am almost certain I misunderstood something. What is the high level version of that?

Starting point is 00:21:56 Well, I think two important points there. One of them is that, yes, we did fork the Go programming language. It's supposed to be a temporary fork because it allows us to do some experiments with the Go backend. And the primary reason we were able to do that is because we employ a couple of people who used to be on the core Go team. And that was not because we went out looking for people who used to be on the core Go team.

Starting point is 00:22:17 That's just how it worked out. But because we do, it's easier for them to fork Go than it would be for the average person. And in many ways, it's easier for them to get their job it would be for the average person. And in many ways, it's easier for them to get their job done by just continuing to work on the code base they've already worked on. But the second point is actually, as compilers go, the Go compiler is probably the very easiest one I've ever seen to be able to fork and edit. It's super clear code. You're just editing Go code, which is already pretty easy. But they really put a ton of work

Starting point is 00:22:44 into making it readable and understandable. So like average people actually can fork the Go compiler and not be completely bamboozled by how difficult everything is, right? It's compared to like GCC, where just building the thing is something that takes you weeks to learn how to do, right? Go is just like... Yeah, let me clear this quarter of my schedule so I can go ahead and do that. No, thank you. Yeah, I've built copies of GCC and it's absolutely nightmarish. Right. And then built people's forks of GCC for special embedded processors and stuff. And this is like, this is a career that you can specialize in building GCC. Right. There are people that do this. Right. And the Go compiler is really, yeah. But the Go compiler, it's like, it's,

Starting point is 00:23:22 it's very nice. It's just a program that's written in Go that compiles under Go. And then you end up with one binary, right? And as long as you have that binary, everything just works, right? And so it's actually surprisingly easy to fork Go. I don't want to, you know, I wouldn't put that on the same level of difficulty as like not screwing up cryptography if you're trying to do it yourself. No, it's Schneier's law. Anyone can roll their own crypto algorithm that they themselves can't defeat. Yeah, it turns out that basically

Starting point is 00:23:48 breaking crypto is a team sport. Who knew? Yeah, exactly. Generally with security, you have this problem a lot, right? It's a lot harder to build a system that nobody can break into than it is to break into a random system, right? Because the job of securing something against everybody is much harder than the job of finding something you can break into. So I did have a question about something you said earlier, where one of the use cases, one of the design goals is not to have a breaking change to a point where an old device cannot still connect to the private network.

Starting point is 00:24:19 But you do have a key expiry for devices where a device needs to re-log in. And it can be anywhere between 3 and 180 days as I look at it. I don't know if some of the more enterprise-y options have longer options that they can set. But how do you not have to drive out to the back of beyond to re-authenticate that Raspberry Pi every six months? So this is something, it's at the policy layer, and we have not finished refining this to perfection, I would say, right now. What we do have, though, if your key does expire, there's a button in the admin panel to say, like, boost this device for a little bit longer, sort of unexpire it for another

Starting point is 00:25:00 30 minutes. I don't remember how much time it is. Then you can SSH into the device and do a proper key refresh on it without actually having to drive out there. Now, we did, for one version, accidentally break the key reactivation feature so that if the client noticed its key has expired, it actually disconnected from the tailscale network altogether and then didn't receive the message to like, hey, could you please increase the length of your key? That was fixable by power cycling it, which you could often get somebody to do without driving all the way out there. But we fixed that. Yeah, you tried turning it off and back on again is still a surprisingly effective way of troubleshooting something. Yeah, exactly.

Starting point is 00:25:37 So that wasn't, I mean, it was kind of annoying for some people. But yeah, the reason we use, well, by default, every key always expires is because unlimited time credentials are one of the worst security holes that people don't really acknowledge. Because technically, it'll never be the like, you know, it'll never show up as a high severity security hole that you have an unlimited time credential sitting in your home directory. But it is something that, well, I can tell a story. There is a company that I heard about that had, you know, SSH keys are typically unlimited time credentials. The easiest way to do it is you run ssh-keygen,

Starting point is 00:26:13 it puts something in your home directory, you copy the public key to all the devices you want to be able to log into. And then you never think about it again. So this is a company that of course, every developer in their company had done this. They had a production network with a bunch of SSH keys in it. Some not very ethical employee worked there, had keys in their production systems, and eventually got fired. Now, of course, this company had good processes in place.

Starting point is 00:26:36 They went through all the devices and took out this person's public key from all the devices. What they didn't know is that during lunch one day, this person had gone around to all their co-workers' workstations that hadn't been locked, downloaded the private keys for those people on his computer before he got fired. And so shortly after he got fired, their entire production network got wiped out. Now, they didn't have enough forensics at the time to know how it all got wiped out. So they spent some time putting it all back in place, this time with forensics. About a month later, they rebuilt everything from scratch, all new public keys and everything. You couldn't possibly have any backdoors in this system, right? And then a month later, it all got wiped out again. This time, the forensics revealed that it was one of

Starting point is 00:27:15 the existing employees coming from a different country that had gotten into their private production network and wiped everything out. How did that happen? It was because this person had years earlier downloaded all their private keys when he wandered around through the office. You can fix this problem instantly by just expiring your keys and forcing a rotation periodically, right? SSH doesn't make that very easy. You can with SSH set up SSH certificate authentication, which is a huge ordeal to get configured. But once it's working, it solves this particular problem, right? Tailscale- On Mac and iOS,

Starting point is 00:27:46 there is a slight improvement to this that I'm a big fan of because I agree with you. I am lousy at rotating my keys, but there's an open source project called Secretive that I use on the Mac that stores the private key in the secure enclave,

Starting point is 00:27:59 which the Mac will not let out of it. And I have to use Touch ID to authenticate every time I want to connect to something, which can get annoying from time to time. But there is no way for someone to copy that off. Historically, I would have a passphrase that was also tied to the key. So if someone grabbed it off the disk, it still theoretically would not be usable.

Starting point is 00:28:17 And that was, but again, that is an absolute vector that needs to be addressed and thought about. You have to go through this effort to sort it all out, right? So Tailscale, we just have this policy. We don't do unlimited length credentials. We do key rotation for everything. And we just sort of set different time limits for this rotation, depending on how picky you want to be about it. But any key expiry is usually much, much better than no key expiry. Even if you set it to a six-month key expiry, you still have, at least it's only this six-month window that somebody could theoretically reuse your keys.
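The "six-month window" point can be made concrete with a little date arithmetic. The sketch below is illustrative only: it does not call any Tailscale API, the function names are hypothetical, and the 180-day lifetime is just the upper end of the range mentioned earlier in the conversation. It computes how long a stolen copy of a key stays usable when keys expire.

```python
from datetime import datetime, timedelta, timezone


def key_expiry(issued_at, lifetime_days):
    """Return the moment a key issued at issued_at stops being valid."""
    return issued_at + timedelta(days=lifetime_days)


def is_expired(issued_at, lifetime_days, now=None):
    """True once the key's lifetime has elapsed (now defaults to the current UTC time)."""
    now = now or datetime.now(timezone.utc)
    return now >= key_expiry(issued_at, lifetime_days)


def exposure_window(stolen_at, issued_at, lifetime_days):
    """How long a copy stolen at stolen_at remains usable; zero if it has already expired."""
    remaining = key_expiry(issued_at, lifetime_days) - stolen_at
    return max(remaining, timedelta(0))


if __name__ == "__main__":
    # Example dates are arbitrary and chosen only to show the calculation.
    issued = datetime(2022, 1, 1, tzinfo=timezone.utc)
    stolen = datetime(2022, 3, 1, tzinfo=timezone.utc)
    window = exposure_window(stolen, issued, lifetime_days=180)
    print(f"Stolen key stays usable for {window.days} more days")
```

A key that never expires has no upper bound on that window, which is the situation in the story about the copied SSH keys.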

Starting point is 00:28:49 And we can also rotate keys behind the scenes and so on. So in the SSH case, the way people use Tailscale, you stop opening the SSH port to the world. You only SSH when you're connected over Tailscale. The fact that your Tailscale keys rotate and expire over time is what protects your SSH session. So you could keep using static SSH keys that never expire. Don't try to figure out all this other complicated stuff, right? And you're still protected from these private SSH keys, like unlimited length keys. Now that said, for servers, Tailscale does have

Starting point is 00:29:22 a button where you can say like, please stop expiring the key. This is a server. Nobody's ever going to get physical access to the machine. The only thing we could do with the private key for this machine is allow other people to SSH into it, which is not very dangerous. It's pretty much like somebody stealing your SSH authorized keys file. It doesn't really matter. And for that case, you turn off the expiry altogether. But expiring keys is intended for use by devices that employees are actually holding in their hands,

Starting point is 00:29:48 where if it expires, it's no big deal. You push the login button and it refreshes. There's something that is very nice about dealing with something that is just so sensible. We've all, at least in the olden days of running sysadmin stuff, we had this problem where we would generate or purchase, back in those days, SSL certificates. And great, they expire in a year or so. And at the end of the year, people would forget. And then it would expire, you'd run around fixing this.

Starting point is 00:30:13 And the default knee-jerk response was, that was awful. Let's get the next one for five years so we don't have to think about it for that long. And it's always a wildcard. So it gets put all over the place and you wind up with these problems. One of the things that Let's Encrypt has done super well is forcing a rotation every 90 days.

Starting point is 00:30:29 So you know where it is. It's just often enough you want to automate it. And ACM, the AWS Certificate Manager that they use, takes a slightly different approach. It doesn't give you the private key. It embeds it in other places so they can handle the rotation themselves. And they start screaming in your email if they can't verify that it's time for renewal long before it hits. It's different approaches to the problem. But yeah, five years out, how should I know all the places the certificate has wound up in that intervening time? Most of

Starting point is 00:30:57 the people who did it aren't there anymore. And one day, surprise, a website breaks, either because its SSL cert isn't working or one of the backend services it depends on suddenly doesn't have that working. It's become a mess. So having a forced modernity to these things is important. Right. It's forced modernity. And it's just basically, it's all behind the scenes. Like you don't even think about the fact that Tailscale gave you a key because that is not

Starting point is 00:31:21 relevant to your day-to-day life, right? You logged in, something happened, all these devices ended up on your network. What actually happened is that public and private keys, you know, a private key was generated, the public keys were distributed properly, things are getting rotated, but you don't have to care about all that stuff. So it's fun that Tailscale is what we call secure by default, right? People love to use it because it's easier, it makes their life easier, but security teams like it because actually it changes the default security posture from like, oh, I'm going to have to tell everybody to please stop doing these five things because

Starting point is 00:31:52 it always creates security holes to like, whoa, the thing that they're going to do most naturally is actually going to be safe, right? I really like that about it. You're not thinking about certificates, but the certificates are getting rotated exactly as they should be. There's just something so nice about computers doing the heavy lifting for us. It's one of the weird things about Tailscale. It falls into a very strange spot where there is effectively zero maintenance burden on me, but I still use it just to toggle it on or off in scenarios often enough to remember that it's there and that I'm using it. It is the perfect sweet spot of being somewhat close to top of mind, but never in a sense that is, oh, I got to deal with this freaking thing

Starting point is 00:32:31 again. It never feels that way. Logging into it, it has long-lived sessions in the browser. So it isn't one of those, ah, you have to go back to GitHub and reauthenticate and do all these other dog and pony show things. It just works. It is damn near a consumer level of ease of use, start to finish. The hard part, of course, is how on earth do you explain this to someone without a background in this space? Yeah, exactly. It's something we ask ourselves sometimes.

Starting point is 00:32:58 It's like, well, you know, Tailscale is great for developers right now. It is easy enough to use even for consumers, but like, how would you explain it to consumers and find a good use case for consumers? And it's something that, you know, I think we are going to do eventually, but it hasn't been, up until now, a super high priority for us just because developers are sort of the core audience, and we haven't even finished building a great product that does everything that they want yet. There is one little

Starting point is 00:33:24 feature in Tailscale that's the beginning of something that's consumer-friendly. It's called Taildrop. I don't know if you've seen this one. You can turn it on, and basically it acts like AirDrop in Apple products, except you don't need to care about physical proximity, and it works with every kind of device, not just Apple devices, right? So you can add it and it shows up in the share pane on your macOS or Windows or iOS device. You can use it from Linux. You just use it to send files of any type and it sends them point to point, not through a cloud provider, so that we never see a copy of the file.

Starting point is 00:33:55 It only goes between your devices over your encrypted network. So that's something that... It was like Tailprint for Bonjour could wind up being another aspect of this as well. And I'm still hoping for something almost Ansible-like where run the following command, whether it's pre-approved or not, on a following subset of things. In my case, for example,

Starting point is 00:34:11 I would love it if it would just automatically, when I press the button, update Tailscale across all of the nodes that support it, namely the Linux boxes. I don't think you can trigger an App Store update from within a sandboxed app on iOS, but I've been surprised before. Yeah, but it's nice to be able to do some things.

Starting point is 00:34:26 We get that request a lot for like, can you push a button to auto update Tailscale? It makes me really sad that we get this request because the need for this is a sign that all of the OS vendors have completely botched software updates, right? Like the OS should be the thing updating your software on a good schedule based on our set of rules. And it shouldn't be the job of every single application to provide their own

Starting point is 00:34:49 software update. It's actually a massive, embarrassing security hole that software can even update itself, right? Because if it can update itself, then, you know, imagine someone breaks into the production services of a company that is offering a particular program. They'd put malware into a version of the software, they put it into the software update server, and then they trigger everything in the network to push this software update to those devices. Now you've got malware installed on all your devices, right? It's very strange that people ask for this as a feature.

Starting point is 00:35:21 Tailscale currently does not have that feature. It doesn't push software updates on its own. But it's such a popular feature that I think we're going to have to implement it because everybody wants this because Windows, for example, is simply just never going to automatically update your software for you. We have to have this weird super admin right on your machine so that we can push software updates because nobody else will. I feel really weird about that. You know, the security world should be protesting this more. But instead they're asking, can you please put this feature in? Because I've got a checklist on my compliance thing that says, is all your software up to date? I don't have a checklist item that says,

Starting point is 00:35:53 does any of my software have super admin rights that they shouldn't have? It's sort of, I guess, the next level of supply chain management is the big word. Nobody, there is no supply chain management for software. There isn't. For better or worse, I wish there were, but there simply is not. Next year, maybe, we hope. Yeah. So you just have, you have to trust your vendors fundamentally, which, you know, I guess will always be true.

Starting point is 00:36:18 And it's true for Tailscale as well, right? Whether or not we include this software update pushing thing. If you're installing a VPN product provided by a vendor, you have to trust that we're going to put the right stuff into the software. And the best, the only thing I can really do is just be honest about these issues and say, well, look, we try our best. We definitely try not to implement features

Starting point is 00:36:37 that are going to turn into security holes for you. And I think we do a lot better than most vendors do in that area. But it's very hard to be perfect because nobody knows how to do software supply chain well. I hear you. That's a nice thing too. Honestly, the big reason I know I need to update these things and the reason I want to do it is actually you. Because whenever I log in and look at my devices in the Tailscale thing, there's a little icon next to one that there's an update available here. And you have fixed a lot of the niceties on this.

Starting point is 00:37:05 Like, ah, there's an update available for the iOS version. It's really because it's not available in the Apple Store yet. As I sit there spamming the thing that stopped happening. There's a lot of just very nice quality of life improvements that are easy to miss. Yeah. Yeah. It's kind of weird. We actually went a little overboard on the update available notifications for a while

Starting point is 00:37:24 because there's always this trade-off, right? Like I said, we have a policy of never breaking old versions. So when people see the update available notification, they kind of panic. It's like, oh, no, I better install the update before Tailscale cuts me off. It's like, well, we're not actually ever going to cut you off. So you shouldn't have to worry about that stuff. But on the other hand, you're not going to get the latest features and bug fixes unless you're running the latest version. So when people email us saying, hey, I'm using Tailscale from six months ago, and I have

Starting point is 00:37:50 this problem, the first thing our support team does is say, well, can you please try the latest one? And does the problem go away? Because it's kind of inefficient us debugging six-month-old software. So one way we were trying to minimize that cost is like, hey, we could just tell people there's a new version available and then maybe they'll update it themselves. But that resulted in people panicking.

Starting point is 00:38:08 Like, oh no, I need to install the software really, really soon because I can't afford to break my network. Right. And because our system is based on WireGuard, and this is, you know, I'll probably jinx it by saying this, but we've never had an actual security hole that we've had to issue a Tailscale update to resolve. People see the update available thing and they're like, oh no, I bet there's a whole bunch of vulnerabilities that they fixed.

Starting point is 00:38:32 It's like, well, no, WireGuard has also never had a vulnerability. Sooner or later, there probably will be one. And when there is one, we'll probably have to make the update notification in red or something instead of just a little icon on the admin panel. Nice job on jinxing it, by the way. I appreciate that. Yeah, I know. I try my best, but I've actually been surprised. It's very much like my experience with all the DJB stuff we used in the past. When we were using qmail and djbdns for years, there was never once a security hole, right? It's very interesting that it is possible to design software that never once has a security

Starting point is 00:39:09 hole and nobody does that, right? I mean, I would say I'm not as smart as DJB. Our software is probably not going to be as 100% perfect as that. But we try really, really hard to aim for that as a goal. I really want to thank you for taking the time to speak with me about everything Tailscale is up to. And again, congratulations on your Series B. If people want to learn more, where should they go? I guess, tailscale.com is the place. We also have @Tailscale on Twitter. My own personal Twitter is @apenwarr, which you probably won't be able to spell unless you Google for me or something.

Starting point is 00:39:46 But it's in the show notes, which makes this even easier. There you go. So yeah, there's lots of information. But the number one thing I tell people is like, look, it is a lot easier to get started than you think it is. Even after you've heard it a hundred times, nobody ever believes how easy it is to get started. Just go to the app store, download the app,

Starting point is 00:40:01 log into your account and you're already done, right? Try that and you don't need to read anything. I would tear you apart for that statement if it were slightly less true than it is, but it is transformative. Give it a try. That's a strong endorsement from me. Thank you so much for your time. I appreciate it.

Starting point is 00:40:15 Thank you, too. Great talking to you and talk next time. Indeed. Avery Pennarun, CEO of Tailscale. I'm cloud economist Corey Quinn, and this is Screaming in the Cloud. If you've enjoyed this show, please leave a five-star review in your podcast platform of choice and smash the like and subscribe buttons. Whereas if you've hated it, same thing, five-star review, smash the buttons, and also leave an angry, bitter comment about how you are smart enough to roll your own crypto so you don't understand why other people wouldn't do it. If your AWS bill keeps rising and your blood pressure is doing the same,

Starting point is 00:40:50 then you need the Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business, and we get to the point. Visit duckbillgroup.com to get started. This has been a HumblePod production. Stay humble.
