Screaming in the Cloud - From A to Z in Alphabet’s Soup with Seth Vargo

Episode Date: March 10, 2022

About Seth
Seth Vargo is an engineer at Google. Previously he worked at HashiCorp, Chef Software, CustomInk, and some Pittsburgh-based startups. He is the author of Learning Chef and is passionate about reducing inequality in technology. When he is not writing, working on open source, teaching, or speaking at conferences, Seth advises non-profits.

Links:
Twitter: https://twitter.com/sethvargo

Transcript
Starting point is 00:00:00 Hello, and welcome to Screaming in the Cloud, with your host, Chief Cloud Economist at the Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud. The company 0x4447 builds products to increase standardization and security in AWS organizations. And they do this with automated pipelines that use well-structured projects to create secure, easy-to-maintain, and fail-tolerant solutions.
Starting point is 00:00:44 One of those is their VPN product, built on top of the popular OpenVPN project, which has no license restrictions. You're only limited by the network card in the instance. To learn more, visit snark.cloud slash deploy and go. That's snark.cloud slash deploy and go. All one word. Couchbase Capella. Database as a service is flexible, full-featured, and fully managed, with built-in access via key value, SQL, and full-text search. Flexible JSON documents align to your applications and workloads.
Starting point is 00:01:25 Build faster with blazing fast in-memory performance and automated replication and scaling while reducing cost. Capella has the best price performance of any fully managed document database. Visit couchbase.com slash screaming in the cloud to try Capella today for free and be up and running in three minutes with no credit card required.
Starting point is 00:01:45 Couchbase Capella, make your data sing. Welcome to Screaming in the Cloud. I'm Corey Quinn. I have a return guest today, though it barely feels like it qualifies because Seth Vargo was guest number three on this podcast. I've had a couple of folks on since then, and for better or worse, I'm no longer quite as scared of the microphone
Starting point is 00:02:09 as I was back in those early days. Seth, thank you for joining me. Yeah, thank you so much for having me back, Corey. Really excited to figure out whatever we're talking about today. Well, let's start there, because the last time we spoke, you were, if memory serves,
Starting point is 00:02:23 a developer advocate at Google Cloud. Correct. And you've changed jobs, but not companies, but kind of companies, because welcome to large environments. Over the past few years, you've remained at Google. You are no longer at Google Cloud and you're no longer a developer advocate. In fact, your title is simply engineer at Google. And what you've been focusing on, to my understanding, is helping Alphabet companies, namely, you know, Alphabet, always followed in parentheses in journalistic style by Google's parent company, because no one thinks of it in terms of Alphabet. You're effectively helping companies within the conglomerate umbrella securely and privately consume public cloud. Yes, that is correct.
Starting point is 00:03:10 So I used to work in what we call the cloud PA. PA stands for product area. Other product areas are like Chrome and Android. Now I'm in the core PA, where I'm helping lead and run an initiative that, like you said, is to help Alphabet companies to securely and privately use public cloud services. So I am going to go out on a limb because my position on multi-cloud has always been pick a cloud. I don't particularly care which one, but pick one and focus on that. I'm going to go out on a limb and presume that given that you are not at Google Cloud anymore, but you are at Google,
Starting point is 00:03:51 you probably have a slight preference as far as which public cloud these various companies within the umbrella should be consuming. Yeah. I mean, obviously I think most viewers will think the answer is GCP. And if you said GCP, you would be like 95% correct. Well, you'd also be slightly less than that correct, because they're doing a whole rebrand thing calling it Google Cloud in public as opposed to GCP. You really don't work for the same org anymore. You're not up to date on the very latest messaging talking points. I missed all, there's so many TLAs that you lose all your TLAs over time.
Starting point is 00:04:25 Oh, yes. So Google Cloud would be like 95% correct. But what you have to really understand is Google has its own, you know, cloud. We didn't call it a cloud at the time. You might call it on-prem or legacy infrastructure, if you will. Primarily built on a scheduling system called Borg, which is like Kubernetes version zero. And a lot of the Alphabet companies have workloads that run on Borg. So we're actually talking about hybrid cloud here, which, you know, you may not think of
Starting point is 00:04:56 Google as like a hybrid cloud customer, but a workload that runs on our production infrastructure called Borg that needs to interact with a workload that runs on Google Cloud, that is hybrid cloud. It's no different than a customer who has their own data center that needs peering to a public cloud provider, whether that's Google Cloud or AWS or Azure. I think the other thing is,
Starting point is 00:05:17 if you look at the regulatory space, particularly a lot of the Alphabet companies operate in say like healthcare or finance or fintech, where certain countries and certain jurisdictions have regulations around like, you must be multi-cloud. You know, that some people might say that means you have to run, you know, the same instance of the same app across clouds, or some people say your data can be here, but your workloads can be over there. That's to be interpreted. But, you know, I would say 95% is GCP,
Starting point is 00:05:44 or sorry, I should say 95% is Google Cloud, but there is a small percentage that is definitely going to be other cloud providers and hybrid cloud as well. My position on multi-cloud is often something people like to throw in my face: see, you gave this general guidance, and therefore, whenever you say something that goes against it, you're a giant phony. And it's, yeah, Twitter doesn't do so well with the nuance. My position of pick a provider and go all in is intended as general guidance for the common case. There are exceptions to this, and any individual company or customer is going to have more context than that general guidance will. So if you say you need to be in multiple clouds for certain reasons, you're probably correct. If you say you need to be in multiple clouds because your regulator
Starting point is 00:06:28 demands it, you are certainly correct. I am not arguing against that in any way. I do want to disclaim one of my biases here as well. And that is specifically that if I were building a startup today and I were not me, by which I mean having spent 10 years in the AWS ecosystem learning not just how it works, but how it breaks, because that's important in production. And, you know, also having a bunch of service owners at AWS on speed dial. And I were approaching this from the naive, I need to pick a cloud, which one would I go with? My bias is for Google Cloud. And the reason behind that is the developer experience is spectacular as the primary, but not only perspective on that. So I'm curious to know that as you're helping what are effectively internal customers move to
Starting point is 00:07:20 Google Cloud, is their interaction with Google Cloud as a platform the same as it would be if I, as a random outside customer, were using Google Cloud? Is there a bunch of internal back channels? Oh, you get the good kind of internal Google Cloud that most of us don't get access to or something else? Yeah, that's a great question. So first, thank you for the kind words on the developer experience. They were honest words, to be clear.
Starting point is 00:07:47 Let me be very direct with you. If I thought your developer experience was trash, I might not say it outright in the effort not to be actively antagonistic to someone I'm having on the show right now, but I would not say it if I didn't believe it. Yeah. And I totally, I know you, I've known you for many years. I totally believe you. But I do thank you for saying that because that was the team that I was on before this was largely responsible for that across the platform. But back to your original question around, like, what does the support experience look like? So it's a little bit of both.
Starting point is 00:08:15 So Alphabet companies, they get a technical account manager very similar to how, you know, a reasonable size spend customer would get a technical account manager. That account manager has access to the cloud support channels. So all of that looks the same.
Starting point is 00:08:29 I think where things look a little bit different is because myself and some of our other leads came from cloud. I generally don't like this phrase, but we know people. So we tend not to go directly to cloud when we can. We want Alphabet companies to really behave and act as if they were an external entity. But we're able to help the technical account manager navigate the support process a little bit better by saying, like, you need to ask for this person, right?
Starting point is 00:08:55 You need to say these words to get in front of the right person to get this ticket assigned to the right person. So the process is still the same, but we're able to leverage our pre-existing knowledge with cloud. It's the same way that a Xoogler, an ex-Googler who worked for your company, would be able to kind of help move that support process along a little bit faster. I am quite sincere when I say that this is a problem that goes far beyond simply Google. A disturbing portion of my job as a cloud economist helping my clients consists of nothing other than introducing Amazonians to one another. And these are hard problems at scale. I work at a company with a dozen people in it. It turns out that, yeah, it's pretty easy to navigate who's responsible for what.
Starting point is 00:09:38 When you have a hyperscale size company in the trillion dollar range, a lot of that breaks down super quickly. Yeah. And there's just a lot of churn at all levels of the organization. And, you know, we talked about this when we first joined the show, like I switched roles, right? I used to be in cloud and now I'm in what we call core. I still get people who are reaching out to me at Google and externally who are saying, oh, can you answer this question? Hey, how do I do this? And I've gradually over the past couple of months convinced people that I don't work on that anymore.
Starting point is 00:10:11 And I try to be helpful where I can. You're even using the old name and everything. They're eventually going to learn, right? I know. They'll learn. They'll be like, what do you call this? GCP? Okay, great.
Starting point is 00:10:18 We don't need you anymore. But it's true, right? People leave the organization. People join the organization. There's reorgs, there's strategic changes, people switch roles within the org. And all of that leads to complexity with navigating what is the size of a small nation in some cases.
Starting point is 00:10:37 Your biography says that you enable Alphabet companies to securely and privately consume public cloud. Now, that would make perfect sense. And I would really have no further questions based on what we've already said, except for the words securely and privately. And I want to dive into that first. Let's work backwards with the second one first. What does privately mean in this context?
Starting point is 00:10:59 So privately means privacy preserving for both the Alphabet company and the users or customers that they have. When we look at that from the perspective of the Alphabet company, that means protecting their data from the eyes of the cloud provider. So that's things like customer-managed encryption keys, and making sure that if at any point the cloud provider is accessing your data, even for a legitimate purpose like diagnosing a support ticket, you have visibility into that. Then the privacy-preserving side for the Alphabet company's customers is about providing that same level of visibility to their customers, as well as making sure that any data that they're storing is private, that it's not accessible to certain parties, and that it's following the rules, whether it's actual legislation around how
Starting point is 00:11:50 long data can be persisted, things like GDPR, or if it's just general data retention or insider risk management. All of that comes into this idea of building a private system or privacy-preserving system. Let's be very clear that my position on it is that Google's relationship with privacy has been somewhat challenged, due in no small part to the sheer scale of how large Google has grown. And let's be clear, I believe firmly that at certain points of scale, yeah, you deserve elevated levels of scrutiny. That is how we want society to function by and large.
Starting point is 00:12:27 And there are times where it feels a little odd on the cloud side. For example, as of the time of this recording, somewhat recently, there was a bug in some of the copyright detection stuff where Google Drive would start flagging files as having copyright challenges if they contained just the character one in them, which, okay, clearly a bug, but it was a bit of a reminder for some folks: wait, that's right, Google does tend to scan these things. Well, when you have a bunch of end user customers in the ways that Google does, that stuff is baked in
Starting point is 00:12:59 and it shapes how you wind up seeing things. From Amazon's perspective, historically, they basically sold books and then later underpants. And doing e-commerce transactions was basically the extent of their data work with customers. They weren't really running large-scale file sharing systems and abilities in collaboration suites, at least not that really had any of those pesky things called customers. So that is not built into their approach and their needs in the same way. To be clear, I am sympathetic to the problems, but it's also, it's a challenging problem, especially as you continue to evolve and move things into cloud, you absolutely must be able
Starting point is 00:13:37 to trust your cloud provider, or you should not be working on that cloud provider has been my approach. Yeah, I mean, there's certainly things that you can do to mitigate. But in general, there is some level of trust. Forget the data on the availability side, right? Like when the cloud provider says, this is our SLA, and you agree to that SLA, like, yeah, you get money back if they mess it up. But ultimately, you're trusting them to adhere to that SLA, right? And you get recompense if they fail to do so, but that's still like trust. Trust is far more than just on the privacy side, such as the continued investment that Alphabet slash Google is making in Google Cloud. It's easy to take the approach of, well, you've turned off a bunch of consumer services, so therefore you're going to turn off the cloud at some point too.
Starting point is 00:14:36 No, let me be very clear for the record. I do not believe that you are going to one day flip a switch and turn off Google Cloud, and neither do your customers. Instead, the way that enterprises express this, it's not about you flipping the switch and turning it off. That's what contracts are for. Their question, and they enshrine this in contracts in some cases, is what happens in the event, not that you turn it off, but that you fail to appropriately continue to invest in the platform. Because at enterprise scale, this is how things tend to die. It is not through flipping a switch in most cases,
Starting point is 00:15:12 it's through, we're just going to basically mothball it, keep it more or less exactly as it is until it slowly fades into irrelevance over a long period of time. And when you're providing the infrastructure to run things for serious institutions, that part isn't okay. And credit where due, I have seen every indication that Google means it when they say this is an area of strategic and continued ongoing focus for us as a company. Yeah, I mean, Google is heavily investing in cloud. I mean, this is a brand new group that I'm working in, and we're trying to get Alphabet companies onto cloud. So obviously there's some very high level top down
Starting point is 00:15:48 executive support for this. I will say that I 100% agree with everything you're saying. The traditional enterprise approach of build this Java app, because let's be honest, it's always Java. Build this Java app,
Starting point is 00:16:02 compile it into a jar and run it forever is becoming problematic. We saw this recently with, like, Log4j. Yeah, it should be in a container. What the hell? I'm kidding. I'm kidding. Please don't send me email, whatever you do. What's a container? I'm just kidding. The idea of software rotting is very real. And it's becoming more and more of a risk to security, to privacy, to public cloud providers, to enterprises, where when you see something like Log4j happen, and you can't answer the question,
Starting point is 00:16:33 like, do we have any code that uses that? If getting the answer to that question takes you six weeks, boy, a lot of stuff can happen in six weeks while that particular thing is exploited. And, you know, it kind of gets into software supply chain a little bit, but I do agree that like secure, private, and stable APIs are super important. And it's an area where Google's investing. At the same time, I think the enterprise industry is moving away a little bit from set it and forget it as a strategy. I want to talk about the security portion as well, as far as securely consuming public cloud goes. And let me start off with a disclaimer here, because I don't want people to misconstrue
Starting point is 00:17:17 what I'm about to say. If you are migrating to one of the big three cloud providers, their security will be better than anything you will be able to achieve as a company yourself. Not you personally, because Google is a bit of an asterisk to that statement, given what you have been doing since the 90s in your on-prem world with Borg and the rest. But my philosophy on the positioning of the security of cloud providers relative to one another has changed. I spent four months beating the crap out of Azure for having an issue where there was control plane access and then really saying nothing about it. And the day after I put out a blog post on that topic, because I was tired of the lack of response, it came out that right at the same time, AWS had a very similar problem and had not said anything themselves. And that went back and forth, apparently waiting to wind up doing a release until this happened, and Orca Security wound up putting one out there. And it was frustrating on a couple of levels. First, the people at both of these companies who work in security are stars. There
Starting point is 00:18:25 is no argument, no bones about that. Problems are going to happen. Things are going to occur as a result. And the only saving grace then is the transparency and communication around it. And there was none of it from them. I'm also more than a little bit irked that my friends at AWS were aware of this, basically watched me drag Azure for four months knowing that they'd done the same thing and never bothered to say a word, but okay, that's a choice. I've been saying for a while that of the big three, Google's security posture is the most impressive.
Starting point is 00:18:56 And it used to be a slight difference. Like, you nosed ahead of AWS in that respect, not by a huge margin, but by a bit. I don't think it's nearly as close these days in my mind. And talking to other large companies about these things and people who are paid to worry about these things all day long, I am very far from alone in that perspective. So I guess my question for you is, as you look at moving the workloads securely to Google Cloud, it feels like security is baked into everything
Starting point is 00:19:31 that all aspects of your company have done. Why is that a specific area of focus? Or is that how it gets baked into everything you folks do? So you kind of like set up the answer for this perfectly. I swear we didn't talk about this extensively. You didn't know any of that was coming, by the way, just to be very clear here. I don't sit here and say, all right, I'm going to say this and here's the right response. No, this is an impromptu, more or less ad hoc show every time I do it.
Starting point is 00:19:56 Yeah. And I'm going to preface this by saying, like, I don't want this to sound egotistical, but I have never found a company that has as rigorous security and privacy policies, reviews, and procedures as Google. I thought I had, and I was wrong. Yeah. And I have a lot of apologizing to do to people as a result of that. And honestly, every time I interact with our internal security engineering teams or our IP protection teams, I'm that Nathan Fillion meme where he's like, you know, like,
Starting point is 00:20:27 okay, I get it. I get it, right? And then face vomit. I should say so that I can't. Yeah. Oh, yeah. The reason that it's hard for Alphabet companies
Starting point is 00:20:38 to securely and privately move to cloud, specifically for security, is because Alphabet's stance is so much more rigorous than anyone else in the industry, to the point where in some cases, even our own cloud provider doesn't meet the bar for what we require for an internal workload. And that's really what it comes down to is like, the reason that Google is the most secure cloud
Starting point is 00:21:04 is because our bar is so high that sometimes we can't even meet it. I have to assume that the correct answer on this is that you then wind up talking to those product teams and figure out how to get them to a point where they can support that bar. Because the alternative is effectively, it's like, oh yeah, this is Google Cloud. And it's absolutely right for multinational banks to use. But not Google workloads, that stuff's important. And I don't think that that is necessarily how you folks tend to view these things. So it's a bi-directional street, right? So a lot of it is working with a product management team to figure out where we can add these additional security properties into the system.
Starting point is 00:21:46 I should say tri-directional. The second area is where the policy is so specific to Google that Google should actually build its own layer on top of it that adds the security because it's not generally applicable to even big, huge cloud customers. And then the third area is, Google's a very big company. Sometimes we didn't write stuff down. And sometimes we have policies where no one can really articulate
Starting point is 00:22:12 where that policy came from. And something that's new with this approach that we're taking now is like, we're actually trying to figure out where that policy came from and get the impetus of what it was trying to protect against and make sure that it's still applicable. And I don't know if you've ever worked with governments
Starting point is 00:22:29 or large companies, right? They have this spreadsheet of hundreds of thousands of words. You are basically describing my client list. Please continue. I mean, sometimes they have to use an access database because they exhaust the number of rows in an Excel spreadsheet. And it's just checklist upon checklist upon checklist. And that's not how Google does security. Security is a very all-encompassing kind of 360 type of thing.
Starting point is 00:22:51 But we do have policies that are difficult to articulate what they're actually protecting against. And we are constantly reevaluating those and seeing like, this made sense on Borg. Does it actually make sense on cloud? And in some cases, it may not. We get the same protections using, say, a GCP native service, and we can omit that requirement for this particular workload. This episode is sponsored by our friends at Oracle Cloud. Counting the pennies, but still dreaming of deploying apps instead of hello world demos, allow me to introduce you to Oracle's always free tier. It provides over 20 free services and infrastructure, networking, databases, observability,
Starting point is 00:23:32 management, and security. And let me be clear here, it's actually free. There's no surprise billing until you intentionally and proactively upgrade your account. This means you can provision a virtual machine instance or spin up an autonomous database that manages itself, all while gaining the networking, load balancing, and storage resources that somehow never quite make it into most free tiers needed to support the application that you want to build. With Always Free, you can do things like run small-scale applications or do proof-of-concept testing without spending a dime. You know that I always like to put asterisk next to the word free. This is actually free. No asterisk. Start now. Visit snark.cloud slash oci-free. That's
Starting point is 00:24:17 snark.cloud slash oci-free. I think that when it comes to things like policies that are intelligently crafted around security, you folks, and to be fair, the AWS security engineers as well, have been doing it right in that, okay, we're going to build a security control to make sure that a thing can't happen. That's not enough. Then there's the defense in depth. Okay, let's say that control fails through some variety of ways. Here are the other things we're going to do to prevent cross-account access, for example. And that in turn winds up continuing to feed on itself and build into a culture of assuming that you can always continue to invest in security.
Starting point is 00:24:56 How far is enough? Well, for most folks, they haven't gone far enough yet. Another way to put this is, like, how well do you want to sleep at night? You know, there's folks on the Google security engineering team who are so smart, and they work on our offensive security team. So their full-time job is to try to hack Google and then figure out how to prevent that. And, you know, I've read some of the reports and some of the ways they think, and I'm like, how do you pick up a mobile phone and go to, like, any website confidently knowing what you know? Right. Who said anything about confidently? Yeah. How do you use
Starting point is 00:25:36 self-checkout at a supermarket and, like, not just wear your entire full-body tinfoil hat suit? But, you know, I think the bigger risk is not knowing what the risks are. And this is a lot of what we're seeing in software supply chain too: a lot of security is around threat modeling, not checklists. But we tend to gravitate toward checklists
Starting point is 00:25:57 because they're concrete. But you really have to ask yourself, like, do I need the same security properties on my static blog website that is stored on an S3 bucket or a GCS bucket that's public to the internet that I do on my credit card processing service? And a lot of times we don't treat those differently. We don't apply a different threat model to them. And then everything has to have the same level of security.
Starting point is 00:26:18 And then everything is in scope for whatever it is you're trying to defend against. And that is a short path to madness. Yes. Yes. Yes. Your static HTML files and your GCS bucket are in scope for SOC 1 and 2 because you didn't have a way to say they weren't. Yeah. You've also done some, again, the nice thing about being at a company for a while, from
Starting point is 00:26:38 what I can tell, given that I've never done it until I started this place, is you move around and work on different projects. You were involved as well, personally, in the Exposure Notifications project, the joint collaboration between a number of companies in the somewhat early days of the pandemic, where all of our phones talk to one another and, anonymously and in a privacy-preserving way, let us know that, hey, by the way, someone you were in close contact with has tested positive for COVID-19 in the previous fixed period of time. What'd you do over there? Yeah, so the Exposure Notifications project was a joint effort primarily between Apple and Google to use Android and iOS devices to help stop the spread of COVID or reduce the spread of COVID as much as possible.
Starting point is 00:27:20 The idea being, because the incubation period is roughly 14 days, at least pre-Omicron, if we could tell you, hey, you might have been exposed and get you to stay at home for three or four days, self-isolate, we could dramatically reduce the spread of COVID. And we know from some of the studies that have come out of the UK and European region, the technology actually reduced the spread of cases by like 1400% in some cases. I was one of the tech leads for the server side. So the way the system works is it uses Bluetooth Low Energy on iOS and Android devices to basically broadcast random IDs. So I know this is Screaming in the Cloud, but if we can just quickly rebrand it as Screaming into
Starting point is 00:28:02 the Void, that's basically what's happening. You're generating these random identifiers and just, like, yelling them. And there's other phones out there who are listening. And they collect these, what we call RPIs, or rolling proximity identifiers. They have no data in them. They're literally like a UUID or 32 bytes of random data.
Starting point is 00:28:23 They aren't at all associated with your device or your person. So then what happens is, let's say you're in a supermarket and you're near someone. Every so often your phones exchange these IDs. If you then test positive, those IDs go up to a centralized server. The server, again, also has no idea who you are. So the whole thing is privacy preserving end to end. Then the server basically bundles all of what we call the TEKs, or the temporary exposure keys, into a tarball that goes up onto a CDN.
Starting point is 00:28:51 And then every night, all of the devices that are participating in EN download this and run a local key match. So at no point does the server ever know that you were in a supermarket with someone else. Only your phone knows that you came in contact with this TEK in the past 14 days or 21 days in some jurisdictions.
Starting point is 00:29:09 And it'll generate an exposure notification or an exposure alert, which says, like, hey, in the past 14 days, you've come in contact with someone who has confirmed positive for COVID. And then the guidance kind of varies by state and by health jurisdiction, like self-isolate or go get tested or whatever.
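The flow described above (broadcast rolling IDs, upload keys after a positive test, match on the phone) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the production protocol: the HMAC-based derivation, the 10-minute interval count, and every name here (`new_tek`, `rolling_id`, `find_exposures`) are hypothetical stand-ins chosen for the example. The point it shows is that the server only ever handles keys, while the comparison against observed IDs happens locally.

```python
import hashlib
import hmac
import secrets

INTERVALS_PER_DAY = 144  # assumption: one rolling ID per 10-minute window


def new_tek() -> bytes:
    """Generate a random daily key; it carries no identity or location."""
    return secrets.token_bytes(16)


def rolling_id(tek: bytes, interval: int) -> bytes:
    """Derive the ID broadcast in one interval (HMAC is a stand-in derivation)."""
    digest = hmac.new(tek, interval.to_bytes(4, "big"), hashlib.sha256).digest()
    return digest[:16]


def find_exposures(observed: set[bytes], published_teks: list[bytes]) -> list[bytes]:
    """On-device match: re-derive IDs from published keys and compare locally."""
    matches = []
    for tek in published_teks:
        if any(rolling_id(tek, i) in observed for i in range(INTERVALS_PER_DAY)):
            matches.append(tek)
    return matches


# Alice and Bob are near each other during intervals 40-42; Carol is never nearby.
alice_tek, carol_tek = new_tek(), new_tek()
bob_observed = {rolling_id(alice_tek, i) for i in (40, 41, 42)}

# Alice and Carol test positive and upload their keys; the server only sees keys.
published = [alice_tek, carol_tek]

# Bob's phone downloads the published keys and checks them against what it heard.
exposures = find_exposures(bob_observed, published)
print(len(exposures))
```

In this sketch Bob's phone learns that one published key matches something it heard, and nothing in the published bundle says where or with whom the contact happened.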
Starting point is 00:29:23 Or go to the bar in some places, apparently. Yeah. The server itself is actually, there's a verification component because ideally, like, we don't want people to just be like, oh, I'm COVID positive. And then like all their friends get an alert, right?
Starting point is 00:29:40 There needs to be some kind of verification mechanism where you either have a positive test or you have a clinician or a physician who issues you a code that you can put into your app. You can then release your keys. And then there's the actual key server component, which I kind of already described. So it's a pretty complex system. It actually is entirely serverless. So the whole thing, including all the background job processing, was designed to be serverless from the beginning.
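The verification step described above can be sketched as a one-time code that gates key uploads. This is a hypothetical simplification for illustration only: the `VerificationService` class, its method names, and the 8-digit code format are assumptions made for the example, not the design of the real verification server.

```python
import secrets


class VerificationService:
    """Hypothetical one-time-code gate in front of key uploads."""

    def __init__(self) -> None:
        self._unused_codes: set[str] = set()
        self.uploaded_keys: list[bytes] = []

    def issue_code(self) -> str:
        """Called by a clinician or lab after a confirmed positive test."""
        code = f"{secrets.randbelow(10**8):08d}"
        self._unused_codes.add(code)
        return code

    def upload_keys(self, code: str, keys: list[bytes]) -> bool:
        """Accept keys only when the code is valid, and burn the code after use."""
        if code not in self._unused_codes:
            return False
        self._unused_codes.remove(code)
        self.uploaded_keys.extend(keys)
        return True


service = VerificationService()
code = service.issue_code()
keys = [secrets.token_bytes(16)]

first = service.upload_keys(code, keys)   # valid code: keys are accepted
replay = service.upload_keys(code, keys)  # same code again: rejected
print(first, replay)
```

Burning the code on first use is what keeps someone from self-declaring a positive result, or replaying a friend's code, to trigger alerts.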
Starting point is 00:30:04 Total greenfield project, right? Like nothing like this exists. So we're really fortunate there. We made some fun and interesting design decisions to keep costs down while, you know, abusing slash using some of the features of serverless like auto-scaling and, you know, being able to fan out across multiple regions.
Starting point is 00:30:19 And using DNS as a database, my personal favorite approach to things. We don't use DNS as a database. We do use Postgres. Lost opportunity. A real database. But we do use DNS, just not for storing information. So one question I have for you is that you've been at Google for a while.
Starting point is 00:30:38 And you've done an awful lot of things there. But previously, you've also done things that don't really directly align to any of this stuff going on there. You were at HashiCorp and you were at Chef, neither of whom, to my understanding, are technologies that Google makes extensive use of internally for their own stuff. It seems like, and even when you were at Google, you have been continually reinventing what it is that you do. I find that admirable because very often when you see people at a company for a protracted period of time, they sort of get more or less pigeonholed into a role that looks fairly similar from year to year.
Starting point is 00:31:16 You've been incredibly dynamic. Was it intentional, and how'd you do it? So I have a diagnosed medical condition called career ADHD. I'm just kidding. But I do get bored, and it's actually something that I'm really forward with my managers about. I've always been very straight with my managers and the people I work with that, like, eight to twelve months from now, I will be doing something different. I wish I'd figured that out earlier on. In my case, the way that I wound up solving for that is, yeah, I've got to come in. I'm going to solve an interesting problem. When I'm done with that, the consulting engagement is over and then I'm going to go away, and everyone knows the score going in. It works out way better than, and then I'm going to go cause problems on purpose in other
Starting point is 00:31:57 people's parts of the org because I see problems there. And that was where I always went off the rails. Yeah. I mean, I don't take a similar, like I don't take a dissimilar approach. You know, I try to find high priority strategic things that also align with my interests. And it's important to me that there's things that I can provide and things that I can learn. I never liked to be the smartest person in the room because you just shouldn't be in that room anymore. There's no one for you to learn from. And it's great to share knowledge, but it- I'm not convinced I'm the smartest person in the room right now, despite the fact that right now, I'm the only person in the room that I'm sitting in. I mean, that Minecraft sword is pretty intelligent.
Starting point is 00:32:37 I saw a chihuahua wandering around here too a minute ago. So there is that. But, you know, I think from like a career advice standpoint, I tell everyone you should interview somewhere else at least once a year. And you never know what's out there. And worst case scenario, you kept your interview skills up to date. Keeping those skills in tune is so critically important just because it's a unique skill set that for many folks does not have a whole lot of applicability in their day-to-day job. So if you suddenly have to find a new job, great, you're rusty at this. It's been years.
Starting point is 00:33:11 And you're trying to remember, like, okay, when someone asks you what you're looking for in your next job, they're not trying to pick a fight. Don't respond as if they were. Like the basic stuff. It's a skill like anything else. Yeah. And the common questions like, what do you want to do with your life? Or what accomplishment are you most proud of? Having those, not prepared, but knowing in general what you want to say from those is very important when you're thinking about interviewing for other jobs.
Starting point is 00:33:39 But even in a big company, the transfer process is pretty similar to applying externally to other roles. Sometimes there's interviews. Do they make you code on whiteboards to solve algorithm problems? Not me, but in general. Google has evolved its interview process since the last time I went through that particular brand of corporate hazing. Good, good, good. Yeah, the interview process has definitely been refactored a lot, especially with COVID and remote, but also just trying to be accessible to folks. I know one of the big changes Google has made is we no longer require like eight contiguous hours of your time.
Starting point is 00:34:16 You can split interviews out over multiple days, which has been really accommodating for folks that have, you know, already have a full time job or have family obligations at home that don't let them just like take eight hours away and devote 100% of their time to interviews. So I think that is, you know, not a whole lot of positive things have come out of COVID, but the flexibility with like interviewing has enabled more people to participate in the interview process that otherwise would not have been able to do so. And there's something to be said for making this more accessible to folks who come from backgrounds that don't all look identical. It's incredibly important. One thing that I definitely want to make sure
Starting point is 00:34:52 we get to before the end of this is something you've been talking about that's a bit orthogonal, but maybe not entirely so, which is software supply chain security. That has been a common thread of discussion in some circles for a while. What is it for those who are unfamiliar, like me sometimes, and what does it imply?
Starting point is 00:35:14 Yeah, so I mean, in the past year, but if you look back, you'll find more cases of it. We live in a world where no company, Google, Amazon, the US government, writes every line of code that they run. And even if you do, right, even if you could find a company that doesn't rely on any external dependencies, what language are they using? Do they write that language? Okay, let's say hypothetically, you write every single line of code and you wrote your own language and only your employees contribute to that language. What operating system are you running on? Because I guarantee you Linus probably contributed to it or Gates contributed to it and they don't work for you.
Starting point is 00:35:55 But let's say you wrote your own operating system, right? So we're getting into like crazy Google things now, right? Like only Google would write their own programming language and their own operating system, right? Who manufactured your CPU, right? Like, did you actually... It's always dependencies all the way down. We see this sometimes with companies talking about, oh yeah, we're going to go to multiple clouds or a different cloud so that we don't get impacted when there's another AWS outage in US East 1. Cool, great, power to you. But are you sure your payment provider's not going to go down? Are they taking a dependency on US East 1? Great. Let's say that they're not. Are you sure that their
Starting point is 00:36:29 vendors who are in the critical path are also not taking critical and core dependencies on that? And are you sure that they're aware of who all of those critical dependencies and those vendors are and so on and so forth? It is a vast interconnected web. This is a problem. Dependency sprawl is real. And I don't think that there's a good way to get to the bottom of it, particularly across company boundaries like that. Yeah. And this is where, if you look at the not software supply chain, like if you look at
Starting point is 00:36:56 construction, right? If you're working with a reputable construction agency, they're actually able to tell you, given a granite countertop or a quartz countertop, from what beach and what lot on what date the grains of sand in that countertop came from. That is a reality of that industry that is natural. You think about automotive, the vehicle identification numbers. They tell you exactly what manufacturer, and then there's records that show you exactly what human being on the line put that particular part in that machine.
Starting point is 00:37:30 And we don't have that in software today. Like we have some, you know, bastardized versions of like software bills of material or SBOM. But the simple fact of the matter is like, because software has grown so organically and because this wasn't ingrained in software from the beginning, like it was from, you know, traditional manufacturing, you're going to have an insecure software
Starting point is 00:37:49 supply chain for most of my life. Now, what does that actually mean? Insecure has this negative connotation. It means that you need to make sure that you're aware of everything that you're depending on, which is kind of what you were saying is like both the technical dependencies and the process or the people dependencies. And you need to have a rigorous process for how you're going to respond to these incidents. And I think Log4J was a really good eye-opening moment for folks when they realized that they didn't have a way to make
Starting point is 00:38:17 a large-scale dependency update across their entire fleet of applications. Because who has to do that on a consistent basis? It happens rarely when it happens. It's super important. But I do think that more and more, we're going to see it happen more and more frequently. And ideally, my opinion is that we're going to get to a point where this is inescapable. But ideally, we get to the point where it's like, oh, okay, this dependency is vulnerable. I have a playbook. I follow the playbook. Everything is patched in 30 minutes or less, and I can move on with my life. And it's not a six-week fire drill with people working late and going super crazy trying to mitigate these issues. There's a lot of work happening in this space. We have SLSA (pronounced "salsa"), which
Starting point is 00:38:58 is an open standard for how you declare your software bill of materials and things like binary authorization and attestations. There's Sigstore, there's Chainguard. There's some companies evolving in this space. Every time I talk to GitHub, I tell them, I'm like, hey, if this VP and that VP talked together and worked on something, you could do something amazing in this space. But I think it's going to be quite a while until we get
Starting point is 00:39:26 to a point where we can say the software supply chain is secure. Because like I was saying at the beginning, like until you manufacture your own CPU, like you're dependent on Intel and AMD. And until you write your own programming language, you're dependent on Ruby, Python, Go, whatever it might be. And until you take no dependencies on some external system, which by the way, might be a bad business decision. Like if someone did the work for you already in an open source ecosystem, it's probably a better business decision to evaluate and use that than to build it yourself. Until we have the analysis on that supply chain, and we can in a dashboard, the click of a button or the run of a command, very easily see the security status of our software supply chain and determine if a particular vulnerability is or is not relevant,
Starting point is 00:40:11 I think we're still going to be in this firefighting mode for at least another couple of years. I want to say you're wrong, but I know you're not. And that's what, I guess, keeps a lot of us awake at night for unfortunate reasons. Seth, I really want to thank you for taking the time to speak with me. If people want to learn more, where's the best place to find you? I'm on Twitter. You can find me. I'm sorry to hear that. So am I. It's the experience. Yeah, you can find me at @sethvargo. If you say mean and hateful things to me, I actually exercise this finger and you can click the block button real fast.
Starting point is 00:40:44 But yeah, I mean, my DMs are open. If you have any questions, comments, complaints, concerns, you can throw the complaints away and come to me for everything else. Thank you so much for being so generous with your time. I really appreciate it. Yeah. Thanks for having me. It's always a pleasure. Seth Vargo, engineer at Google. I'm cloud economist, Corey Quinn, and this is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice. Whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice, along with an angry comment asking how dare I malign the good name of the other cloud provider that isn't Google,
Starting point is 00:41:19 that also just so coincidentally happens to employ you. If your AWS bill keeps rising and your blood pressure is doing the same, then you need the Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started. This has been a HumblePod production.
Starting point is 00:42:05 Stay humble.
