Screaming in the Cloud - From A to Z in Alphabet’s Soup with Seth Vargo
Episode Date: March 10, 2022
About Seth
Seth Vargo is an engineer at Google. Previously he worked at HashiCorp, Chef Software, CustomInk, and some Pittsburgh-based startups. He is the author of Learning Chef and is passionate about reducing inequality in technology. When he is not writing, working on open source, teaching, or speaking at conferences, Seth advises non-profits.
Links:
Twitter: https://twitter.com/sethvargo
Transcript
Hello, and welcome to Screaming in the Cloud, with your host, Chief Cloud Economist at the
Duckbill Group, Corey Quinn.
This weekly show features conversations with people doing interesting work in the world
of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles
for which Corey refuses to apologize.
This is Screaming in the Cloud.
The company 0x4447 builds products to increase standardization and security in AWS organizations.
And they do this with automated pipelines that use well-structured projects to create secure, easy-to-maintain, and fail-tolerant solutions.
One of those is their VPN product, built on top of the popular OpenVPN project,
which has no license restrictions. You're only limited by the network card in the instance.
To learn more, visit snark.cloud slash deploy and go. That's snark.cloud slash deploy and go.
All one word.
Couchbase Capella.
Database as a service is flexible, full-featured, and fully managed,
with built-in access via key value, SQL, and full-text search.
Flexible JSON documents align to your applications and workloads.
Build faster with blazing fast in-memory performance and automated replication and scaling
while reducing cost.
Capella has the best price performance
of any fully managed document database.
Visit couchbase.com slash screaming in the cloud
to try Capella today for free
and be up and running in three minutes
with no credit card required.
Couchbase Capella, make your data sing.
Welcome to Screaming in the Cloud.
I'm Corey Quinn.
I have a return guest today, though it barely feels like it qualifies
because Seth Vargo was guest number three on this podcast.
I've had a couple of folks on since then,
and for better or worse,
I'm no longer quite as scared of the microphone
as I was back in those early days.
Seth, thank you for joining me.
Yeah, thank you so much for having me back, Corey.
Really excited to figure out
whatever we're talking about today.
Well, let's start there,
because the last time we spoke,
you were, if memory serves,
a developer advocate at Google
Cloud. Correct. And you've changed jobs, but not companies, but kind of companies because welcome
to large environments. But over the past few years, you've remained at Google. You are no longer at
Google Cloud and you're no longer a developer advocate. In fact, your title is simply engineer at Google. And what you've been focusing on, to my understanding, is helping Alphabet companies,
namely, you know, the Alphabet, always in parentheses in journalistic style,
"Google's parent company," because no one thinks of it in terms of Alphabet.
You're effectively helping companies within the conglomerate umbrella securely and privately consume public cloud.
Yes, that is correct.
So I used to work in what we call the cloud PA.
PA stands for product area.
Other product areas are like Chrome and Android. Now I work in the core PA, where I'm helping lead and run an initiative that, like you said, is to help
Alphabet companies to securely and privately use public cloud services.
So I am going to go out on a limb because my position on multi-cloud has always been
pick a cloud.
I don't particularly care which one, but pick one and focus on that.
I'm going to go out on a limb and presume that given that you are not at Google Cloud anymore, but you are at Google,
you probably have a slight preference as far as which public cloud these various companies within
the umbrella should be consuming. Yeah. I mean, obviously I think most viewers will think the
answer is GCP. And if you said GCP, you would be like 95% correct.
Well, you'd also be slightly less than that correct, because they're doing a whole rebrand
thing calling it Google Cloud in public as opposed to GCP.
You really don't work for the same org anymore.
You're not up to date on the very latest messaging talking points.
I missed all, there's so many TLAs that you lose all your TLAs over time.
Oh, yes.
So Google Cloud would be like 95% correct.
But what you have to really understand is Google has its own, you know, cloud.
We didn't call it a cloud at the time.
You might call it on-prem or legacy infrastructure, if you will.
Primarily built on a scheduling system called Borg, which is like Kubernetes version zero.
And a lot of the Alphabet companies have workloads that run on Borg.
So we're actually talking about hybrid cloud here, which, you know, you may not think of
Google as like a hybrid cloud customer, but a workload that runs on our production infrastructure
called Borg that needs to interact with a workload that runs on Google Cloud,
that is hybrid cloud.
It's no different than a customer
who has their own data center
that needs peering to a public cloud provider,
whether that's Google Cloud or AWS or Azure.
I think the other thing is,
if you look at the regulatory space,
particularly a lot of the Alphabet companies
operate in say like healthcare or finance or fintech,
where certain countries and certain jurisdictions have regulations around like, you must be multi-cloud.
You know, that some people might say that means you have to run, you know,
the same instance of the same app across clouds,
or some people say your data can be here, but your workloads can be over there.
That's to be interpreted. But, you know, I would say 95% is GCP,
or sorry, 95% is Google Cloud, but there is a small percentage that is definitely going to be other cloud providers and hybrid cloud as well.
My position on multi-cloud is often people like to throw it in my face of, see, you gave this general guidance, and therefore, whenever you say something that goes against it, you're a giant phony.
And it's, yeah, Twitter doesn't do so well with the nuance.
My position of pick a provider and go all in is intended as general guidance for the
common case.
There are exceptions to this, and any individual company or customer is going to have more
context than that general guidance will.
So if you say you need to be in multiple clouds for certain reasons, you're probably correct. If you say you need to be in multiple clouds because your regulator
demands it, you are certainly correct. I am not arguing against that in any way.
I do want to disclaim one of my biases here as well. And that is specifically that if I were
building a startup today and I were not me, by which I mean having spent 10 years in the AWS ecosystem learning not just how it works, but how it breaks, because that's important in production.
And, you know, also having a bunch of service owners at AWS on speed dial.
And I were approaching this from the naive, I need to pick a cloud, which one would I go with?
My bias is for Google Cloud. And the reason behind that
is the developer experience is spectacular as the primary, but not only perspective on that.
So I'm curious to know that as you're helping what are effectively internal customers move to
Google Cloud, is their interaction with Google Cloud as a platform the same as it
would be if I, as a random outside customer, were using Google Cloud?
Is there a bunch of internal back channels?
Oh, you get the good kind of internal Google Cloud that most of us don't get access to
or something else?
Yeah, that's a great question.
So first, thank you for the kind words on the developer experience.
They were honest words, to be clear.
Let me be very direct with you. If I thought your developer experience was trash, I might not say it outright in the effort not to be actively antagonistic to
someone I'm having on the show right now, but I would not say it if I didn't believe it.
Yeah. And I totally, I know you, I've known you for many years. I totally believe you.
But I do thank you for saying that because that was the team that I was on before this
was largely responsible for that across the platform.
But back to your original question around,
like, what does the support experience look like?
So it's a little bit of both.
So Alphabet companies,
they get a technical account manager
very similar to how, you know,
a reasonable size spend customer
would get a technical account manager.
That account manager has access
to the cloud support channels.
So all of that looks the same.
I think where things look a little bit different
is because myself and some of our other leads
came from cloud.
I generally don't like this phrase,
but we know people.
So we tend not to go directly to cloud when we can.
We want Alphabet companies to really behave and act as if they were an external entity.
But we're able to help the technical account manager navigate the support process a little bit better by saying, like, you need to ask for this person, right?
You need to say these words to get in front of the right person to get this ticket assigned to the right person.
So the process is still the same, but we're able to leverage our pre-existing knowledge with cloud, the same way that a Xoogler, an ex-Googler, who worked for your company
would be able to kind of help move that support process along a little bit faster.
I am quite sincere when I say that this is a problem that goes far beyond simply Google.
A disturbing portion of my job as a cloud economist helping my clients consists of nothing other than introducing Amazonians to one another.
And these are hard problems at scale.
I work at a company with a dozen people in it.
It turns out that, yeah, it's pretty easy to navigate who's responsible for what.
When you have a hyperscale size company in the trillion dollar range, a lot of that breaks down super quickly.
Yeah. And there's just a lot of churn at all levels of the organization. And, you know,
we talked about this when we first joined the show, like I switched roles, right? I used to
be in cloud and now I'm in what we call core. I still get people who are reaching out to me
at Google and externally who are saying, oh, can you answer this question?
Hey, how do I do this?
And I've gradually over the past couple of months convinced people that I don't work
on that anymore.
And I try to be helpful where I can.
You're even using the old name and everything.
They're eventually going to learn, right?
I know.
They'll learn.
They'll be like, what do you call this?
GCP?
Okay, great.
We don't need you anymore.
But it's true, right?
People leave the organization.
People join the organization.
There's reorgs, there's strategic changes,
people switch roles within the org.
And all of that leads to complexity with navigating
what is the size of a small nation in some cases.
Your biography says
that you enable Alphabet companies to securely
and privately consume public cloud.
Now, that would make perfect sense. And I would really have no further questions based on what we've already said,
except for the words securely and privately.
And I want to dive into that first.
Let's work backwards with the second one first.
What does privately mean in this context?
So privately means like privacy preserving for both the Alphabet company and the users or customers that they have.
So when we look at that from the perspective of the Alphabet company, that means protecting their data from the eyes of the cloud provider.
So that's things like customer-managed encryption keys, and making sure that if at any point the cloud provider is accessing your data, even for a legitimate purpose, like
diagnosing a support ticket, you have visibility into that.
Then the privacy preserving side on the Alphabet company's customers is about providing that same
level of visibility to their customers, as well as making sure that any data that they're storing
is private, it's
not accessible to certain parties, and it's following, whether it's actual legislation around how
long data can be persisted, things like GDPR, or if it's just a general data retention,
insider risk management, all of that comes into this idea of building a private system
or privacy-preserving system.
Let's be very clear that my position on it is
that Google's relationship with privacy has been somewhat challenged, due in no small part to
the sheer scale of how large Google has grown. And let's be clear, I believe firmly that at
certain points of scale, yeah, you deserve elevated levels of scrutiny. That is how we
want society to function by and large.
And there are times where it feels a little odd on the cloud side.
For example, as of the time of this recording, somewhat recently, there was a bug in some of the copyright detection stuff where Google Drive would start flagging files as having copyright challenges if they contained just the character one in them,
which, okay, clearly a bug,
but it was a bit of a reminder for some folks,
wait, that's right, Google does tend to scan these things.
Well, when you have a bunch of end user customers
in the ways that Google does,
that stuff is baked in
and it shapes how you wind up seeing things.
From Amazon's perspective, historically,
they basically sold books and then later underpants. And doing e-commerce transactions was basically the extent
of their data work with customers. They weren't really running large-scale file sharing systems
and abilities in collaboration suites, at least not that really had any of those pesky things
called customers. So that is not built into their approach and their needs in the same way.
To be clear, I am sympathetic to the problems, but it's also a challenging problem,
especially as you continue to evolve and move things into cloud. You absolutely must be able
to trust your cloud provider, or you should not be working on that cloud provider. That has been my approach.
Yeah, I mean, there's certainly things that you can do to mitigate.
But in general, there is some level of trust.
Forget the data on the availability side, right?
Like when the cloud provider says, this is our SLA, and you agree to that SLA,
like, yeah, you get money back if they mess it up.
But ultimately, you're trusting them to adhere to that SLA, right? And you get recompense if they fail to do so, but that's still like trust.
Trust is far more than just on the privacy side, such as the continued investment that Alphabet slash Google is making in Google Cloud.
It's easy to take the approach of, well, you've turned off a bunch of consumer services, so therefore you're going to turn off the cloud at some point too.
No, let me be very clear for the record.
I do not believe that you are going to one day flip a switch and turn off Google Cloud,
and neither do your customers.
Instead, the way that enterprises express this, it's not about you
flipping the switch and turning it off. That's what contracts are for. Their question, and they
enshrine this in contracts in some cases, is what happens in the event, not that you turn it off, but that you
fail to appropriately continue to invest in the platform. Because at
enterprise scale, this is how things tend to die. It is not through flipping a switch in most cases,
it's through, we're just going to basically mothball it, keep it more or less exactly as it
is until it slowly fades into irrelevance for a long period of time. And when you're providing
the infrastructure to run things for serious institutions, that part isn't okay. And credit where due, I have seen every indication
that Google means it when they say this is an area of strategic and continued ongoing focus
for us as a company.
Yeah, I mean, Google is heavily investing in cloud. I mean, this is a
brand new group that I'm working in, and we're trying to get Alphabet companies onto cloud.
So obviously there's some
very high level top down
executive support for this.
I will say that I
100% agree with everything you're saying.
The traditional enterprise approach
of build this Java app,
because let's be honest,
it's always Java.
Build this Java app,
compile it into a jar
and run it forever
is becoming problematic. We saw this recently with like the Log4j. Yeah, it should be in a container. What
the hell? I'm kidding. I'm kidding. Please don't send me email, whatever you do. What's a container?
I'm just kidding. The idea of like software rotting is very real. And it's becoming more
and more of a risk to security, to privacy, to public cloud providers,
to enterprises, where when you see something like Log4j happen, and you can't answer the question,
like, do we have any code that uses that? If getting the answer to that question takes you
six weeks, boy, a lot of stuff can happen in six weeks while that particular thing is exploited.
And, you know, it kind of gets into software supply chain a little bit,
but I do agree that like secure, private, and stable APIs are super important. And it's an area where Google's investing.
At the same time, I think the enterprise industry is moving away a little bit
from set it and forget it as a strategy.
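The "do we have any code that uses that?" question can be made concrete with a small sketch. This is only an illustration, not how Google or any vendor tool does it: the `inventory` mapping, the service names, and the `fixed_in` threshold are all hypothetical, and the version parser assumes purely numeric dotted versions.

```python
from __future__ import annotations


def parse_version(version: str) -> tuple[int, ...]:
    """Turn '2.14.1' into (2, 14, 1). Assumes numeric dotted versions only."""
    return tuple(int(part) for part in version.split("."))


def affected_services(
    inventory: dict[str, list[tuple[str, str]]],
    package: str,
    fixed_in: str,
) -> dict[str, str]:
    """Return {service: version} for services pinning `package` below `fixed_in`.

    `inventory` is a hypothetical map of service name to (dependency, version)
    pairs, the kind of data a dependency manifest or SBOM export could provide.
    """
    threshold = parse_version(fixed_in)
    hits: dict[str, str] = {}
    for service, dependencies in inventory.items():
        for name, version in dependencies:
            if name == package and parse_version(version) < threshold:
                hits[service] = version
    return hits


if __name__ == "__main__":
    # Entirely made-up inventory for illustration.
    inventory = {
        "billing": [("log4j-core", "2.14.1"), ("guava", "31.0.1")],
        "search": [("log4j-core", "2.17.1")],
        "blog": [("jinja2", "3.0.0")],
    }
    print(affected_services(inventory, "log4j-core", "2.17.1"))
```

The point of the sketch is the shape of the problem: if the inventory already exists, the answer is a lookup that takes seconds, and the six weeks go into building the inventory.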
I want to talk about the security portion as well, as far as securely consuming public cloud
goes. And let me start off with a disclaimer here, because I don't want people to misconstrue
what I'm about to say. If you are migrating to one of the big three cloud providers,
their security will be better than anything you will be able to achieve as a company yourself.
Not you personally, because Google is a bit of an asterisk to that statement, given what you have been doing and have been doing since the 90s in your on-prem world with Borg and the rest.
But my philosophy on the relative positioning of the security of
cloud providers relative to one another has changed. I spent four months beating the crap
out of Azure for having an issue where there was control plane access and then really saying
nothing about it. And the day after I put out a blog post on that topic, because I was tired of
the lack of response, it came out that right at the same time, AWS had a very similar problem and had not said anything themselves. And that went back and forth, apparently waiting to do a release until this happened, and Orca Security wound up putting one out there. And it was frustrating on a couple of levels. First, the people at both of these companies who work in security are stars. There
is no argument, no bones about that. Problems are going to happen. Things are going to occur as a
result. And the only saving grace then is the transparency and communication around it. And
there was none of it from them. I'm also more than a little bit irked that my friends at AWS
were aware of this, basically watched me drag Azure for four months knowing that they'd done the same thing
and never bothered to say a word,
but okay, that's a choice.
I've been saying for a while that of the big three,
Google's security posture is the most impressive.
And it used to be a slight difference.
Like, you nosed ahead of AWS in that respect,
not by a huge margin, but by a bit.
I don't think it's nearly as close these days in my mind. And talking to other large companies
about these things and people who are paid to worry about these things all day long,
I am very far from alone in that perspective. So I guess my question for you is, as you look at moving the workloads securely
to Google Cloud,
it feels like security is baked into everything
that all aspects of your company have done.
Why is that a specific area of focus?
Or is that how it gets baked into everything you folks do?
So you kind of like set up the answer for this perfectly.
I swear we didn't talk about this extensively.
You didn't know any of that was coming, by the way, just to be very clear here.
I don't sit here and say, all right, I'm going to say this and here's the right response.
No, this is an impromptu, more or less ad hoc show every time I do it.
Yeah.
And I'm going to preface this by saying, like, I don't want this to sound like egotistical,
but I have never found a company that has as rigorous security and privacy
policies, reviews, and procedures as Google. I thought I had, and I was wrong.
Yeah. And I have a lot of apologizing to people to do as a result of that.
And honestly, every time I interact with our internal security engineering teams or our
IP protection teams, I'm that Nathan Fillion meme where he's like,
you know, like,
okay, I get it.
I get it, right?
And then face vomit.
I should say so that I can't.
Yeah.
Oh, yeah.
The reason that it's hard
for Alphabet companies
to securely and privately
move to cloud,
specifically for security,
is because Alphabet's stance
is so much more
rigorous than anyone else in the industry, to the point where in some cases, even our own
cloud provider doesn't meet the bar for what we require for an internal workload.
And that's really what it comes down to is like, the reason that Google is the most secure cloud
is because our bar is so
high that sometimes we can't even meet it. I have to assume that the correct answer on this
is that you then wind up talking to those product teams and figure out how to get them to a point
where they can support that bar. Because the alternative is effectively, it's like, oh yeah,
this is Google Cloud. And it's absolutely right for multinational banks to use. But not Google workloads, that stuff's important.
And I don't think that that is necessarily how you folks tend to view these things.
So it's a bi-directional street, right? So a lot of it is working with a product management team
to figure out where we can add these additional security properties into the system.
I should say tri-directional.
The second area is where the policy is so specific to Google that Google should actually build its own layer on top of it that adds the security
because it's not generally applicable to even big, huge cloud customers.
And then the third area is,
Google's a very big company.
Sometimes we didn't write stuff down.
And sometimes we have policies
where no one can really articulate
where that policy came from.
And something that's new with this approach
that we're taking now is like,
we're actually trying to figure out
where that policy came from
and get the impetus of what it was trying to protect against
and make sure that it's still applicable.
And I don't know if you've ever worked with governments
or large companies, right?
They have this spreadsheet of hundreds of thousands of words.
You are basically describing my client list.
Please continue.
I mean, sometimes they have to use an access database
because they exhaust the number of rows in an Excel spreadsheet.
And it's just checklist upon checklist upon checklist.
And that's not how Google does security. Security is a very all-encompassing kind of 360 type of thing.
But we do have policies that are difficult to articulate what they're actually protecting
against. And we are constantly reevaluating those and seeing like, this made sense on Borg.
Does it actually make sense on cloud? And in some cases, it may not.
We get the same protections using, say, a GCP native service, and we can omit that requirement
for this particular workload.
This episode is sponsored by our friends at Oracle Cloud.
Counting the pennies, but still dreaming of deploying apps instead of "Hello, World" demos?
Allow me to introduce you to Oracle's Always Free tier.
It provides over 20 free services in infrastructure, networking, databases, observability,
management, and security. And let me be clear here, it's actually free. There's no surprise
billing until you intentionally and proactively upgrade your account. This means you can provision
a virtual machine instance or
spin up an autonomous database that manages itself, all while gaining the networking,
load balancing, and storage resources that somehow never quite make it into most free tiers
needed to support the application that you want to build. With Always Free, you can do things like
run small-scale applications or do proof-of-concept testing without spending a dime. You know that I always like to put asterisks next to the word free.
This is actually free. No asterisk. Start now. Visit snark.cloud slash oci-free. That's
snark.cloud slash oci-free.
I think that when it comes to things like policies that are intelligently crafted
around security, you folks, and to be fair, the AWS security engineers as well, have been doing
it right in that, okay, we're going to build a security control to make sure that a thing can't
happen. That's not enough. Then there's the defense in depth. Okay, let's say that control
fails through some variety of ways. Here are the other things we're going to do to prevent cross-account access, for example.
And that in turn winds up continuing to feed on itself
and build into a culture of assuming that
you can always continue to invest in security.
How far is enough?
Well, for most folks, they haven't gone far enough yet.
Another way to put this is like,
how well do you want to sleep at night? You know, there's
folks on the Google security engineering team who are so smart, and they work on like our
offensive security team, so their full-time job is to try to hack Google and then figure out how
to prevent that. And, you know, I've read some of the reports and some of the ways they think, and I'm like, how do you pick up a mobile phone and go to like any website confidently, knowing what
you know? Right?
Who said anything about confidently?
Yeah. How do you use
self-checkout at a supermarket and, like, not just wear your entire full-body tinfoil
hat suit? But, you know, I think the bigger risk
is not knowing what the risks are.
And this is a lot what we're seeing
in software supply chain too,
is a lot of security is around threat modeling,
not checklists.
But we tend to like gravitate toward checklists
because they're concrete.
But you really have to ask yourself,
like, do I need the same security properties
on my static blog website
that is stored on an S3 bucket or a GCS bucket that's public to the internet that I do on my credit card processing service?
And a lot of times we don't treat those differently.
We don't apply a different threat model to them.
And then everything has to have the same level of security.
And then everything is in scope for whatever it is you're trying to defend against.
And that is a short path to madness.
Yes.
Yes. Yes.
Your static HTML files and your GCS bucket are in scope for SOC 1 and 2 because you didn't
have a way to say they weren't.
Yeah.
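The contrast between one checklist for everything and a per-workload threat model can be sketched in a few lines. This is a toy illustration under stated assumptions: the `Workload` fields, the control names, and the scoping rule are all hypothetical and are not drawn from any Google policy or compliance framework.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical control names, chosen only for illustration.
BASELINE = frozenset({"tls", "access_logging"})
SENSITIVE_EXTRAS = frozenset({"encryption_at_rest", "key_rotation", "audit_review"})


@dataclass(frozen=True)
class Workload:
    name: str
    handles_sensitive_data: bool
    publicly_readable: bool


def required_controls(workload: Workload) -> frozenset[str]:
    """Pick controls from what the workload actually puts at risk."""
    if not workload.handles_sensitive_data:
        return BASELINE
    if workload.publicly_readable:
        # Sensitive data in a world-readable location is a design error,
        # not something an extra control can paper over.
        raise ValueError(f"{workload.name}: sensitive data must not be public")
    return BASELINE | SENSITIVE_EXTRAS


def in_audit_scope(workload: Workload) -> bool:
    """A way to say 'this one is not in scope' instead of scoping everything."""
    return workload.handles_sensitive_data


if __name__ == "__main__":
    blog = Workload("static-blog", handles_sensitive_data=False, publicly_readable=True)
    payments = Workload("card-processing", handles_sensitive_data=True, publicly_readable=False)
    for w in (blog, payments):
        print(w.name, sorted(required_controls(w)), in_audit_scope(w))
```

A real threat model asks who the attacker is and what they gain, which a boolean cannot capture. The sketch only shows that treating the public blog and the card processor differently has to be expressible somewhere.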
You've also done some, again, the nice thing about being at a company for a while, from
what I can tell, given that I've never done it until I started this place, is you move
around and work on different projects.
You were involved as well, personally, in the Exposure Notifications project, the joint collaboration thing between a number of
companies in the somewhat early days of the pandemic that all of our phones talk to one
another and anonymously and in a privacy preserving way, let us know that, hey, by the way, someone
you were in close contact with has tested positive for COVID-19 in the previous fixed period of time.
What'd you do over there?
Yeah, so the Exposure Notifications project was a joint effort primarily between Apple and Google to use Android and iOS devices to help stop the spread of COVID or reduce the spread of COVID as much as possible.
The idea being, because the incubation period is roughly 14 days, at least
pre-Omicron, if we could tell you, hey, you might have been exposed and get you to stay at home for
three or four days, self-isolate, we could dramatically reduce the spread of COVID.
And we know from some of the studies that have come out of the UK and European region,
the technology actually reduced the spread of cases by like 1400% in some cases.
I was one of the tech leads for the server side.
So the way the system works is it uses low energy Bluetooth on iOS and Android devices to basically broadcast random IDs.
So I know this is Screaming in the Cloud, but if we can just quickly rebrand as Screaming into
the Void, that's basically what's happening.
You're generating these random identifiers and just like yelling them.
And there's other phones out there who are listening.
And they collect these, what we call RPIs,
or rolling proximity identifiers.
They have no data in them.
They're like literally like a UUID
or 32 bytes of random data.
They aren't at all like associated with your device
or your person. So then what happens is like, let's say you're in a supermarket, you're near
someone for, you know, every so often your phones exchange these IDs. If you then test positive,
those IDs go up to a centralized server. The server, again, also has no idea who you are.
So the whole thing is privacy preserving end to end. Then the server basically bundles
all of what we call the TEKs
or the temporary exposure keys
into a tarball that go up onto a CDN.
And then every night,
all of the devices that are participating in EN
download this and do a local key match.
So at no point does the server ever know
that you were in a supermarket with someone else.
Only your phone knows that you came in contact
with this TEK in the past 14 days
or 21 days in some jurisdictions.
And it'll generate an exposure notification
or an exposure alert, which says like,
hey, in the past 14 days,
you've come in contact with someone
who has confirmed positive for COVID.
And then the guidance kind of varies by state
and by health jurisdiction of like self-isolate
or go get tested or whatever.
Or go to the bar in some places, apparently.
Yeah.
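The flow described above (broadcast random-looking IDs, publish keys only after a positive test, match locally on the phone) can be sketched as follows. This is a simplified model, not the real protocol: HMAC-SHA256 stands in for the actual key derivation, and `INTERVALS_PER_DAY`, the 16-byte sizes, and all function names are assumptions made for illustration.

```python
from __future__ import annotations

import hashlib
import hmac
import secrets

# Assumption for this sketch: one new identifier per 10-minute window.
INTERVALS_PER_DAY = 144


def new_daily_key() -> bytes:
    """Random per-day key (the 'TEK' role); says nothing about device or person."""
    return secrets.token_bytes(16)


def rolling_id(daily_key: bytes, interval: int) -> bytes:
    """Identifier broadcast during one interval (the 'RPI' role).

    HMAC-SHA256 is a stand-in here, not the derivation the real system uses.
    """
    message = interval.to_bytes(4, "little")
    return hmac.new(daily_key, message, hashlib.sha256).digest()[:16]


def ids_for_day(daily_key: bytes) -> set[bytes]:
    """Every identifier a device with this key would have broadcast that day."""
    return {rolling_id(daily_key, i) for i in range(INTERVALS_PER_DAY)}


def find_exposures(heard_ids: set[bytes], published_keys: list[bytes]) -> list[bytes]:
    """Runs on the phone: re-derive IDs from published keys and match locally.

    The server only ever sees the published keys, never who heard whom.
    """
    return [key for key in published_keys if heard_ids & ids_for_day(key)]


if __name__ == "__main__":
    alice_key = new_daily_key()
    # Bob's phone overheard one of Alice's broadcasts in the supermarket.
    bob_heard = {rolling_id(alice_key, 42)}
    # Alice later tests positive; her key lands in the nightly download.
    print(len(find_exposures(bob_heard, [alice_key, new_daily_key()])))
```

The design choice worth noticing is where the matching happens: the download is identical for every device, and the only place a contact is ever recognized is on the phone that overheard it.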
The server itself is actually,
there's a verification component
because ideally, like,
we don't want people to just be like,
oh, I'm COVID positive.
And then like all their friends get an alert, right?
There needs to be some kind of verification mechanism
where you either have a positive test
or you have a clinician or a physician who issues you a code that you can put into your app.
You can then release your keys.
And then there's the actual key server component, which I kind of already described.
So it's a pretty complex system.
It actually is entirely serverless.
So the whole thing, including all like background job processing, it was designed to be serverless from the beginning.
Total greenfield project, right?
Like nothing like this exists.
So we're really fortunate there.
We made some fun and interesting design decisions
to keep costs down while, you know,
abusing slash using some of the features of serverless
like auto-scaling and, you know,
being able to fan out across multiple regions.
And using DNS as a database,
my personal favorite approach to things.
We don't use DNS as a database.
We do use Postgres.
Lost opportunity.
A real database.
But we do use DNS, just not for storing information.
So one question I have for you is that you've been at Google for a while.
And you've done an awful lot of things there.
But previously, you've also done things that don't really directly
align to any of this stuff going on there. You were at HashiCorp and you were at Chef,
neither of whom, to my understanding, are technologies that Google makes extensive
use of internally for their own stuff. It seems like, and even when you were at Google,
you have been continually reinventing what it is that you do. I find that admirable because
very often when you see people at a company for a protracted period of time, they sort of get
more or less pigeonholed into a role that looks fairly similar from year to year.
You've been incredibly dynamic. Was it intentional and how'd you do it?
So I have a diagnosed medical condition called career ADHD. I'm just kidding. But I do, I get bored.
And it's actually something that I'm really forward with my managers about. I've always been
very straight with my managers and the people I work with that, like, eight to twelve months from
now, I will be doing something different.
I wish I'd figured that out earlier on. In my case, the
way that I wound up solving for that is, yeah, I've got to come in. I'm going to solve an interesting problem. When I'm done with that,
the consulting engagement is over and then I'm going to go away, and everyone knows the score
going in. It works out way better than, "and then I'm going to go cause problems on purpose in other
people's parts of the org because I see problems there." And that was where I always went off the
rails.
Yeah. I mean, I don't take a dissimilar approach. You know, I try to find high priority strategic things that also align with my
interests. And it's important to me that there's things that I can provide and things that I can
learn. I never liked to be the smartest person in the room because you just shouldn't be in that
room anymore. There's no one for you to learn from. And it's great to share knowledge, but it-
I'm not convinced I'm the smartest person in the room right now, despite the fact that right now,
I'm the only person in the room that I'm sitting in.
I mean, that Minecraft sword is pretty intelligent.
I saw a chihuahua wandering around here too a minute ago. So there is that.
But, you know, I think from like a career advice standpoint, I tell everyone you should interview somewhere else at least once a year.
And you never know what's out there.
And worst case scenario, you kept your interview skills up to date.
Keeping those skills in tune is so critically important just because it's a unique skill set that for many folks does not have a whole lot of applicability in their day-to-day job.
So if you suddenly have to find a new job, great.
You're rusty at this.
It's been years.
And you're trying to remember, like, okay, when someone asks you what you're looking for in your next job, they're not trying to pick a fight.
Don't respond as if they were.
Like the basic stuff.
It's a skill like anything else.
Yeah. And the common questions like, what do you want to do with your life?
Or what accomplishment are you most proud of?
Having those, not prepared, but knowing in general what you want to say from those is
very important when you're thinking about interviewing for other jobs.
But even in a big company, the transfer process is pretty similar to applying externally for other roles.
Sometimes there's interviews.
Do they make you code on whiteboards to solve algorithm problems?
Not me, but in general.
Google has evolved its interview process since the last time I went through that particular brand of corporate hazing.
Good, good, good.
Yeah, the interview process has definitely been refactored a lot, especially with COVID and remote, but also just trying to be accessible to folks.
I know one of the big changes Google has made is we no longer require like eight contiguous hours of your time.
You can split interviews out over multiple days, which has been really accommodating for folks that, you know, already have a full-time job or have family obligations at home that don't let them just, like, take eight hours away and devote 100% of their time to interviews.
So I think that is, you know, not a whole lot of positive things have come out of COVID, but
the flexibility with like interviewing has enabled more people to participate in the
interview process that otherwise would not have been able to do so.
And there's something to be said for making this more accessible to folks who come from backgrounds
that don't all look identical.
It's incredibly important.
One thing that I definitely want to make sure
we get to before the end of this
is something you've been talking about
that's a bit orthogonal,
but maybe not entirely so,
which is software supply chain security.
That has been a common thread of discussion
in some circles for a while.
What is it for those who are unfamiliar, like me sometimes, and what does it imply?
Yeah, so I mean, in the past year, but if you look back, you'll find more cases of it.
We live in a world where no company, Google, Amazon, the US government, writes every line of
code that they run. And even if you do, right, even if you could find a company that doesn't
rely on any external dependencies, what language are they using? Do they write that language?
Okay, let's say hypothetically, you write every single line of code and you wrote your own language and only your employees contribute to that language.
What operating system are you running on?
Because I guarantee you Linus probably contributed to it or Gates contributed to it and they
don't work for you.
But let's say you wrote your own operating system, right?
So we're getting into like crazy Google things now, right?
Like only Google would write their own programming language and their own operating system, right?
Who manufactured your CPU, right? Like, did you actually...
It's always dependencies all the way down.
We see this sometimes with companies talking about, oh yeah, we're going to go to multiple clouds
or a different cloud so that we don't get impacted if there's another AWS outage in US East 1. Cool,
great, power to you. But are you sure your payment provider's not going to go down? Are they
taking a dependency on US East 1? Great. Let's say that they're not. Are you sure that their
vendors who are in the critical path are also not taking critical and core dependencies on that?
And are you sure that they're aware of who all of those critical dependencies and those vendors
are and so on and so forth? It is a vast interconnected web. This is a problem.
Dependency sprawl is real.
And I don't think that there's a good way to get to the bottom of it, particularly across
company boundaries like that.
Yeah.
And this is where, if you look at the non-software supply chain, like if you look at
construction, right?
If you're working with a reputable construction agency, they're actually able to tell you, given a granite countertop or
a quartz countertop, from what beach and what lot on what date the grains of sand in that
countertop came from. That is a reality of that industry that is natural. You think about
automotive, the vehicle identification numbers. They tell you exactly what manufacturer,
and then there's records that show you
exactly what human being on the line
put that particular part in that machine.
And we don't have that in software today.
Like we have some, you know, bastardized versions
of like software bills of material or SBOM.
But the simple fact of the matter is like,
because software has grown so organically
and because this wasn't ingrained in software
from the beginning, like it was from,
you know, traditional manufacturing, you're going to have an insecure software
supply chain for most of my life.
Now, what does that actually mean?
Insecure has this negative connotation.
It means that you need to make sure that you're aware of everything that you're depending
on, which is kind of what you were saying is like both the technical dependencies and
the process or the people dependencies. And you need to have a
rigorous process for how you're going to respond to these incidents. And I think Log4J was a really
good eye-opening moment for folks when they realized that they didn't have a way to make
a large-scale dependency update across their entire fleet of applications.
Because who has to do that on a consistent basis? It happens rarely.
When it happens, it's super important. But I do think that, more and more, we're going to see it
happen more and more frequently. And ideally, my opinion is that we're going to get to a point
where this is inescapable. But ideally, we get to the point where it's like, oh, okay, this
dependency is vulnerable. I have a playbook. I follow the playbook. Everything is patched in 30 minutes or less, and I can move on with my life.
And it's not a six-week fire drill with people working late and going super crazy trying to
mitigate these issues. There's a lot of work happening in this space. We have "salsa," which
is an open standard, SLSA, for how you declare your software bill of materials and things like binary authorization and attestations.
There's Sigstore, there's Chainguard.
There's some companies evolving in this space.
Every time I talk to GitHub, I tell them,
I'm like, hey, if this VP and that VP talked together
and worked on something,
you could do something amazing in this space.
But I think it's going to be quite a while until we get
to a point where we can say the software supply chain is secure. Because like I was saying at the
beginning, like until you manufacture your own CPU, like you're dependent on Intel and AMD. And
until you write your own programming language, you're dependent on Ruby, Python, Go, whatever
it might be. And until you take no dependencies on some external system, which by
the way, might be a bad business decision. Like if someone did the work for you already in an
open source ecosystem, it's probably a better business decision to evaluate and use that than
to build it yourself. Until we have the analysis on that supply chain, and we can, in a dashboard,
at the click of a button or the run of a command, very easily see the security status of our software supply chain and determine if a particular vulnerability is or is not relevant,
I think we're still going to be in this firefighting mode for at least another couple of years.
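The "run of a command" check described here could, in its simplest form, look something like the sketch below. Everything in it is an assumption for illustration: the inventory format, the app names, the `examplelog` and `httpkit` packages, and plain dotted numeric versions. A real fleet would need a proper dependency inventory and a vulnerability feed behind it.

```python
def parse_version(text):
    """Turn '2.14.1' into (2, 14, 1) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def affected_apps(inventory, package, fixed_in):
    """List (app, version) pairs that run `package` below the fixed version."""
    fixed = parse_version(fixed_in)
    hits = []
    for app, deps in sorted(inventory.items()):
        version = deps.get(package)
        if version is not None and parse_version(version) < fixed:
            hits.append((app, version))
    return hits


# Hypothetical inventory: app name -> {dependency: pinned version}.
INVENTORY = {
    "billing": {"examplelog": "2.14.1", "httpkit": "1.2.0"},
    "frontend": {"httpkit": "1.3.0"},
    "search": {"examplelog": "2.17.1"},
}

if __name__ == "__main__":
    for app, version in affected_apps(INVENTORY, "examplelog", "2.17.1"):
        print(f"{app}: examplelog {version} needs an update")
```

The lookup itself is trivial. The hard part, as the conversation points out, is having an accurate inventory across the whole fleet in the first place.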
I want to say you're wrong, but I know you're not. And that's what, I guess, keeps a lot of us
awake at night for unfortunate reasons. Seth, I really want to thank you for taking the time
to speak with me. If people want to learn more, where's the best place to find you?
I'm on Twitter. You can find me.
I'm sorry to hear that. So am I. It's the experience.
Yeah, you can find me at @sethvargo. If you say mean and hateful things to me,
I actually exercise this finger and I can click the block button real fast.
But yeah, I mean, my DMs are
open. If you have any questions, comments, complaints, concerns, you can throw the complaints
away and come to me for everything else. Thank you so much for being so generous with your time.
I really appreciate it. Yeah. Thanks for having me. It's always a pleasure.
Seth Vargo, engineer at Google. I'm cloud economist, Corey Quinn, and this is Screaming
in the Cloud. If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice. Whereas if you've hated this podcast,
please leave a five-star review on your podcast platform of choice, along with an angry comment
asking how dare I malign the good name of the other cloud provider that isn't Google,
that also just so coincidentally happens to employ you.
If your AWS bill keeps rising and your blood pressure is doing the same,
then you need the Duckbill Group.
We help companies fix their AWS bill by making it smaller and less horrifying.
The Duckbill Group works for you, not AWS.
We tailor recommendations to your business
and we get to the point.
Visit duckbillgroup.com to get started.
This has been a HumblePod production.
Stay humble.