Screaming in the Cloud - Episode 57: Building the Cloud: The logistics and practicality of going serverless in 2019
Episode Date: April 24, 2019

About Richard Hartmann

Richard "RichiH" Hartmann is the Swiss Army Chainsaw at SpaceNet, leading both a greenfield datacenter build and monitoring. By night, he is involved in several FLOSS projects, a Prometheus team member, founder of OpenMetrics, and organizes various related conferences, including but not limited to FOSDEM, DENOG, and Chaos Communication Congress.

Links Referenced:
https://velocityconf.com/cloud
https://prometheus.io
https://www.debian.org
https://promcon.io
https://fosdem.org
https://cloud.withgoogle.com/next
https://www.microsoft.com/en-us/build
https://reinvent.awsevents.com
https://twitter.com/twitchih
Transcript
Hello and welcome to Screaming in the Cloud with your host, cloud economist Corey Quinn.
This weekly show features conversations with people doing interesting work in the world
of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles
for which Corey refuses to apologize.
This is Screaming in the Cloud. This episode is sponsored by Velocity, which runs from June 10th to 13th. It's going to cover a lot of topics we've already covered on previous episodes of this show, ranging from Kubernetes and site reliability engineering over to observability
and performance. The idea here is to help you stay on top of the rapidly changing landscape
of this zany world called cloud. It's a great place to learn new skills, approaches, and of
course, technologies. But what's also great about almost any conference is going to be the hallway
track. Catch up with people who are solving interesting problems, trade stories, learn from them,
and ideally learn a little bit more than you knew going into it. There are going to be some great
guests, including at least a few people who've been previously on this podcast, including Liz
Fong-Jones and several more. Listeners to this podcast can get 20% off of most passes with the code CLOUD20. That's C-L-O-U-D-2-0 during registration.
To sign up, go to velocityconf.com slash cloud. That's velocityconf.com slash cloud.
Thank you to Velocity for sponsoring this podcast.
Welcome to Screaming in the Cloud. I'm Corey Quinn. I'm joined this week by Richard Hartmann, who has decades in open source.
We met originally back when we were Freenode staff,
and since then he's done a lot of other things too.
You're a Debian developer, you organize a bunch of conferences,
including but certainly not limited to PromCon, FOSDEM,
and others that I don't care to think about.
And you come from mainframes, now you're into networking,
then you started building out redundant data centers as turnkey solutions,
and apparently you're currently building a data center
that I choose to believe is located in the middle of a swamp.
It's actually a Greenfield project,
and we couldn't build it in the middle of a swamp, of course.
We are going for the highest certification within EN 50600,
which is security and availability
class 4.
Gotcha.
So, among many other things, you're in town
here in San Francisco, and terrifyingly
close to me, for Google
Next, which, as of the time of this recording,
just finished.
You are a member of the Prometheus core team,
but that wound up driving you out here
to sit through effectively three full days of talking about Google's cloud. What do you think?
It was nice. It was interesting. Many of the talks were a little bit sales pitchy,
like a little bit too sales pitchy for my liking. They usually followed the model where initially,
like the first third or so maybe,
they had some higher level technical details,
like not really into depth.
And then they segued their way into why you should be buying from them,
which obviously makes sense from that perspective.
On the other hand, it's not the type of conference which I'm used to, let's say.
It feels like all of the major public cloud vendors have this problem once they hit a
certain point of scale.
They have one big cloud conference every year.
You have Microsoft Build, you have AWS's re:Invent, and you have Google Next, where the conference
is trying to do so many things that it almost loses a sense of itself, where you're trying
to sell things to people and there's that sales piece
of it. There's trying to articulate a vision for the next year. There's product announcements.
You're talking to engineers. You're talking to corporate buyers. There are press in attendance.
They have analysts that come through and start to ideally say nice things about them.
And when you get all of that together, it's very hard to build any kind of cohesive narrative that
addresses all of those constituencies. So on some level, whenever you're at one of these, it feels like
you're always in the wrong place, listening to the wrong story from the wrong people. And I've never
found a good way to solve that. I don't think there is a good way to solve this. Of course,
inherently, you have all those different priorities and all those different goals,
and to juggle all of them just doesn't work, at least not at that huge scale which they put together. So I'm not actually complaining, it's just an observation I made that this seemed to be the way of things. There were other things, also minor, but one other thing I noticed: the analyst lounge, which is sitting right smack in the middle of everything, has full catering and everything, whereas the speaker lounge is basically a coffee maker and some granola bars.
So that gives you a little bit of insight into the relative value which is assigned to this. But
again, I'm not complaining. It's just I couldn't help but observe that this is happening.
Credit where due. The press lounge was also super nice.
See, that's my point.
And to some extent, this
seems like a bit of a departure from Google's historic positioning as engineers first, last,
and always. And I think that you sort of have to once you grow beyond a certain user profile.
It's interesting to see how that's going to be maintained going forward. I mean,
there have been enough jokes made about it, but historically sticking to things that are not core
to what they've always done, namely search and ads, has always been something that Google has seemed to struggle with.
So while they're saying the right things, I think people are mostly going to adopt a wait-and-see approach, at least for a time.
That is probably correct.
I mean, from my perspective, Google has absolute top-notch engineering, and this is an engineering-driven company by and large.
So it just stands to reason that a lot of the internal culture
is also engineering-driven,
which tends to disregard a lot of other needs
of other people and teams and organizational units.
So I fully agree this messaging needs to change
for more traditional businesses to actually be able
and willing to adopt their product.
On the other hand, I do hope that they don't lose
this striving for technological excellence.
I would be very surprised if they lost
the pursuit of technological excellence.
I would be less surprised if they lost
their willingness to engage with large enterprises.
So it comes down to fundamentally,
I can see them reverting back to what their company was built on,
their corporate DNA, as it were.
I can't see them completely pivoting and abandoning
where they've spent the last 20 years.
I'm not saying it won't happen,
but I have a hard time imagining it.
As of right now, I would tend to agree, to be honest.
On the other hand, if you look at most companies,
like the large ones, they had these huge growth phases
and they were very, very engineering driven.
And then at some point, what will you be promoted for?
And at some point, this becomes more like enterprise stuff,
maybe marketing, maybe economics.
So people with that kind of thinking tend to be promoted more and more,
the older the company gets.
So this will over time change things.
Like I'm not an Apple user,
but looking at Apple from the outside,
this kind of seems to happen
where this focus on engineering and on excellence
just gets a little bit lost
and their edge also gets lost.
It's an interesting problem. Changing gears slightly, let's talk a little bit about something
you said back when we were preparing for the show, specifically that the cloud is nothing new,
it's old again, and it's always been this way, except for the fact that it's somehow
completely different. What do you mean by that? What I meant by that is that fundamentally IT stays the same
while it completely changes every few years.
If you look at any old monolithic application,
which is huge and horrible and everyone will tell you
this thing cannot be maintained, blah, blah, blah, all these things,
still you have functions in there.
And functions on a very basic level are not different at all from a microservice.
You change how the APIs,
how the interfaces, how the service delineations are exposed. You change a little bit of the mix
of how you do it and what you do. And obviously, you always try and raise the bar for tech as a
whole. And it also comes a little into this thing where I like to say IT breathes, where things go in and out.
Like you go from one extreme to the other.
You internalize and you outsource.
You have your monoliths, you have your totally fine-grained things, and it just goes back and forth, back and forth.
And every time you go towards this other extreme, you're trying to solve one or more problems.
And once they've been solved, you will then have other problems. So you go back to the middle and
you overshoot a little and then rinse repeat. This seems to be happening a lot. If you do it
with too much fervor, you might be overdoing it. On the other hand, following this natural
lifecycle of IT is pretty nice because
you're just raising the bar again and again. And when you look at cloud, all those issues
which infrastructure providers have, like how to run a data center, I can tell you,
running is even the small part. Building it is insanely complex. All these things just go away
because you have a different service
delineation and you just build on top of that. You were in town to give a talk. Tell me a little
bit about first what that talk was. It was titled Prometheus, What the Hype is About. It was a
mixture of the usual Prometheus 101 along with why people who are calling themselves cloud
developers should care about this. And what is Prometheus for those who have not yet attended a Prometheus 101 talk?
Prometheus is a monitoring framework.
It ingests time series data as in numeric data, which changes over time.
You might think service latency, you might think user count,
how many errors you have, temperature, whatever, just changes over time.
It's not geared for events,
so you can't put log lines or anything in there.
It's purely for numeric data changing over time.
And what you can do with it is you can ingest a lot,
a lot, a lot of data with relatively few resources.
Like on normal hardware, or a normal VM, you can easily do a million samples per second and more. It comes out at roughly 200k samples per core, so if you want more, just put in more cores and you're done, basically. So it's super efficient at ingesting that data and also at exposing that data back to the user. As you have these immense amounts of data, you obviously need a way to actually get at this data again. So we have something called labeling,
which is basically key value pairs.
And you are allowed to assign arbitrary key value pairs
to your data to then be able to select
and slice and dice your data
through this N-dimensional matrix,
which you're building up.
So you could do by region,
you could do by customer,
you could do by prod or dev.
And all these things,
which normally are stuck in a hierarchical data model
are all of a sudden available to you
as direct first-class things.
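To make that labeling idea concrete, here is a minimal sketch of instrumenting an application with a labeled metric, using the official prometheus_client Python library; the metric name, label names, and port are illustrative assumptions, not anything specific discussed in the episode.

```python
# Minimal sketch of labeled metrics with the official prometheus_client library.
# The metric name, label names, and port are illustrative only.
import random
import time

from prometheus_client import Counter, start_http_server

# One metric, three label dimensions. Every distinct combination of label
# values becomes its own time series, which can later be selected and
# aggregated along any of these dimensions.
REQUESTS = Counter(
    "http_requests",  # exposed as http_requests_total (counters get a _total suffix)
    "Total HTTP requests handled",
    ["region", "env", "customer"],
)

if __name__ == "__main__":
    # Expose the metrics at http://localhost:8000/metrics for Prometheus to scrape.
    start_http_server(8000)
    while True:
        REQUESTS.labels(region="eu-west", env="prod", customer="acme").inc()
        REQUESTS.labels(region="us-east", env="dev", customer="example").inc()
        time.sleep(random.random())
```

On the /metrics page this shows up as plain-text samples along the lines of http_requests_total{region="eu-west",env="prod",customer="acme"} 42, one line per label combination, which is exactly the N-dimensional data the labels make sliceable.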
But having those labels is only half the story.
Of course, you obviously need some way
to actually work with that data.
And that's another of the really nice things about Prometheus.
You have this one single functional language
which you have to learn.
It's called PromQL.
And it's basically doing vector math
on your monitoring data. So instead of just having this one graph, which never changes and you can't really do anything with, because of course you encoded stuff into an image file, you can actually take this data and do data science on it. And it's a Turing-complete language. It's super powerful.
It kind of takes some getting used to, but it's really nice once you learn it. And the next thing is you use this for alerting, you use this for analysis, you use this for
graphing, you use this for dashboarding.
You can use it to get your data out in JSON format.
You have this one single way to access all the data and it's always the same as opposed
to a lot of other systems where you have to think differently about accessing the data
depending on if you want to do alerting or reporting.
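As a rough illustration of that single query language, here is a small PromQL example pulled over Prometheus's HTTP API, which returns JSON; the server address (the default local port) and the metric and label names are assumptions carried over from the sketch above, not anything specific mentioned in the episode.

```python
# Small sketch of PromQL's vector math, queried over Prometheus's HTTP API.
# The server address and the metric/label names are assumptions for illustration.
import requests

PROMETHEUS = "http://localhost:9090"

# rate() turns a monotonically increasing counter into a per-second rate over
# the last five minutes; sum by (region) aggregates away every label except
# region. The same expression could just as well drive a dashboard panel or
# an alerting rule.
QUERY = 'sum by (region) (rate(http_requests_total{env="prod"}[5m]))'

resp = requests.get(f"{PROMETHEUS}/api/v1/query", params={"query": QUERY})
resp.raise_for_status()

for series in resp.json()["data"]["result"]:
    labels = series["metric"]            # e.g. {"region": "eu-west"}
    _timestamp, value = series["value"]  # instant vector sample: [unix_ts, "value"]
    print(labels.get("region", "<none>"), value)
```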
This might be something of a controversial question, or rather the question is not. The
answer is probably going to be hotly debated. But at what point does it make sense to do something
like that or to implement something like that versus deploying one of the many, many, many
monitoring vendors that purport to do not only what you've described,
but everything else as well.
When does deploying or building your own monitoring system
make sense for an organization?
Fundamentally, it's always the same make or buy question.
And this is no different.
Obviously, I'm biased, so I would tend to run things myself,
which works.
And for small teams and such, it's super easy to just spin up a new instance and do some
monitoring on whatever you want to do.
Maybe you just want to do some hacking or whatever, and you're super flexible in what
you do.
But that's only part of the story.
The other thing which Prometheus enabled was it shifted the whole of IT monitoring. And again, I'm biased, but from my perspective,
it actually shifted or uplifted a whole segment of IT
as in monitoring to a new level.
So there's a lot of vendors which now support similar things.
I mean, I do have personal opinions about a few of them,
but fundamentally, unless they do something completely wrong,
it's not a bad thing to use them. This ties in to some extent to, I guess,
a past life and something you still dabble in from time to time of network engineering.
Once upon a time, if a company wanted to do anything that even touched on IT,
they needed to have someone with network engineering expertise in-house. Today,
it's debatable whether that's still the case.
What do you think? You still need people who know how to do these things, but their daily workload will change, massively change. So you might not need someone who, or not a lot of people who
are aware of the intricacies of Ethernet or whatever. Like VRRP setups tend to be
somewhat icky, and if you can avoid them, by all means avoid them.
But avoiding them usually means having either an overlay network or having dynamic routing,
which I think is a perfect solution, but it's quite complicated.
But again, cloud shifts the service delineation,
and all of a sudden you have to do all those nitty-gritty details yourself.
You can buy this as a service.
Still, you will need someone who is aware of how those fundamentals work. So you might still need your VPN gateways, you might need someone to connect a VPC from on-prem to your cloud or to your multi-cloud or whatever. So you still need the knowledge about how things work, but the actual day-to-day job will change, and by extension, obviously, the actual skill set needed also changes along with it. But
you still need domain experts, same as in anything else. Even if you have a hosted database, it still
makes sense to have people who actually are aware of how things work in the background, so they can
make good decisions about how to set this thing up. To some extent, people have been saying for
generations now, it seems like, that in the future, you'll never have to worry about the
undifferentiated heavy lifting or the toil. You can only focus on writing business logic and doing
things that move your business directly forward. I mean, in my own career, once upon a time, I
started off as a large-scale email admin, and that was something every company needed. Today,
almost no company needs that. It's click, click, done with a hosted provider or very occasionally you have a small central group
that runs Exchange internally or something like that.
I can't shake the feeling that, to some extent, most companies who are not themselves deep into the, I guess, IT space as what they do no longer need to have a strong grounding in network engineering, network theory, being able to handle complex routing situations, etc. It feels like that has been abstracted away, by and large, for a lot of, I guess, typical companies.
Is that a naive approach?
I recognize we are sitting in San Francisco
where everything here is a web app.
There is an entire ecosystem out there of companies
that that does not apply to.
I understand that.
I wasn't aware you had a career.
My parents still believe I don't.
It's fine.
Okay.
Yes.
Again, the subset of skills needed changes dramatically,
and a lot of those details are just abstracted away
behind a new service delineation.
So there are a lot of things you don't really need in your day-to-day anymore. It still makes sense to maybe have one person, and it might not even need to be in the same company, who just knows that stuff, because otherwise you're bound to make mistakes from the past again and again.
Of course, you will always need at least some knowledge of how things work.
But I fully agree that this depth of knowledge
fully moves to infrastructure providers. And it's probably a good thing because most people,
like most enterprises, at least from my networking perspective, have a really hard time even getting
networking people because they just don't care about this type of network.
So hiding this behind a proper service, which is managed by experts, absolutely
makes sense, at least for those who can do actual cloud. You still have tons and tons and tons of
legacy implementations. And you have fields and industries where IT is currently nice,
but it's not essential. And those have completely different needs,
completely different needs from anyone
who's in cloud web app API world
and just living a quite nice life, to be honest.
I refuse to accept that here at Twitter for Pets headquarters.
So on a similar vein,
serverless has been sort of taking over the world
with similar promises,
that the only thing you'll ever have to worry about in the future is application code, that
it's going to be a magic coming of almost paradise where only pure development matters.
Everything else is handled for you by one of several cloud providers.
And everyone's touting this as a new thing.
Is it?
Yeah, I think CGI bin is pretty new.
So on the one hand, again, it's old and it's new at the same time.
And it also ties a little bit into this toil thing.
Of course, to some extent, toil is good
because it lets you learn about how the underlying things work
so you have a better understanding of why something might be happening in a certain way.
But jumping back to serverless,
the concept of putting a piece of code in some place
and having this executed when an external event comes along is not new.
CGI bin is fundamentally the same.
You have a web browser usually, and this makes a call to a thing,
and this thing gets executed, and it returns with some data,
and then it dies.
So exactly the same thing happens in serverless.
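As a rough sketch of that parallel, the two fragments below show the same shape of code twice, once CGI-style and once as an event-driven function; the handler signature follows the AWS Lambda Python convention, and the request and response contents are made up for illustration.

```python
# Rough sketch of the CGI-bin / serverless parallel: in both cases a short-lived
# piece of code is invoked by an external event, produces a response, and then
# goes away. All names and payloads here are illustrative.
import json
import sys


def cgi_main():
    # CGI style: the web server starts this script once per HTTP request.
    # The response (headers, blank line, body) goes to stdout, then the
    # process exits and "dies".
    sys.stdout.write("Content-Type: application/json\r\n\r\n")
    sys.stdout.write(json.dumps({"message": "hello from CGI"}))


def handler(event, context):
    # Serverless style (AWS Lambda Python convention): the platform calls this
    # function once per event. The event may come from an HTTP gateway, a queue,
    # a timer, and so on; the return value is the response, and the runtime
    # handles scaling, teardown, and reuse behind the scenes.
    name = (event or {}).get("name", "world")
    return {"statusCode": 200, "body": json.dumps({"message": f"hello, {name}"})}


if __name__ == "__main__":
    cgi_main()
```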
Like you have a lot more emphasis on different APIs.
You have a lot more emphasis on events.
You have this awareness that these events
will usually not be generated by a human
or by a web browser, but by something else.
So a lot of those things are evolving in a good
and in a nice and more
efficient and effective way. But fundamentally, it's the same as before. One of the misnomers
that I tend to see from time to time when talking to people about serverless is that there's a
belief that, well, I have some code and now it's going to take that code and it's going to run it
for me. Yeah, I have a wristwatch that can do that. There's not a lot of value in being able to say that,
yes, I have a computer.
What is more interesting to some extent is,
yes, what you say, the event model,
being able to impact when that code runs,
what it takes in and what it returns.
There are economic factors that feel different this time.
And maybe that's a bit of a red herring,
but the idea of not having to worry
in any traditional sense about scaling, that was always a concern with CGI bin.
Not having to worry about paying for things to sit idle when they weren't being addressed.
Instant on, consumption-based economic models start to be transformative for some use cases.
What I think is also very interesting and differentiates this somewhat from CGI bin is that there's a thousand different
ways to write serverless functions. Most of them are absolutely terrible, especially with things
with custom runtimes, write it in whatever language you want. To my recollection, CGI bin was mostly
a Perl requirement, wasn't it? It started as Perl. There were other languages which
were shoehorned onto it. The entire challenge that I see in, I guess, trying to view this as sort of the second
coming of CGI bin is, again, everything old is new again.
What I'm wondering is, was there anything in between CGI bin and serverless?
Because we haven't talked about CGI bin for 15 years in most shops, and serverless as
a thing is four or five years old.
What happened in between?
Serverless might be the third coming of CGI bin.
Of course, you have App Engine in between.
This ties back to this
engineering-driven excellence thing where
Google was kind of trying to tell people,
hey, this thing exists and maybe you want
to use it and maybe we also use it
for our own services and have
quite some good experience with it.
But people didn't really care.
It probably just wasn't the right time yet, in the global market, for this shift to go in this direction again.
So as far as CGI bin versus modern serverless, one of the big benefits of modern serverless is
elasticity, the ability to only have things on demand when you need them, and you don't pay for
them when they're
not hanging around. Back in the days of CGI bin, I still had to provision servers, care an awful
lot about capacity planning, screw up an awful lot of capacity planning, and then resign in disgrace.
How does that look today? I would argue that in the days of CGI bin, there was also this promise
of someone else has taken care of scaling or of running servers for you, which is not very different from what we hear today. Of course, fundamentally, it's more or less
the same. But the thing which in both cases made people go back towards the other extreme
or which will happen with serverless at some point, at least in my opinion, is that you still need to keep state. Like, that's the dirty secret. You have superfluous complexity where people just add features because they can, and it's just new and cool, and they just do whatever. And you have the system
inherent complexity and you cannot reduce this. You can put it behind different services. You can
have different APIs, you can have different service delineations, but this complexity needs
to live somewhere. And as a networking person, one of the main complex things is keeping state for the long term, and persisting it in a way that you can still access all those pictures of cats or whatever long term. And none of these questions are answered by serverless; it's just, okay, it is someone else's problem. But then when you scale up more and more, and it's all the time someone else's problem, which is super nice, at some point you will probably hit that wall of, I need this to be faster. I need this to be more
performant. And so you might be tempted to just bring your code and your data and your state
more together again. And this is probably something which we'll be seeing in, I don't know,
five years, 10 years, but it will happen. Of course, that's always the case. Mark my words,
people who listen to this in 2030: I was right.
And we'll probably be having the same debates in 2030
when that happens.
Of course.
And it's going to be different terminology,
different buzzwords.
My Twitter for Pets reference will seem incredibly dated.
And, oh, Google, the same way we talk about IBM today.
Because nothing's new again.
It always seems that history rhymes.
A question I have for you as someone who's building data centers in swamps in 2019,
what is the story for data center economics in a world where, for most use cases, a cloud provider
is going to have economies of scale that no traditional data center provider will have.
They will be able to offer greater elasticity.
They will be able to offer armies of people
to fix relatively routine issues
that a typical provider would have to be concerned with.
What is the case for a data center in these days?
On a very fundamental level,
the case is where do you think the cloud is running?
Look at all the numbers
which are being pushed out about capacity.
I mean, you can play a bullshit numbers game
and talk about how you use two Eiffel Towers of steel
or 20, which is a pretty arbitrary measure.
Of course, you can build in concrete or in steel,
so you can even change that.
That was one of the things that surprised me
at the Google Next keynote.
I didn't realize that when you ordered steel from a supplier, the units you used were numbers of Eiffel Towers. That was strange to me.
Yeah, it's totally, like, standard business practice to order steel in just fractions of Eiffel Towers. That's super common.
I'll take four Eiffel Towers and two-thirds of a Titanic, please.
Yep, but you'll have to dive for the latter one. So anyway, tons and tons and tons of energy are being poured into building data centers,
into making data centers more efficient.
So this cloud is running somewhere.
So all those big providers also need to have data centers.
That's one part of the answer.
So while people might forget that data centers exist, and this is totally fine,
because I mean, you also don't daily think about power plants. Yet, if you plug into a wall outlet, you have power, it's just something which exists,
and it's there and just works. And you have this clearly defined service delineation, which in the
case of power hasn't changed in a few decades, or maybe 100 years, but still, you have this thing,
and you rely on it. This is the definition of infrastructure. People don't think about it,
it just works. And if it stops working, they're really, really upset and for good reason.
So for smaller providers, building a data center still makes tons of sense because there's tons
and tons and tons of industry and of customers who are not able or willing to go into the cloud
just as of right now. It might be that they have certain legal requirements; especially in Germany, a lot of them are a lot harsher than anywhere else in the world.
So a lot of external people who need to okay how a company is run, especially when it comes
to financial data or when it comes to health data or something like that, you can't really
put this in the cloud, unless you run that cloud yourself, which is often called hybrid cloud.
And obviously you can squeeze maybe two or three bucks out more
if you go all in on a public cloud,
but this gives you less control.
So a large part of building data centers
and running data centers,
if you're not one of those huge players these days,
it would be those customers who need co-location,
who need really top-notch service in
those data centers, and who need them to be up and running 24-7 guaranteed. So this is the market
we are chasing. And to be honest, we see quite some interest. There is huge interest. It might
be part of the filter bubble that you're just not as aware of this, especially in the Bay Area,
for obvious reasons. But there's a huge market still.
The challenge that I see is that when I do leave the Bay Area, as happens from time to time,
it turns out planes do fly everywhere.
I find myself talking to an awful lot of quote-unquote traditional companies
who are in heavily regulated industries that are making at least partial shifts to cloud.
They're still investing in data centers, of course,
but that investment is now
being made with an eye towards tapering off further and further over the next 10 years.
10 years is usually like if you have a medium to large size contract, 10 years would probably be a
good measure for default contract time. So it makes sense that this is also the length of time
people would be talking about.
I'm not fully convinced this means they will move fully away within that time.
It might just be that that's their planning horizon.
So that's how far ahead they can plan and do plan.
We'll see what happens.
For the foreseeable future, there's definitely no shortage in people who need this, who really
rely on this.
And I think that's part of the challenge everyone struggles with.
One of the things I love about these large cloud conferences
is that we're able to talk to people
who have very different use cases from our own.
It's always nice to envision a use case
we hadn't personally considered
or talk to someone who is building a thing
that you didn't realize existed.
That's fun.
It's always neat to step outside of my Twitter for Pets bubble.
Yeah.
If people enjoy what you have to say for some unforeseeable reason, where can they hear more of it?
The best places are probably either my Twitter account, @TwitchiH, or any random conference I happen to walk through and give a talk at.
Perfect. And we will put a picture as well so that people know what you look like so they can stop you at random and share their opinions with you.
Great. Richard "RichiH" Hartmann, former Freenode staff member,
current Debian developer, conference organizer,
Prometheus core team member, and friend.
I'm Corey Quinn, and this is Screaming in the Cloud.
This has been this week's episode of Screaming in the Cloud.
You can also find more Corey at Screaminginthecloud.com or wherever Fine Snark is sold.
This has been a HumblePod production.
Stay humble.