Screaming in the Cloud - Episode 57: Building the Cloud: The logistics and practicality of going serverless in 2019
Episode Date: April 24, 2019

About Richard Hartmann

Richard "RichiH" Hartmann is the Swiss Army Chainsaw at SpaceNet, leading both a greenfield datacenter build and monitoring. By night, he is involved in several FLOSS projects, a Prometheus team member, founder of OpenMetrics, and organizes various related conferences, including but not limited to FOSDEM, DENOG, and Chaos Communication Congress.

Links Referenced:
https://velocityconf.com/cloud
https://prometheus.io
https://www.debian.org
https://promcon.io
https://fosdem.org
https://cloud.withgoogle.com/next
https://www.microsoft.com/en-us/build
https://reinvent.awsevents.com
https://twitter.com/twitchih
Transcript
Hello and welcome to Screaming in the Cloud with your host, cloud economist Corey Quinn.
This weekly show features conversations with people doing interesting work in the world
of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles
for which Corey refuses to apologize.
This is Screaming in the Cloud. This episode is sponsored by Velocity, which runs from June 10th to 13th. It's going to cover a lot of topics we've already covered on previous episodes of this show, ranging from Kubernetes and site reliability engineering over to observability
and performance. The idea here is to help you stay on top of the rapidly changing landscape
of this zany world called cloud. It's a great place to learn new skills, approaches, and of
course, technologies. But what's also great about almost any conference is going to be the hallway
track. Catch up with people who are solving interesting problems, trade stories, learn from them,
and ideally learn a little bit more than you knew going into it. There are going to be some great
guests, including at least a few people who've been previously on this podcast, including Liz
Fong-Jones and several more. Listeners to this podcast can get 20% off of most passes with the code CLOUD20. That's C-L-O-U-D-2-0 during registration.
To sign up, go to velocityconf.com slash cloud. That's velocityconf.com slash cloud.
Thank you to Velocity for sponsoring this podcast.
Welcome to Screaming in the Cloud. I'm Corey Quinn. I'm joined this week by Richard Hartmann, who has decades in open source.
We met originally back when we were Freenode staff,
and since then he's done a lot of other things too.
You're a Debian developer, you organize a bunch of conferences,
including but certainly not limited to PromCon, FOSDEM,
and others that I don't care to think about.
And you come from mainframes, now you're into networking,
then you started building out redundant data centers as turnkey solutions,
and apparently you're currently building a data center
that I choose to believe is located in the middle of a swamp.
It's actually a Greenfield project,
and we couldn't build it in the middle of a swamp, of course.
We are going for the highest certification within EN 50600,
which is security and availability
class 4.
Gotcha.
So, among many other things, you're in town
here in San Francisco, and terrifyingly
close to me, for Google
Next, which, as of the time of this recording,
just finished.
You are a member of the Prometheus core team,
but that wound up driving you out here
to sit through effectively three full days of talking about Google's cloud. What do you think?
It was nice. It was interesting. Many of the talks were a little bit sales pitchy,
like a little bit too sales pitchy for my liking. They usually followed the model where initially,
like the first third or so maybe,
they had some higher level technical details,
like not really into depth.
And then they segued their way into why you should be buying from them,
which obviously makes sense from that perspective.
On the other hand, it's not the type of conference which I'm used to, let's say.
It feels like all of the major public cloud vendors have this problem once they hit a
certain point of scale.
They have one big cloud conference every year.
You have Microsoft Build, you have AWS's re:Invent, and you have Google Next, where the conference
is trying to do so many things that it almost loses a sense of itself, where you're trying
to sell things to people and there's that sales piece
of it. There's trying to articulate a vision for the next year. There's product announcements.
You're talking to engineers. You're talking to corporate buyers. There are press in attendance.
They have analysts that come through and start to ideally say nice things about them.
And when you get all of that together, it's very hard to build any kind of cohesive narrative that
addresses all of those constituencies. So on some level, whenever you're at one of these, it feels like
you're always in the wrong place, listening to the wrong story from the wrong people. And I've never
found a good way to solve that. I don't think there is a good way to solve this. Of course,
inherently, you have all those different priorities and all those different goals,
and to juggle all of them just doesn't work, at least not at that huge scale which they put together. So I'm not actually complaining, it's just an observation I made that this seemed to be the way of things. There were other things, also minor, but one other thing I noticed: the analyst lounge, which is sitting right smack in the middle of everything, has full catering and everything, whereas the speaker lounge is basically a coffee maker and some granola bars.
So that gives you a little bit of insight into the relative value which is assigned to this. But
again, I'm not complaining. It's just I couldn't help but observe that this is happening.
Credit where due. The press lounge was also super nice.
See, that's my point.
And to some extent, this
seems like a bit of a departure from Google's historic positioning as engineers first, last,
and always. And I think that you sort of have to once you grow beyond a certain user profile.
It's interesting to see how that's going to be maintained going forward. I mean,
there have been enough jokes made about it, but historically sticking to things that are not core
to what they've always done, namely search and ads, has always been something that Google has seemed to struggle with.
So while they're saying the right things, I think people are mostly going to adopt a wait-and-see approach, at least for a time.
That is probably correct.
I mean, from my perspective, Google has absolute top-notch engineering, and this is an engineering-driven company by and large.
So it just stands to reason that a lot of the internal culture
is also engineering-driven,
which tends to disregard a lot of other needs
of other people and teams and organizational units.
So I fully agree this messaging needs to change
for more traditional businesses to actually be able
and willing to adopt their product.
On the other hand, I do hope that they don't lose
this striving for technological excellence.
I would be very surprised if they lost
the pursuit of technological excellence.
I would be less surprised if they lost
their willingness to engage with large enterprises.
So it comes down to fundamentally,
I can see them reverting back to what their company was built on,
their corporate DNA, as it were.
I can't see them completely pivoting and abandoning
where they've spent the last 20 years.
I'm not saying it won't happen,
but I have a hard time imagining it.
As of right now, I would tend to agree, to be honest.
On the other hand, if you look at most companies,
like the large ones, they had these huge growth phases
and they were very, very engineering driven.
And then at some point, what will you be promoted for?
And at some point, this becomes more like enterprise stuff,
maybe marketing, maybe economics.
So people with that kind of thinking tend to be promoted more and more,
the older the company gets.
So this will over time change things.
Like I'm not an Apple user,
but looking at Apple from the outside,
this kind of seems to happen
where this focus on engineering and on excellence
just gets a little bit lost
and their edge also gets lost.
It's an interesting problem. Changing gears slightly, let's talk a little bit about something
you said back when we were preparing for the show, specifically that the cloud is nothing new,
it's old again, and it's always been this way, except for the fact that it's somehow
completely different. What do you mean by that? What I meant by that is that fundamentally IT stays the same
while it completely changes every few years.
If you look at any old monolithic application,
which is huge and horrible and everyone will tell you
this thing cannot be maintained, blah, blah, blah, all these things,
still you have functions in there.
And functions on a very basic level are not different at all from a microservice.
You change how the APIs,
how the interfaces, how the service delineations are exposed. You change a little bit of the mix
of how you do it and what you do. And obviously, you always try and raise the bar for tech as a
whole. And it also comes a little into this thing where I like to say IT breathes, where things go in and out.
Like you go from one extreme to the other.
You internalize and you outsource.
You have your monoliths, you have your totally fine-grained things, and it just goes back and forth, back and forth.
And every time you go towards this other extreme, you're trying to solve one or more problems.
And once they've been solved, you will then have other problems. So you go back to the middle and
you overshoot a little and then rinse repeat. This seems to be happening a lot. If you do it
with too much fervor, you might be overdoing it. On the other hand, following this natural
lifecycle of IT is pretty nice because
you're just raising the bar again and again. And when you look at cloud, all those issues
which infrastructure providers have, like how to run a data center, I can tell you,
running is even the small part. Building it is insanely complex. All these things just go away
because you have a different service
delineation and you just build on top of that. You were in town to give a talk. Tell me a little
bit about first what that talk was. It was titled Prometheus, What the Hype is About. It was a
mixture of the usual Prometheus 101 along with why people who are calling themselves cloud
developers should care about this. And what is Prometheus for those who have not yet attended a Prometheus 101 talk?
Prometheus is a monitoring framework.
It ingests time series data as in numeric data, which changes over time.
You might think service latency, you might think user count,
how many errors you have, temperature, whatever, just changes over time.
It's not geared for events,
so you can't put log lines or anything in there.
It's purely for numeric data changing over time.
And what you can do with it is you can ingest a lot,
a lot, a lot of data with relatively few resources.
Like on normal hardware, or a normal VM, you can easily do a million samples per second and more. It comes out at roughly 200k samples per core, so if you want more, just put in more cores and you're done, basically. So it's super efficient at ingesting that data and also at exposing that data back to the user. As you have these immense amounts of data, you obviously need a way to actually get at this data again. So we have something called labeling,
which is basically key value pairs.
And you are allowed to assign arbitrary key value pairs
to your data to then be able to select
and slice and dice your data
through this N-dimensional matrix,
which you're building up.
So you could do by region,
you could do by customer,
you could do by prod or dev.
And all these things,
which normally are stuck in a hierarchical data model
are all of a sudden available to you
as direct first-class things.
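To make that labeling idea concrete, here is a minimal sketch of instrumenting an application with a labeled metric, using the official prometheus_client Python library; the metric name, label names, and port are illustrative assumptions, not anything specific discussed in the episode.

```python
# Minimal sketch of labeled metrics with the official prometheus_client library.
# The metric name, label names, and port are illustrative only.
import random
import time

from prometheus_client import Counter, start_http_server

# One metric, three label dimensions. Every distinct combination of label
# values becomes its own time series, which can later be selected and
# aggregated along any of these dimensions.
REQUESTS = Counter(
    "http_requests",  # exposed as http_requests_total (counters get a _total suffix)
    "Total HTTP requests handled",
    ["region", "env", "customer"],
)

if __name__ == "__main__":
    # Expose the metrics at http://localhost:8000/metrics for Prometheus to scrape.
    start_http_server(8000)
    while True:
        REQUESTS.labels(region="eu-west", env="prod", customer="acme").inc()
        REQUESTS.labels(region="us-east", env="dev", customer="example").inc()
        time.sleep(random.random())
```

On the /metrics page this shows up as plain-text samples along the lines of http_requests_total{region="eu-west",env="prod",customer="acme"} 42, one line per label combination, which is exactly the N-dimensional data the labels make sliceable.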
But having those labels is only half the story.
Of course, you obviously need some way
to actually work with that data.
And that's another of the really nice things about Prometheus.
You have this one single functional language
which you have to learn.
It's called PromQL.
And it's basically doing vector math
on your monitoring data. So instead of just having this one graph, which never changes and you can't really do anything with, because of course you encoded stuff into an image file, you can actually take this data and do data science on it. And it's a Turing-complete language. It's super powerful.
It kind of takes some getting used to, but it's really nice once you learn it. And the next thing is you use this for alerting, you use this for analysis, you use this for
graphing, you use this for dashboarding.
You can use it to get your data out in JSON format.
You have this one single way to access all the data and it's always the same as opposed
to a lot of other systems where you have to think differently about accessing the data
depending on if you want to do alerting or reporting.
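As a rough illustration of that single query language, here is a small PromQL example pulled over Prometheus's HTTP API, which returns JSON; the server address (the default local port) and the metric and label names are assumptions carried over from the sketch above, not anything specific mentioned in the episode.

```python
# Small sketch of PromQL's vector math, queried over Prometheus's HTTP API.
# The server address and the metric/label names are assumptions for illustration.
import requests

PROMETHEUS = "http://localhost:9090"

# rate() turns a monotonically increasing counter into a per-second rate over
# the last five minutes; sum by (region) aggregates away every label except
# region. The same expression could just as well drive a dashboard panel or
# an alerting rule.
QUERY = 'sum by (region) (rate(http_requests_total{env="prod"}[5m]))'

resp = requests.get(f"{PROMETHEUS}/api/v1/query", params={"query": QUERY})
resp.raise_for_status()

for series in resp.json()["data"]["result"]:
    labels = series["metric"]            # e.g. {"region": "eu-west"}
    _timestamp, value = series["value"]  # instant vector sample: [unix_ts, "value"]
    print(labels.get("region", "<none>"), value)
```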
This might be something of a controversial question, or rather the question is not. The
answer is probably going to be hotly debated. But at what point does it make sense to do something
like that or to implement something like that versus deploying one of the many, many, many
monitoring vendors that purport to do not only what you've described,
but everything else as well.
When does deploying or building your own monitoring system
make sense for an organization?
Fundamentally, it's always the same make or buy question.
And this is no different.
Obviously, I'm biased, so I would tend to run things myself,
which works.
And for small teams and such, it's super easy to just spin up a new instance and do some
monitoring on whatever you want to do.
Maybe you just want to do some hacking or whatever, and you're super flexible in what
you do.
But that's only part of the story.
The other thing which Prometheus enabled was it shifted the whole of IT monitoring. And again, I'm biased, but from my perspective,
it actually shifted or uplifted a whole segment of IT
as in monitoring to a new level.
So there's a lot of vendors which now support similar things.
I mean, I do have personal opinions about a few of them,
but fundamentally, unless they do something completely wrong,
it's not a bad thing to use them. This ties in to some extent to, I guess,
a past life and something you still dabble in from time to time of network engineering.
Once upon a time, if a company wanted to do anything that even touched on IT,
they needed to have someone with network engineering expertise in-house. Today,
it's debatable whether that's still the case.
What do you think? You still need people who know how to do these things, but their daily workload will change, massively change. So you might not need someone who, or not a lot of people who
are aware of the intricacies of Ethernet or whatever. Like VRRP setups tend to be
somewhat icky, and if you can avoid them, by all means avoid them.
But avoiding them usually means having either an overlay network or having dynamic routing,
which I think is a perfect solution, but it's quite complicated.
But again, cloud shifts the service delineation,
and all of a sudden you have to do all those nitty-gritty details yourself.
You can buy this as a service.
Still, you will need someone who is aware of how those fundamentals work. So you might still need your VPN gateways, you might need someone to connect a VPC from on-prem to your cloud or to your multi-cloud or whatever. So you still need the knowledge about how things work, but the actual day-to-day job will change, and by extension, obviously, the actual skill set needed also changes along with it. But
you still need domain experts, same as in anything else. Even if you have a hosted database, it still
makes sense to have people who actually are aware of how things work in the background, so they can
make good decisions about how to set this thing up. To some extent, people have been saying for
generations now, it seems like, that in the future, you'll never have to worry about the
undifferentiated heavy lifting or the toil. You can only focus on writing business logic and doing
things that move your business directly forward. I mean, in my own career, once upon a time, I
started off as a large-scale email admin, and that was something every company needed. Today,
almost no company needs that. It's click, click, done with a hosted provider or very occasionally you have a small central group
that runs Exchange internally or something like that.
I can't shake the feeling that, to some extent, most companies who are not themselves deep into the, I guess, IT space as what they do no longer need to have a strong grounding in network engineering, network theory, being able to handle complex routing situations, etc. It feels like that has been abstracted away, by and large, for a lot of, I guess, typical companies.
Is that a naive approach?
I recognize we are sitting in San Francisco
where everything here is a web app.
There is an entire ecosystem out there of companies
that that does not apply to.
I understand that.
I wasn't aware you had a career.
My parents still believe I don't.
It's fine.
Okay.
Yes.
Again, the subset of skills needed changes dramatically,
and a lot of those details are just abstracted away
behind a new service delineation.
So there are a lot of things you don't really need in your day-to-day anymore. It still makes sense to maybe have one person, and it might not even need to be in the same company, who just knows that stuff, because otherwise you're bound to make mistakes from the past again and again.
Of course, you will always need at least some knowledge of how things work.
But I fully agree that this depth of knowledge
fully moves to infrastructure providers. And it's probably a good thing because most people,
like most enterprises, at least from my networking perspective, have a really hard time even getting
networking people because they just don't care about this type of network.
So hiding this behind a proper service, which is managed by experts, absolutely
makes sense, at least for those who can do actual cloud. You still have tons and tons and tons of
legacy implementations. And you have fields and industries where IT is currently nice,
but it's not essential. And those have completely different needs,
completely different needs from anyone
who's in cloud web app API world
and just living a quite nice life, to be honest.
I refuse to accept that here at Twitter for Pets headquarters.
So on a similar vein,
serverless has been sort of taking over the world
with similar promises,
that the only thing you'll ever have to worry about in the future is application code, that
it's going to be a magic coming of almost paradise where only pure development matters.
Everything else is handled for you by one of several cloud providers.
And everyone's touting this as a new thing.
Is it?
Yeah, I think CGI bin is pretty new.
So on the one hand, again, it's old and it's new at the same time.
And it also ties a little bit into this toil thing.
Of course, to some extent, toil is good
because it lets you learn about how the underlying things work
so you have a better understanding of why something might be happening in a certain way.
But jumping back to serverless,
the concept of putting a piece of code in some place
and having this executed when an external event comes along is not new.
CGI bin is fundamentally the same.
You have a web browser usually, and this makes a call to a thing,
and this thing gets executed, and it returns with some data,
and then it dies.
So exactly the same thing happens in serverless.
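As a rough sketch of that parallel, the two fragments below show the same shape of code twice, once CGI-style and once as an event-driven function; the handler signature follows the AWS Lambda Python convention, and the request and response contents are made up for illustration.

```python
# Rough sketch of the CGI-bin / serverless parallel: in both cases a short-lived
# piece of code is invoked by an external event, produces a response, and then
# goes away. All names and payloads here are illustrative.
import json
import sys


def cgi_main():
    # CGI style: the web server starts this script once per HTTP request.
    # The response (headers, blank line, body) goes to stdout, then the
    # process exits and "dies".
    sys.stdout.write("Content-Type: application/json\r\n\r\n")
    sys.stdout.write(json.dumps({"message": "hello from CGI"}))


def handler(event, context):
    # Serverless style (AWS Lambda Python convention): the platform calls this
    # function once per event. The event may come from an HTTP gateway, a queue,
    # a timer, and so on; the return value is the response, and the runtime
    # handles scaling, teardown, and reuse behind the scenes.
    name = (event or {}).get("name", "world")
    return {"statusCode": 200, "body": json.dumps({"message": f"hello, {name}"})}


if __name__ == "__main__":
    cgi_main()
```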
Like you have a lot more emphasis on different APIs.
You have a lot more emphasis on events.
You have this awareness that these events
will usually not be generated by a human
or by a web browser, but by something else.
So a lot of those things are evolving in a good
and in a nice and more
efficient and effective way. But fundamentally, it's the same as before. One of the misnomers
that I tend to see from time to time when talking to people about serverless is that there's a
belief that, well, I have some code and now it's going to take that code and it's going to run it
for me. Yeah, I have a wristwatch that can do that. There's not a lot of value in being able to say that,
yes, I have a computer.
What is more interesting to some extent is,
yes, what you say, the event model,
being able to impact when that code runs,
what it takes in and what it returns.
There are economic factors that feel different this time.
And maybe that's a bit of a red herring,
but the idea of not having to worry
in any traditional sense about scaling, that was always a concern with CGI bin.
Not having to worry about paying for things to sit idle when they weren't being addressed.
Instant on, consumption-based economic models start to be transformative for some use cases.
What I think is also very interesting and differentiates this somewhat from CGI bin is that there's a thousand different
ways to write serverless functions. Most of them are absolutely terrible, especially with things
with custom runtimes, write it in whatever language you want. To my recollection, CGI bin was mostly
a Perl requirement, wasn't it? It started as Perl. There were other languages which
were shoehorned onto it. The entire challenge that I see in, I guess, trying to view this as sort of the second
coming of CGI bin is, again, everything old is new again.
What I'm wondering is, was there anything in between CGI bin and serverless?
Because we haven't talked about CGI bin for 15 years in most shops, and serverless as
a thing is four or five years old.
What happened in between?
Serverless might be the third coming of CGI bin.
Of course, you have App Engine in between.
This ties back to this
engineering-driven excellence thing where
Google was kind of trying to tell people,
hey, this thing exists and maybe you want
to use it and maybe we also use it
for our own services and have
quite some good experience with it.
But people didn't really care.
It probably just wasn't the right time yet, in the global market, for this shift to go in this direction again.
So as far as CGI bin versus modern serverless, one of the big benefits of modern serverless is
elasticity, the ability to only have things on demand when you need them, and you don't pay for
them when they're
not hanging around. Back in the days of CGI bin, I still had to provision servers, care an awful
lot about capacity planning, screw up an awful lot of capacity planning, and then resign in disgrace.
How does that look today? I would argue that in the days of CGI bin, there was also this promise
of someone else has taken care of scaling or of running servers for you, which is not very different from what we hear today. Of course, fundamentally, it's more or less
the same. But the thing which in both cases made people go back towards the other extreme
or which will happen with serverless at some point, at least in my opinion, is that you still need to keep state. Like, that's the dirty secret. You have superfluous complexity where people just add features because they can, and it's just new and cool, and they just do whatever. And you have the system
inherent complexity and you cannot reduce this. You can put it behind different services. You can
have different APIs, you can have different service delineations, but this complexity needs
to live somewhere. And as a networking person, one of the main complex things is keeping state for the long term, and persisting it in a way that you can still access all those pictures of cats or whatever long term. And none of these questions are answered by serverless; it's just, okay, it is someone else's problem. But then when you scale up more and more, and it's all the time someone else's problem, which is super nice, at some point you will probably hit that wall of, I need this to be faster. I need this to be more
performant. And so you might be tempted to just bring your code and your data and your state
more together again. And this is probably something which we'll be seeing in, I don't know,
five years, 10 years, but it will happen. Of course, that's always the case. Mark my words,
people who listen to this in 2030: I was right.
And we'll probably be having the same debates in 2030
when that happens.
Of course.
And it's going to be different terminology,
different buzzwords.
My Twitter for Pets reference will seem incredibly dated.
And, oh, Google, the same way we talk about IBM today.
Because nothing's new again.
It always seems that history rhymes.
A question I have for you as someone who's building data centers in swamps in 2019,
what is the story for data center economics in a world where, for most use cases, a cloud provider
is going to have economies of scale that no traditional data center provider will have.
They will be able to offer greater elasticity.
They will be able to offer armies of people
to fix relatively routine issues
that a typical provider would have to be concerned with.
What is the case for a data center in these days?
On a very fundamental level,
the case is where do you think the cloud is running?
Look at all the numbers
which are being pushed out about capacity.
I mean, you can play a bullshit numbers game
and talk about how you use two Eiffel Towers of steel
or 20, which is a pretty arbitrary measure.
Of course, you can build in concrete or in steel,
so you can even change that.
That was one of the things that surprised me
at the Google Next keynote.
I didn't realize that when you ordered steel from a supplier, the units you used were numbers of Eiffel Towers. That was strange to me.
Yeah, it's totally, like, standard business practice to order steel in just fractions of Eiffel Towers. That's super common.
I'll take four Eiffel Towers and two-thirds of a Titanic, please.
Yep, but you'll have to dive for the latter one. So anyway, tons and tons and tons of energy are being poured into building data centers,
into making data centers more efficient.
So this cloud is running somewhere.
So all those big providers also need to have data centers.
That's one part of the answer.
So while people might forget that data centers exist, and this is totally fine,
because I mean, you also don't daily think about power plants. Yet, if you plug into a wall outlet, you have power, it's just something which exists,
and it's there and just works. And you have this clearly defined service delineation, which in the
case of power hasn't changed in a few decades, or maybe 100 years, but still, you have this thing,
and you rely on it. This is the definition of infrastructure. People don't think about it,
it just works. And if it stops working, they're really, really upset and for good reason.
So for smaller providers, building a data center still makes tons of sense because there's tons
and tons and tons of industry and of customers who are not able or willing to go into the cloud
just as of right now. It might be that they have certain legal requirements; especially in Germany, a lot of them are a lot harsher than anywhere else in the world.
So a lot of external people who need to okay how a company is run, especially when it comes
to financial data or when it comes to health data or something like that, you can't really
put this in the cloud, unless you run that cloud yourself, which is often called hybrid cloud.
And obviously you can squeeze maybe two or three bucks out more
if you go all in on a public cloud,
but this gives you less control.
So a large part of building data centers
and running data centers,
if you're not one of those huge players these days,
it would be those customers who need co-location,
who need really top-notch service in
those data centers, and who need them to be up and running 24-7 guaranteed. So this is the market
we are chasing. And to be honest, we see quite some interest. There is huge interest. It might
be part of the filter bubble that you're just not as aware of this, especially in the Bay Area,
for obvious reasons. But there's a huge market still.
The challenge that I see is that when I do leave the Bay Area, as happens from time to time,
it turns out planes do fly everywhere.
I find myself talking to an awful lot of quote-unquote traditional companies
who are in heavily regulated industries that are making at least partial shifts to cloud.
They're still investing in data centers, of course,
but that investment is now
being made with an eye towards tapering off further and further over the next 10 years.
10 years is usually like if you have a medium to large size contract, 10 years would probably be a
good measure for default contract time. So it makes sense that this is also the length of time
people would be talking about.
I'm not fully convinced this means they will move fully away within that time.
It might just be that that's their planning horizon.
So that's how far ahead they can plan and do plan.
We'll see what happens.
For the foreseeable future, there's definitely no shortage in people who need this, who really
rely on this.
And I think that's part of the challenge everyone struggles with.
One of the things I love about these large cloud conferences
is that we're able to talk to people
who have very different use cases from our own.
It's always nice to envision a use case
we hadn't personally considered
or talk to someone who is building a thing
that you didn't realize existed.
That's fun.
It's always neat to step outside of my Twitter for Pets bubble.
Yeah.
If people enjoy what you have to say for some unforeseeable reason, where can they hear more of it?
The best places are probably either my Twitter account, @TwitchiH, or any random conference I happen to walk through and give a talk at.
Perfect. And we will put a picture as well so that people know what you look like so they can stop you at random and share their opinions with you.
Great. Richard "RichiH" Hartmann, former Freenode staff member,
current Debian developer, conference organizer,
Prometheus core team member, and friend.
I'm Corey Quinn, and this is Screaming in the Cloud.
This has been this week's episode of Screaming in the Cloud.
You can also find more Corey at Screaminginthecloud.com or wherever Fine Snark is sold.
This has been a HumblePod production.
Stay humble.