The Changelog: Software Development, Open Source - The Backstory of Kubernetes (Interview)

Episode Date: May 21, 2017

Tim Hockin and Aparna Sinha joined the show to talk about the backstory of Kubernetes inside Google, how Tim and others got it funded, the infrastructure of Kubernetes, and how they've been able to succeed by focusing on the community.

Transcript
Starting point is 00:00:00 Bandwidth for the Changelog is provided by Fastly. Learn more at Fastly.com. Welcome back, everyone. This is The Changelog, and I'm your host, Adam Stacoviak. This is episode 250, and today, Jared and I are talking about Kubernetes, K8S, as it might be better known. We're talking to Tim Hockin, one of the founders and core engineers of Kubernetes, as well as Aparna Sinha, lead product manager. We talked about the backstory of Kubernetes inside Google, how Tim and others got it funded. Tim also did a great job of laying out the infrastructure of Kubernetes, as well as how they've been able to succeed by focusing on the community. We have three sponsors today, Sentry, Hired, and Datadog. Our first sponsor of the show is our friends at Sentry, an open source error tracking application
Starting point is 00:01:18 that shows you every crash in your stack as it happens, with the details needed to prioritize, identify, reproduce, and fix each issue. Sentry also gives you information your support team can use to reach out to and help those affected, and tools that let your users send you feedback, for peace of mind. Head to Sentry.io and start tracking your errors today for free. Get off the ground with their free plan. Once again, Sentry.io. Tell them Adam from the Changelog sent you. And now on to the show. All right, we're back.
Starting point is 00:01:58 We're talking about Kubernetes today, Jared. Kubernetes. K-8-S. That's right. The coolest acronym. Yeah, and thanks to Bob Radinsky at Google for contacting us and saying, hey, you should do a show on Kubernetes. Now, we've done a lot of shows kind of about Kubernetes, kind of on Go Time. So if you're into containers and Kubernetes and Go and
Starting point is 00:02:17 all those things, definitely check out GoTime.fm, especially episode 20. Kelsey Hightower came on the show and that show was actually dedicated to Kubernetes, but most of the time it just gets brought up in passing, but never on the changelog, Adam. No, never, like zero. Zero on the changelog. So we're very excited to have a show about Kubernetes. And today we're joined by Aparna Sinha,
Starting point is 00:02:38 who's the Senior Product Manager at Google and the Product Team Lead of Kubernetes, as well as Tim Hawken, who's one of the OG software engineers on the project. Tim and Aparna, thanks so much for joining us. Thanks for having us. Seeming like being OG?
Starting point is 00:02:53 I love it. Who doesn't like that? I mean, come on. Howdy, Aparna. How are you? Very good. Thank you. So great to have you guys.
Starting point is 00:03:01 You know, who was it that started the software? Jared, was it Bob? Bob the software this year was bob bob love bob he's awesome yeah so one of the things we like to do especially with a project which is kind of as storied as kubernetes is first of all let's just state what it is uh for those who are uninitiated it's an open source system for automating deployment scaling and management of containerized applications one of those things, trying to make the cloud easier. It has a storied history, but most of that history, it sounds like, is from the inside of Google. Before it was open sourced.
Starting point is 00:03:36 In fact, on the Kubernetes homepage, it states that this results from 15 years of experience running production workloads at Google and actually beat out a few other systems, Borg and Omega, internal things at Google. There's a white paper on that, which we'll link up to as well to get where it is today. So, Tim, I think your best position to tell us the story of how Kubernetes won inside Google, why it was open source and all that good stuff. Sure. So I should clarify, Kubernetes is an offshoot of the Borg and the Omega systems. What we use inside Google is still Borg and Omega. Borg's been around since 2003-ish, when it was really a cheroot and that was about it. And nice. And so people were trying to find ways inside google to share machines because dedicated machines per service and production was pretty expensive and not really
Starting point is 00:04:31 efficient and so we started this thing called borg which was really there to schedule work for people and to try to pack machines a little more tightly over the course of you know well since 2004 so over the last 13 14 14 years, we've built out this Borg system and we've added a lot of really neat functionality. Things like cgroups in the Linux kernel came out of the Borg team. We've added all these extra capabilities to the Borg system, but it was very bespoke. It's very internal focused and it's tied to the way Google does things, right? We have the power here to send an email and tell everybody in the company, you have to recompile your application because there was a bug in the core library.
Starting point is 00:05:08 And so people inside Google, sort of myself included, couldn't fathom living without Borg in the outside world. And so we'd always toyed with this idea of, you know, what happens when we leave Google? Will I have to rebuild Borg? How will I live? And when Docker landed in early 2013, we looked at this and said, well, this is kind of Borg-like. This is containers. And we understood sort of viscerally that people were very quickly going to have a lot of the same problems they had in Borg. And we said, well, we know how to solve that. We've done it three times already. So that's when we started looking at Kubernetes as a way of rebuilding a lot of what we did in Borg, but building it in a way that wasn't tailored to Google specifics, that was really there for open source, for applications
Starting point is 00:05:58 like Apache and Nginx and MySQL, which aren't Google applications and don't use our Google RPC libraries. And so that brings a different set of constraints to the problem. So that's what we started building in 2013 and into the beginning of 2014. And this is what became known as Kubernetes. So Kubernetes internally isn't in great use, although we are seeing more and more teams now start to port their applications to our cloud product on top of Container Engine. So was Borg or any parts of Borg ever open sourced or was there just a white paper? I remember there being a few years back, you know, a lot of news around this Borg thing coming out of Google and that might have predated Kubernetes open sourcing. Can you help
Starting point is 00:06:42 me out with the history there? Was Borg or parts of Borg ever open sourced? So the way Google's code base works is we have a mega code base. And so parts of Borg have been open sourced in the sense that some of the core libraries that Borg uses are used by other open source projects. And so pieces of the system have been released, but not as a scheduler container system per se.
Starting point is 00:07:04 We did do a paper on Omega and then we did a paper on Borg. We also did some papers on performance analysis of Borg and using application traces through the system to model the behavior of Borg. And in fact, some of these are, in fact, what led to the development of things like Mesos. Very interesting. Oh, that does ring a bell. I do want to add one item here. First of all, you know, I think we're very fortunate that
Starting point is 00:07:32 we have Tim Hawkins to talk to us about this because he is originally from the Borg team, and he's been at Google for many years, 10 plus years. The other point I wanted to make is that, you know, what we found is that when you talk about open source systems, there's a fundamental difference between taking a technology that you're using internally and open sourcing it versus developing something as an open source platform from the get go. And Kubernetes is the latter. You know, when you take something that's internal and open source, it's not necessarily built for an external environment. And it's not, it's usually, you know, has a number of constraints built into it that may be specific to the company from which it comes. But Kubernetes was built differently. And I think Tim can speak much more about that. But it was really built for the external world with the external world from the start. On that note, though, we've heard the story of people being inside of Google and then,
Starting point is 00:08:28 as you said before, Tim, kind of stepping outside of the Google land and no longer having those Google tools. And what would life be like without those? We've heard that story before. So it seems natural to white paper Borg and provide that out there when Docker became to fruition, but then also feel like you have a problem to solve with Kubernetes and release that as a full-blown, open-source-focused project. Yeah, we like to joke that Google's been playing a long game
Starting point is 00:08:54 for the last 10 years by teaching everybody who comes to work at Google how to use Borg, and then when they leave Google, then they're out there aching for this thing for Borg, and then we produce Kubernetes. Right. Right. That's a heck of a, that's a heck of a long game for sure. Tell me about open source, even the white paper. So I feel like we may have been down this road a little bit with Will Norris or somebody else from Google, Adam, that I can't think of their
Starting point is 00:09:17 name, but you know, I'm looking at Borg as an insider. And I'm thinking this thing, this is a system that we can run our entire organization and people long for it when they leave. That means they don't have it other places. And so that feels like a very strong competitive advantage to an engineering culture. And so why even expose it at all? Why not just keep it all inside? So that's a great question. And in fact, it's exactly what Urs asked us when we brought it to him. And, you know, in the history of Google, we've done a bunch of these sorts of patterns where we release a paper talking about how excellent this idea is, MapReduce or GFS or BigTable. And we put out a paper that says this is how we basically built it and this And this is how it works. And it's amazing. And it's changed the world inside Google.
Starting point is 00:10:09 And then somebody outside goes off and builds it. And they build a different version of it that's not really compatible with what we've done. That may be technically equivalent or technically inferior. But it doesn't matter because when you look at something like Hadoop, that's what the world uses, right? And it doesn't matter if MapReduce is better or not. When we hire a new engineer, they know how to use Hadoop and they have to come in and they have to discard a lot of what they already know to learn how to use what we do internally. So the way I see it is the closer systems get to each other, the more energy you need to keep them from merging. And what we saw in 2014 is container orchestration is going to happen. The world is, you know, they're coming to their senses with respect to containers.
Starting point is 00:10:51 They see the value of it. It was, I mean, Docker happened faster than anybody could have predicted, right? And people were seeing this, but we knew that as soon as they run a couple of these things in production, they were very quickly going to need tools. They're going to need things to help them manage it. We could either be part of that or we could choose not to be part of that. And that was what the Kubernetes decision was really about. That makes a lot of sense. That's reminding me very much. Adam, can you help me out here on an episode that we did? And I feel like it was with Google, but it might have been with James Pierce at Facebook, where it was one of these inevitable things where this was going to
Starting point is 00:11:23 happen and we know it's going to happen and they're either going to do it around a set of tools and and processes and ideas that you know that we are in control of or that we aren't and by control i mean influential and so the the decision was really kind of made around that as opposed to keeping it secret because you want if there's going to be an ecosystem there's going to be a platform you know it's it's advantageous to have right uh especially once the white paper is out there like the toothpaste is out of the out of the what's it called i recall yeah it's out of the tube yeah i can recall some sort of conversation around this it may have been the public data set call we had with github and Google because it was around Google Cloud.
Starting point is 00:12:06 So that might have been part of that conversation, but I can't place it. If we can, if you're a listener and you know the show and we're idiots and we just can't figure it out, let us know. We'll put in the show notes. Yeah. I just say that because I'm having a very strong sense of deja vu, Tim, as you're explaining this to me. And I feel like somebody else has explained this to me and it made a ton of sense then.
Starting point is 00:12:22 It makes a ton of sense now. So tell us the story about the rewrite. So I cast improperly it's not that it beat borg it's that it's more like son of borg right so all the good ideas here uh let's let's let's recreate let's kind of a rewrite but a an offshoot with complete open source in mind from the very start so what does that start to look like? Huge undertaking. No doubt Borg grew organically over years, and now you probably have a lot of pressure to produce something great in a small amount of time. That's very much true. You know, Borg is something like 100 million lines of code, if you count the entire ecosystem in it. It's written in C++.
Starting point is 00:13:01 It has been written over the course of 14 years by hundreds of engineers. It is an incredibly expensive thing to try to recreate, but it's also alien technology. So as Aparna implied earlier, we couldn't just release Borg because it wouldn't be useful for people. And it's so big and so complex and so done that nobody would be able to sort of get into it. So we wanted to really go back to the beginning, start with some fundamental principles, start with some of the lessons that we learned from Borg, some of the mistakes we made, fix those, some of the things that worked really well in Borg and keep those. We had a bunch of decisions early on, right? When do we release it? The answer that we came to was well before we're happy about it because we wanted to build
Starting point is 00:13:42 community. We wanted to get people invested in it from the beginning. What are we going to write it in? Google is a C++ shop for performance things. Borg is C++. But C++ has no real open source community around it, not in the way that C or Python or Java even have communities. So we took a hard look at what languages were hot right now and fit the space that we wanted to go. And we, we chose go. Uh, and I was a personally, a very reluctant go adoptee. I was a C plus plus fan. I liked the power of C plus plus, um, in hindsight, choosing go is absolutely, absolutely the right decision. Uh, the community
Starting point is 00:14:22 around go is amazing. The tooling is very good. The language is actually really good. I get a lot of work done really quickly. So these were all the decisions that we had to make at the beginning, right? Do we build it like Borg where it's RPC and protobuf based? Or do we do more open standards? So we went with REST and JSON because we didn't want this to be a big incestuous Google love fest. It's sort of funny that once gRPC launched, it's got massive adoption. And now we're getting asked to add gRPC support to it. But these were the key decisions at the beginning of how do we start this over? And so when we put out this idea, we went to DockerCon in 2014. I hope I'm getting the dates right. 2014, DockerCon won. And we had a
Starting point is 00:15:07 conversation with some of the fellows from Red Hat, Clayton Coleman in particular. We showed him what we were up to. And I think it resonated for him. I think he got it. And I think we were early enough that he was able to say, we can get involved in this. We can actually establish roots here. This isn't a fully baked thing. And I think that was the groundwork for what's turned out to be a really fantastic open source relationship. And, you know, the project has obviously grown a lot from there. And I want to add one thing, you know, you talked about differentiation and the technology with Borg being a differentiating technology. Why did we open source that? You know, we actually view the community
Starting point is 00:15:47 that has developed around Kubernetes and the fact that it is a community-developed open source product as a differentiator for Google Cloud. Yeah. No, that's a strong point because you can imitate features, you can copy features, right?
Starting point is 00:16:02 But momentum and community and ubiquity is something that's very difficult to get from a competitive advantage. Yep. Good point. As someone who's on the receiving end of a lot of the pull requests from our 1,100, 1,200 contributors, it's amazing and overwhelming. And the project would be an entirely different thing if we had even half of that many people. While we're still here in the history point of this conversation, you mentioned in the pre-call basically that you were one of the founders of Kubernetes and you mentioned the funding process or getting this project funded by Google. What was that process like?
Starting point is 00:16:41 Can you take us back to what it was like, how you sold it, how you all sold it, whatever the process was? Yeah, sure. So early on, there was a prototype that was put together by Joe Beta, Brendan Burns, and some other engineers inside Google to just to prove out some of the ideas that we had pushed forward through Omega. So for history, Brian Grant was the lead developer designer of Omega, and he had some interesting ideas that were different from Borg. And so Brendan and Joe and Craig and Ville took those ideas and they sort of glued them together with some Go and some Shell script and some Docker and some Salt.
Starting point is 00:17:22 And they took all these open source things and they threw them together into a very rough prototype. And they showed us at one of our joint cloud infrastructure meetings and my jaw hit the floor. It was like, this is awesome. This is amazing. It's a little rough around the edges, but that's incredible. And then it sort of sat on a shelf while we tried to figure out what to do with it. Right. Docker was this brand new thing. We didn't know if it was really going to take off or not. We weren't sure how we were going to staff this. The organization wasn't really behind it.
Starting point is 00:17:52 Cloud was still very immature at Google. And we had a bunch of other sort of false starts on how to make containers and Google Cloud a better place. And it was sort of later in 2014 when, sorry, in 2013, when we said, like, we really think that the answer here is to build this Kubernetes thing, right? It wasn't called Kubernetes at the time, but it was to take this thing off the shelf and dust it off and make it into a real product. So we started on the process of that and we did an internal product requirements doc and we brought it to our VPs and we showed them the ideas and we made our pitches and we understood their concerns about what was giving away the golden goose and what was okay to talk
Starting point is 00:18:37 about and release. We had sort of our rules of engagement and that was how we got towards the Kubernetes announcement that we made at DockerCon 1. Initial release June 6, 2014. Does that ring a bell for you? That's right. There you go. So assuming that was your DockerCon 1. Very good. Well, that's a great history. I think we definitely have seen why and how it came out of Google. By the way, Borg is the best name ever. We've heard there's some argumentation around Kubernetes, how to pronounce it, what's the name, but Borg is an excellent name.
Starting point is 00:19:11 Actually, Tim, real quick before the break, can you give us a little insight on Kubernetes, the name and its meaning? And I think there was even like a seven of nine or something to tell us that. Right, so the initial name of the project was seven uh in reference to seven of nine which is following the borg star trek tradition um like all like all good geeks we name things after star trek and um borg really is the best code name ever like there is nothing we'll never
Starting point is 00:19:37 do better than that so it really is uh seven of nine was you know Borg. Oh, I like that. You got an angle on it. So we called it seven. Obviously, that was never going to fly with trademark. And so we had to come up with a real name. And honestly, there's no magic behind the name. We all came up with synonyms and translations, and we just dumped them all into a dock. And we threw out the obviously terrible ones, and we sort of voted on what was left and Kubernetes was the winner. It, you know, as a
Starting point is 00:20:10 word, it means helmsman or the guy who steers the boat, which is in keeping with the sort of nautical theme that Docker put forward, but also capturing a little bit of the, like, we're managing things, not just shipping things. So we liked sort of the connotation of it. It's fairly SEO friendly and, you know, it's memorable. And so it sort of fit the bill for everything we wanted, except for brevity. And so that's why Kubernetes became K8S. It's K, eight letters and an S.
Starting point is 00:20:42 Yeah. I think I want to point this out too, Jared, that he said it's SEO friendly. And I didn't ever really think that Google would care if something was SEO friendly. Well, they got to play by their own game, right? I suppose. They can't just point everything at kubernetes.io. Exactly. We never, ever, ever manipulate search results. So it had to be organic.
Starting point is 00:21:02 This is not an episode of Silicon Valley. I mean, compare, for example, Go, right? Right. Oh, big fail. The least SEO-friendly word in the world. Right. They had to invent a new word just to be SEO'd. Golang.
Starting point is 00:21:16 Right. Not to mention all the namespace clashes on Go, since it represents a game and a drug and many other things. Okay. And the logo. We should say the logo still retains that reference to the number seven. What's our logo? Our logo is a seven-sided, it's a heptagon, a blue heptagon with a seven-sided ship's wheel. Y'all are so nerdy.
Starting point is 00:21:39 I love you so much. I love naming and I love nerdy names and this has it in spades. And I actually think it's kind of cool. You know, like I 18 and took me a long time to figure out that was internationalization because there's 18 letters in between K eight S it's almost like a, it's almost like an inside baseball type of thing that once you're on the inside, it's kind of cool to have to have one of those. So even though it was too long, you won there. Although I mentioned there's controversy now, we'll be talking a little bit about KubeCon, which bothers me because it's Kubernetes, but it's not KubeCon. So
Starting point is 00:22:17 tell us about the controversy real quick, and then we'll take the break. Well, it's a non-English word that is being pronounced primarily in English. So as tends to happen when you anglify those sorts of things, they get changed, to be polite. So Kubernetes is the somewhat obvious pronunciation of the full spelling, but when you abbreviate it to just K-U-B-E, then the English word cube immediately comes to mind. And it's much more approachable to call it cube than kube. And so kubecon seemed to be the obvious thing. Now, there are plenty of people out there who call it kubecon, who jokingly spell it K-0-0-B.
Starting point is 00:23:01 And we even have heard Kubi. Uh, so, you know, and if you want to get even more into the argument about pronunciation, you can talk about how we pronounce the name of our command line tool. Oh, which please do. Well, we're while we're here,
Starting point is 00:23:17 we might as well. The command line tool is K U B E C T L. Right. So cube control is sort of the longest form. I tend to say cube cuddle. Other people say cube CTL. Other people say cube cuddle. Every variant that you can come up with has been tried.
Starting point is 00:23:37 So. Cubey cuddle. Yeah. That's a good one. What else would we have to talk about if we couldn't argue over pronunciations? It's the classic programmer bike shed and we all love it so much. Must argue about themes. Hey, friends, I'm dropping the background music on this break because I want you to
Starting point is 00:23:55 completely focus on what I'm about to tell you. I want to tell you about our friends at Hired. We've been hearing lots of great things about them and their process to help developers find great jobs. So we reached out to them and guess what? They were excited to work with us and we partnered with Hired because they're different. They're an intelligent talent matching platform for full-time and contract jobs in engineering, development, design, product management, and even data science. Here's how how it works instead of endlessly applying to companies hoping for the best hired puts you in full control of when and how you connect with interesting opportunities after you
Starting point is 00:24:33 complete one simple application top employers apply to hire you over a four-week time frame you'll receive personalized interview requests upfront selling information and all this will help you to make better, more informed decisions about your next steps towards the opportunities you'd like to pursue. And the best part is Hired is free. It won't cost you anything. Even better, they pay you to get hired. Head to Hired.com slash changelog.
Starting point is 00:24:58 Don't Google it. This URL is the only way to double that hiring bonus. Once again, Hired.com slash changelog. And now back to the show. is the only way to double that hiring bonus. Once again, Hired.com slash changelog. And now back to the show. All right, we are back with Aparna Sinha and Tim Hocken from Google talking about Kubernetes. We got a little bit of the history and the why it exists. But one question that I always have is who is this for?
Starting point is 00:25:26 And what are some places where Kubernetes or this kind of infrastructure could really shine? And Parna, it sounds like you have a couple of stories for us to help us kind of understand how companies are using this and how it's useful for them. So go ahead. Yeah, sure do. So Kubernetes is for a variety of different users. I think that it applies to most environments, actually. But I'm going to give you a few examples, both from our hosted Kubernetes offering on Google Cloud, which is called Google Container Engine, as well as from the open source Kubernetes offering. So the distinction there being that the open source Kubernetes offering is deployed on customers' premises and in other clouds that they may, say, AWS to Google Cloud, from a VM-based infrastructure to something that's a container-based infrastructure. One of the examples that we did a webinar on, actually, I did the webinar about a year ago, is Porch.com. And they're a good
Starting point is 00:26:39 example because they're kind of a startup. They have a web front end. They're in a business where speed is really important, deployment speed in particular, and how they respond to customer needs. And what they did is they moved from a VM-based infrastructure where their utilization was quite low, actually. It was less than 10% because of quite a bit of over-provisioning, and this was on AWS. And they really didn't have reliable autoscaling. And they moved to a containerized architecture here on Container Engine. And one of the immediate benefits that they saw was, of course, the increase in utilization because they did and they were able to use horizontal pod autoscaling, what we call HPA in Kubernetes on Container Engine. But the biggest benefit that they noted was how simple it was to deploy, you know, their
Starting point is 00:27:29 application. And they noted that they could go from commit to in production in less than three minutes and that they would do so 30 to 50 times a day with no risk, with no risk and no downtime. And this is very important if you have, you know, a web front end or any kind of, you know, end user facing application. We found that the benefits of deployment speed are critical to the business. So that's one kind of customer.
Starting point is 00:27:58 And I think you can apply that to many situations. You can apply that to, you know, e-commerce and retail. We have many users that are retail and e-commerce users that use the Container Engine offering in addition to, you know, some of the analytics offerings that are also on Google Cloud. Another, I think, favorite story that we have here, you know, in the Kubernetes community and at Google is that of Pokemon Go. And I think that that's also a great example, because it applies in general to gaming. We do have many gaming vendors that use Kubernetes. And there, you know, you don't know really whether a game is going to be a hit or not. And having flexible infrastructure that can scale up and, you know, as you grow and then scale down, you know, when you need to, is really
Starting point is 00:28:42 very cost effective. And also, you know, doesn't stunt your growth when you need to is really very cost effective and also, you know, doesn't stunt your growth when you need it the most. And I think Pokemon Go definitely it's, you know, it's the fastest growing application to a billion dollars that, you know, the world had ever seen until then. So Kubernetes is behind scaling that up. Yeah. I mean, Pokemon Go is beyond successful. It was a phenomenon last summer. And even with Kubernetes, they had some issues, Yeah, I mean, Pokemon Go is beyond successful. It was a phenomenon last summer. And even with Kubernetes, they had some issues, right? I mean, they were scaling, but at a certain point, your demand is just increasing at such a rate that all the preparedness in the world may not be enough. Yes, yes, there were definitely, you know, some issues that they had,
Starting point is 00:29:21 because they weren't really prepared for they had not it was well beyond what they had anticipated but from a technical perspective um you know we were we were able to help them meet that demand and in time to continue the growth were they coordinating with you before release obviously i mean there's some dev time there so how close was the relationship of like them planning the need for scale or even the potential need for scale and the infrastructure be built on yes there was communication beforehand um and you know the truth is that there was no it beat all forecasts and i think i think actually tim you want to add here you were also involved in the in the uh pokemon go story yeah i mean it was sort of funny because I'm not a Pokemon fan.
Starting point is 00:30:08 I'm just a little too old for that. And, uh, I came into work and I was all over the radio is all people were talking about all the interns were talking about. And I was, gosh, I'm so tired about hearing about this Pokemon thing. And I said, well, you know, it's running on us. Right. And I said, Oh, Oh, Oh, keep talking, please. So that was the same day that we got a request from our customer reliability engineering team, CRE, that Pokemon would like us to engage a little more and help them figure out how to scale better using the Kubernetes product. And so that
Starting point is 00:30:41 day we started getting involved and answering some of their questions and helping them figure out how to manage their deployments a little bit more effectively. They were absolutely pushing the limits of what we were offering right out the door. I mean, this was a pretty early Kubernetes release and they were pushing the bounds of our max cluster size. And, you know, there was a million things that they were doing that we said, oh, goodness, you know, we say that that works. And that's supported, but you are you are right up against the edge. And, you know, they wanted to go bigger. This was before the Japan launch, right? This was before most of Europe had been turned on. And so we engage with them to help figure out how to manage their their Kubernetes resources. And it was actually really an amazing experience because their team is also super, super talented.
Starting point is 00:31:27 But we would have all our calls late at night because their usage was relatively low and they were making live changes to the system. And at some point we had to ask them to slow down on the changes because it was just so easy for them to make changes through Kubernetes. And we had to let things settle
Starting point is 00:31:42 and figure out what the right ratios between front ends and back ends were. So it was an interesting opportunity for me to watch somebody actually use our product. Yes, that's a great story. Kubernetes, like you said, is being deployed. It seems like piecemeal is not the right word, but Google section by section are slowly into Google because Borg has been running maybe you know, maybe Google wide for a long time. But perhaps Pokemon Go was stressing Kubernetes, like you said, obviously a lot, but maybe more than you assume what Google's making this. So
Starting point is 00:32:15 this is Google, right? This is web scale or whatever the word is. Google, it could handle Google type traffic, but really hadn't been tested maybe as much as it was with Pokemon Go. Is that fair? That's absolutely true. I mean, you know, one of the hard things about working at Google is you can't launch anything until you're ready to handle a million users. And we don't have that same requirement with Kubernetes. And if we had that same requirement, we never would have shipped. And so, you know, we went out very, very early in terms of capabilities. Our first, the Kubernetes 1.0 supported a hundred nodes, right? A hundred nodes is barely a test cluster inside Google. And so by the time Pokemon came
Starting point is 00:32:56 in, we were supporting a thousand nodes and they were pushing the limits of what we could support. And we were worried about what they would need when they turned on Japan. How did that work out when they turned it on? Was it ready for it? Oh, yeah. And the Japan turn up was really smooth, actually. By the time Japan came online, we'd worked out most of our major issues.
Starting point is 00:33:16 We found the right ratios and figured out how to defeat the denial of service attacks. And in fact, I think I was at San Diego Comic-Con when they turned Japan on and it was just a non-event. Nothing happened. Wow. That's right. That's the way to be right there.
Starting point is 00:33:31 And we were at 2,000 node clusters at that time. And we are now, just as of our latest release, we're at 5,000 nodes. Okay. Wow. That's very cool. Well, it certainly ties into the three big features you have on the website.
Starting point is 00:33:43 You can't go to Kubernetes website, which is kubernetes.io without seeing planet scale, never outgrow, run anywhere. Right. So that's, this is like your promise to anybody who says, I want to use Kubernetes. And we can get into the details of how this actually plays out. How do you actually achieve these features? So, I mean, there's the scalability issue there. We promise that people won't outgrow it. Now, 5,000 is a lot of machines, but it's not,
Starting point is 00:34:09 you know, it's not a super lot of machines, right? It wouldn't satisfy the Googles and the Twitters and the Apples of the world, but those are not really the market that we expect to be adopting Kubernetes, at least not whole hog, not yet. So I think if you look at a histogram of number of machines that people are using, 5,000 is well into three or four nines territory. And so we're trying to address those people. So it is designed for that. And we offer 5,000 nodes with a pretty good service agreement that we test the API responsiveness and we ensure that it's less than our acceptable time so people can use it at scale and not be disappointed in it, right?
Starting point is 00:34:56 We could probably tell people that it works at 10,000 nodes. It just won't work well, right? So we test it, we qualify it at 5,000 nodes, and we have tests that run on a regular basis that tell us, hey, you're falling out of SLA because somebody has a performance regression. So this SLA, though, is at the Google Container Engine level, not so much Kubernetes itself. Is that right? Kubernetes itself, we say that for this particular archetype of workload, you will get this level of service from our Kubernetes API server, whether you're running open source or on Google Cloud or on Amazon or on your bare metal. This is what we're shooting for to say that we support 5,000 nodes. Yeah.
Starting point is 00:35:34 Gotcha. Google Cloud offers an extended SLA on top of that. Gotcha. Yeah, the 5,000 nodes is an open source and Google Container Engine number, and it's measuring two SLOs, one on API latency and another on pod startup time. And both of those are fairly stringent SLOs. We do have users outside of Google Cloud that run larger clusters. But I think one thing that I would point out is that when I've talked to the largest of large customers that are interested in using Kubernetes, usually they're looking at multiple clusters. And I think that's also part of the planetary scale aspect. They're looking at,
Starting point is 00:36:18 say, having multiple regions that they want to have a presence in because they are a global company. They have users. I mean, Google obviously is a global company and any one workload may be running out of multiple regions so that they can have lower latency to the users in those regions. And so that typically, certainly in a cloud context, involves multiple clusters. So you may have a cluster in the Asian region, you may have a cluster in the Europe region, you may have another one in North America, each of which could be as large as 5,000 nodes. Typically, actually, I see less than that, you know, one or 2,000 nodes for the large customers. And then, you know, spanning that workload across multiple clusters, which may or may not be managed together. So you
Starting point is 00:36:59 could manage them independently. Or we also have what is called a cluster federation where you can manage them from a single endpoint. Yeah, I was going to ask if that level of scale where you're not just scaling out nodes, but you're actually scaling multiple clusters regionally, if that's seamless with Kubernetes, it sounds like there's a couple different ways of going at it, but there's some configuration involved in getting that all working together. Yeah, we're working on the federation product. You know, we've got a team in Google and in the community who's working really hard to make Federation maybe seamless is two grand, but to make it really easy for people to say, look, I want my application
Starting point is 00:37:35 deployed on three continents and I want it in this particular ratio based on, you know, a customer base and you go manage it and make it so, and you know, if something fails in one zone, make sure that it, you know, overflows into another zone. Um, and that's what the Federation product does this today. And when it, you know, when it works, when you understand the bounds that it operates in, it's pretty magical. Like I've got to say, uh, it does things that actually we don't even do inside Google. A lot of this is done manually inside Borg because Google likes to have a lot of control. Again, this goes to like changing the constraints and for people in open source, just put me one in Europe and put me one in the US. That gives people a huge win in terms of latency, right? Yeah. Yeah. The other reason that I've seen
Starting point is 00:38:20 some of our customers, and I was going to mention Philips actually as we were talking about use cases. Philips is an IoT customer, the Internet of Things. They make these IoT-connected lights. And I was going to mention them because a lot of European companies have sort of a data locality, or they want to keep their users' data certainly for Europe, within Europe, and then they want to have another cluster in North America, certainly for lower latency, but also because they want to keep the data there. So there are multiple reasons why our users tend to want multiple clusters. So we're talking about Run Anywhere as one of your main features on the homepage. And this is Run Anywhere regionally or around the world,
Starting point is 00:39:06 but somebody pointed out, you know, it doesn't just mean that. Also run on any cloud infrastructure, right? So you can now, you don't have to cloud, you don't have to code specifically against Google Cloud or against Amazon. You're operating a level above this. And so now we've decoupled from our cloud provider.
Starting point is 00:39:22 It gives us choice. Doesn't that commoditize Google Cloud as a product? And really, you know, don't wouldn't you prefer vendor lock in as opposed to what you provide and everybody, which is the freedom of choice? Can I go first here? Like, I want to touch upon, you know, the run anywhere, because just stepping back, the promise of containers, one of the promises of containers is portability, right? Is that, you know, your application is no longer tied to a particular hardware or to a particular hypervisor or to a particular operating system, you know, to a particular kernel.
Starting point is 00:39:55 And so you can actually move it from cloud to cloud. You can move it from on-premise. You can move it from your laptop to the cloud. That is the promise of containers. However, if you don't have a way to manage containers in production, that also is portable. So the container manager also has to be portable in order for that promise to come true, right? Otherwise, that promise just sort of breaks down. And so when we say that Kubernetes runs anywhere, we really are referring to that aspect of portability, that your container orchestrator, the thing that manages your container environment and production can run in all of those environments.
Starting point is 00:40:33 It can run on your laptop. It can run in your public cloud of choice and in your private cloud of choice. say, you know, it's not necessarily zero work to move your system from one cloud to another, but your applications and your services that are designed to run on Kubernetes will run on Kubernetes anywhere. That's right. You know, we spend an enormous amount of time making Kubernetes not couple to the underlying cloud provider. And the reason we do that is we hear that this is what people want. This is what customers are asking for. And so something that was coupled to Google Cloud was just not going to be a winning product. Where winning here, I think, really means ubiquity. So to make it a really ubiquitous system and a thing that people can assume exists, it has to be viable in all sorts of places. So we personally spend time making it work on our competing clouds. We have partners in our other public clouds that work on Kubernetes.
Starting point is 00:41:28 We also spend time making it work on bare metal. We help partners and other companies do things like support enterprise customers on bare metal. The idea being that if you write your application to the Kubernetes APIs, then you can pick it up and you can move it wherever you want. And that's real choice. The flip side of that is it is a ton of work from an engineering point of view to make sure that all of our APIs are decoupled, that we don't add things that aren't practically implementable on other platforms. And so these are things that we consider every time somebody sends us a pull request. Yeah. And, you know, I think from a strategy point of view, you know, we want to be where our users are. And I think if you look at infrastructure spend today, you know, the vast
Starting point is 00:42:12 majority of it is on premise. And so we want to make sure that we're building a product that meets users where they are. And that's been the philosophy with Kubernetes from the start. You know, if we meet them where they are, we provide them the best infrastructure, then they'll naturally come to us. Well, we love it as end users as well, because what it does for us is it puts the, really the vector for competition where we care about it for these different cloud providers. So they compete on things like price and performance and reliability and all the things that we want out of a cloud, right? And they're not competing on this particular API, which the other one is lacking because we don't care.
Starting point is 00:42:52 Yes, that's exactly right. So I think the place where Google Cloud competes is we have the fastest VM boot times. We have a very impressive global network. We are doing deep integrations with the underlying network and Kubernetes, our, you know, our hosted container offering. So we have the best, I think, you know, the best price performance. And you can use preemptible VMs.
Starting point is 00:43:18 You can use custom VM shapes. You can use, you know, continue to use discounts and so forth. All of them on our container offering. That's a good place to take our next break then because we want to dive a little further into things like architecture, which is a long subject, I'm sure. So let's take this break. When we come back, we'll dive further into K8S's architecture.
Starting point is 00:43:42 BRB. So your application sits on layers of dynamic infrastructure, supporting services, microservices, everything. And our friends at Datadog bring you visibility into every part of your infrastructure. Plus they have this thing called APM for monitoring your application's performance, dashboarding, collaboration tools, and alerts that let you develop your own workflow
Starting point is 00:44:13 for observability and incident response. Super cool stuff. And Datadog integrates seamlessly with all of your apps and your systems from Slack to Amazon Web Services. So you can get full visibility in literally minutes. Go to changelog.com slash Datadog. Get started.
Starting point is 00:44:35 Integrate. Set that all up and they'll send you a free T-shirt. If you haven't tried Datadog yet at your company or your side project, once again, go to changelog.com slash Datadog. Start for free. Get a free t-shirt. Support this show. And, of course, huge thanks to Datadog for being the sponsor for the show. And show your support to us by checking out them.
Starting point is 00:45:03 All right. Back to the show. And we're back talking about K8S. And I said in the break, I was dying to say BRB and you can't follow K8S and go into a break and not say BRB. So I did it.
Starting point is 00:45:18 So thank you very much. But bucket list, check that off your bucket list. There you go. So Tim, a part of where we're back talking about Kubernetes and, you know, one thing that is probably hard to do audibly, at least on a podcast like this, is to describe architecture. So, Tim, how often do you get asked this and can you do it? You know, on pure podcast, it doesn't come up that often because I gesticulate wildly and like to sketch on whiteboards, but I'll do what I can.
Starting point is 00:45:45 Well, let's say this. We will include, so there's a nice diagram even just on Wikipedia. So we'll put that in the show notes, which does lay out a few of the pieces. Hopefully it's correct. It is Wikipedia, so it could be wrong. But as you're talking, we'll assume the listener can at least go look at the show notes and view that and get some visual while you go through. Unless there's a better version of it somewhere. You know, I think we've been working on a new architectural diagram, but I don't think it's ready for prime time yet. All right. Well, we'll have to use that one and you'll have to smooth over all the rough edges.
Starting point is 00:46:17 Go ahead. Lay it out for us. All right. So Kubernetes is fundamentally an API driven system. So at the center of our universe is this thing we call the API server. And it is a REST server. So it's a web server with REST semantics. And we talk in terms of REST resources or our objects.
Starting point is 00:46:39 And those objects describe the various bits of semantics of our system. Those are things like pods and services and endpoints and load balancers and those sorts of things. Each machine within your cluster has an agent that runs in that machine. And that agent is called the kubelet, following in the vein of Borg, which had its Borglet, and Omega, which had its Omlet. So the Kubelet runs on every machine, and it is responsible for talking to the API server, figuring out what that machine is supposed to be doing, which containers is it supposed to run, and then activating those changes. The API server runs on what we tend to call the master. It doesn't have to be a single machine or a set of machines dedicated to this, but it's the most common way that people operate. So on the master are some other components that run alongside it.
Starting point is 00:47:33 One is the scheduler. The scheduler is just a consumer of our API. So everything we do consumes our API. There are no back channels. There are no secret APIs. Everything uses the our API. There are no back channels. There are no secret APIs. Everything uses the public API. So the scheduler looks at the public API and it says, hey, is there work in here that hasn't been assigned to a worker? If there is, I'll go do that assignment. And that's basically all it does. And then we have this thing called the controller manager, which implements a control
Starting point is 00:48:03 pattern that we call controllers. And what a controller does is it says, I have a job. My job is to make sure that a load balancer exists for this service. And that's all I do. And I watch the API server and I wait for changes. And when things change, I wake up, I activate whatever the thing that I was asked to change, and then I go back to sleep. And periodically, I'm going to wake up and I'm just going to check the state of the universe and make sure that it exists in the form that I expect it to exist. And if it doesn't, I'm going to drive towards the declared state. This is the sort of declarative model. And so the controller manager implements all of the most common controllers that we have for our system. These are things like resolving endpoints for services
Starting point is 00:48:47 or managing nodes, making sure that nodes are healthy and doing health checks there, making sure that, I mean, the scheduler itself is a controller. And so all of these pieces wake up and they're always talking to the API server constantly. So if you were to watch the API server's logs,
Starting point is 00:49:03 you'll see that it's constantly busy just answering these requests for get these objects, put these objects, update this, patch that. When you use our command line tool, kubectl, it does the exact same thing. It talks to the API server, it gets objects and it prints them for you. It takes objects from your local files and puts them to the server, which then creates new things for the server to do. So, for example, the most common thing people want to do is they want to run a container. So we call that a pod. And a pod is a small group of containers.
Starting point is 00:49:34 So as an end user, you can say something like cube cuddle run dash dash image equals Nginx. And I'm going to just go run an Nginx. And I'm going to just go run an Nginx. That will generate a blob of JSON or protobuf that we tell the API server to post to a particular URL. And the API server says, okay, it validates the input. It writes it there and it says, I now have a pod that I've created. Here's the unique ID for that pod. Now the scheduler wakes up and says, oh, hey, look, pods have changed. Let me figure out what to do wither wakes up and says, oh, hey, look, pods have changed. Let me figure out what to do with this pod. It says, okay, well, I'm going to assign that to node number three. And it adds a field to that object that says, now you're bound to node number
Starting point is 00:50:14 three. Now the kubelet on node number three wakes up and says, ooh, my information has changed. I'm supposed to go run this container. So it pulls that blob of JSON or protobuf. It looks at it and says, oh, I'm supposed to run NGINX. Okay, cool. I will do a Docker pull of NGINX. It will then run NGINX and it will watch it. And if NGINX ever crashes, it'll restart it. And there's a million other things that are built into the pod specification, like health checks and readiness probes and lifecycle hooks.
Starting point is 00:50:43 And we'll do all of those things. But the basic idea of running a container is pretty straightforward. You post what you want to happen to the API, and all these other pieces wake up and collaborate and make things happen. And I think the really cool part of this architecture is that it's always easy to figure out what the user wanted to happen because the state of things is declared.
Starting point is 00:51:06 I want there to be a pod, and I want it to be running Nginx. And I can then wake up periodically and say, is that not true? If it's not true, let me make it true. And you don't have to worry about, well, a command got lost or a command got sent twice because it's declarative. Well, first let me say that was excellent. You sold yourself short because I'm tracking everything. Of course, I'm staring at this diagram, which makes it very easy to track. So it's at least currently still up to date. So check out the diagram as he explains
Starting point is 00:51:36 this, if you're still confused, rewind and stare at it. But interesting, I was wondering, because you said, I want an Nginx. And so that I was like, was like okay how do you know how does it know what an engine x is and you said it pulls from docker so the images are are all docker images underneath that's right that's right okay and they could be in any repository i mean there's a public repository or you could be using google container repository gcr all right or you could or you could use a private registry or you could use the ones that amazon ships or you could use quay.io and there's there's dozens of these offerings that are docker registry compatible and we'll work with all of them we have a way to specify credentials so that you can pull from a private registry and use private images yeah and then i also noticed
Starting point is 00:52:21 that you know we're talking about the architecture kind of the underpinnings here and i just love to see when there's other open source things that are involved because it is, you know, nobody's building these things in a vacuum. And you have etcd being used for service discovery, which is a highly lauded tool out of the core OS, which is very cool. So you're pulling together things, Docker, etcd, of course, all this custom stuff as well. At the end of the day, it makes very much sense from a command line, but surely there's some sort of declarative way that I can define. Similar to a Docker file, is there a Kubernetes file where I can say, here's what I want it to look like, and I can pass that off and it builds and runs? Yeah, absolutely.
Starting point is 00:52:59 The specification of our API objects, in the end, is just a schema, right? And you can use things like OpenAPI, which is a specification, the follow-on to Swagger. You can look at that for sort of a meta description of what the schema is. And in the end, you can write yourself a JSON or a YAML file that you just check into source control. And that is literally the exact same thing
Starting point is 00:53:21 that, you know, kubectl run does, is it generates that same blob of JSON and sends it. So the command line gives you a bunch of imperative sort of commands for humans. But if you were really running this as a production system, what I would do is write the JSON or the YAML blob, check it into your source control, and then activate it on the server. So there's a separate command called kubectl apply, which says take this blob of JSON in a file or, you know, you can actually have one file that has multiple resources in it. You can also point it at a directory. You can also point it at a URL and go apply this configuration to my cluster.
Starting point is 00:53:55 Make it so. I wanted to call it kubectl make it so, but the Borg analogies ran dry. It's like a wish machine, basically. And hopefully the wish can be granted. And that is how most customers run in production. Right. Yes. Well, it makes sense to kind of feel things out.
Starting point is 00:54:13 Yeah. Feel things out with the command line, write tests and develop. And then once you have it figured out, then you do it with the apply method. Right. And in fact, we run our own website and some of our own tools on Kubernetes and we publish our YAML so that people can look at how we run our own stuff. And I think that's sort of interesting for people. And we've, you know, it has given us a sense of exactly what people are up against when it comes to things like certificates and it comes to things like
Starting point is 00:54:39 canarying. And so we've tried a bunch of different techniques for how we think it's best to use our own stuff. Yeah. You know what would be really cool is if we could actually do a demo on the podcast, which is going to be much harder. Yes, that is much harder. But hopefully we're at least getting people's interest piqued enough that they can go out to kubernetes.io and watch videos or demos, or join a... I'm sure you guys have web... whatever those things are called... webinars. Webinars, thank you. Aparna Sinha just gave one a year ago. But do you give them? Yes, we have many. In fact, there are many that Tim has done. He has several. There's a whole webpage, a YouTube page of his demos, but we
Starting point is 00:55:28 have, we have hundreds. Gotcha. So do you keep a sort of log? I guess YouTube is your log of past webinars? More or less. If you search YouTube for Kubernetes, you'll find me, Brendan, Brian, and Joe, who are really the founders of the project, but you'll also find luminaries like Kelsey, who have done all sorts of really cool talks, from generic 101-level sessions all the way down to deep dives on networking and storage and these other topics. They all tend to get posted to YouTube. I also post all the slides for talks that I do on my Speaker Deck, so people who want to follow along can go look at my Speaker Deck and click through the slides at their own pace.
Starting point is 00:56:08 Very good. Send us that link and we will put that in the show notes as well. Lots of places to get more information, for sure. Real quick, before we get into community and kind of the getting started, we talked about scaling up, of course. That's what Kubernetes and nodes and clusters are all about.
Starting point is 00:56:24 Planetary scale, web scale, whatever you want to call it. The ability to not have to rush, and the ability to scale when you need to, which on the web you rarely know when that is. How well does it scale down? What I want to know is, we talked about some of your great users, but when is it too small to make sense? When is it too much overhead or too much work to use Kubernetes? Or if you haven't learned it yet, if I'm running, like, a WordPress site with maybe one or two servers, maybe a database and an Apache or something, is that too small for me to take the time with Kubernetes? Or are there still wins at that small scale? So there are two things to scaling down.
Starting point is 00:57:03 So, I mean, obviously, when we're talking about autoscaling, I think you were saying it autoscales up very well. It also autoscales down, just to clarify. But then your main question is around what's the smallest-scale deployment that we would recommend. And we even have users that are using one node. Typically, if you have one node, you don't really need an orchestrator. But at two to three nodes, depending on how many pods you're going to have, it starts to make sense to have an orchestrator, a management framework, especially if you're running things in production. If you just have one node with some number of containers, then yeah, maybe it doesn't make sense to use Kubernetes.
Starting point is 00:57:46 But I think anything beyond that, yes. So from my own point of view, I actually do run my personal stuff on a single-node cluster. And I do it because the command line tools are convenient and easy to use, and I know them very well, so it just makes sense for me. But yeah, I mean, one node is sort of the boundary where it maybe doesn't make as much sense. Not just one node, but do you ever foresee being beyond one node, right? So if it's like one node forever and you don't know Kubernetes yet, then maybe don't.
Starting point is 00:58:15 But if you know it, it's easy enough anyway. And if you see it growing beyond that, then there's at least some sort of advantage to getting that set up. Now, whether it's worth the time tradeoff, that's up to the individual circumstances. But that makes a lot of sense. Maybe a reverse version of this question might be not so much how small should you be, but how much do you need to learn? So if you have just one node, sure. OK, great. But do you know about containers?
Starting point is 00:58:38 Do you know about this? What are the components that you should know about, or be willing to learn about, to run Kubernetes? Good question. Yeah. By the way, I do want to say that there are many users that are using one node. And on Google Cloud, we just introduced the developer tier, so getting a one-node cluster actually doesn't cost you anything.
Starting point is 00:59:02 So you can set that up for free. There's a win. Yeah, there's a win. I think in terms of learning curve, there are two things. If you are deploying a cluster by yourself, on premise, there is, I would say, a set of things you need to figure out: how to deploy it. Depending on how you do the deployment, it can be very easy, or if you're doing something custom, it can take more time. As far as using it, if you use it in, say, a hosted environment, or you're not the one who's actually doing the deployment of the cluster, I would say that it's quite easy to use. You need to understand the concept of a service, which is going to be part of your application.
Starting point is 00:59:48 And you need to think about, OK, how do I containerize my service? But that's something that you need to do if you're going to use Docker containers anyway. So either way, you need to do that. But beyond that, and I guess I'm biased, the set of concepts in Kubernetes is fairly straightforward. It's like: I've got my application, it's made up of a few services, the services are going to run in pods, and I'm going to write a set of declarative YAML that says how I want to run my service.
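For the service half of that, a hedged sketch with hypothetical names might look like this; the selector just has to match the labels on the pods that back it:

```yaml
# A Service gives a set of pods one stable name and virtual IP.
apiVersion: v1
kind: Service
metadata:
  name: my-service
spec:
  selector:
    app: my-app              # matches the label on the backing pods
  ports:
    - port: 80               # the port the service exposes
      targetPort: 8080       # the port the containers actually listen on
```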
Starting point is 01:00:19 So it's fairly straightforward. I mean, I agree. When I decided I was going to move my family stuff over to a single node Kubernetes cluster, I spent a couple of days thinking about how am I going to take this running VM image that I had from years ago and decompose it into containers in a way that made sense for me to manage it, right?
Starting point is 01:00:39 But I didn't actually have to go that far. I could have just taken everything and jammed it together into one container and called it done. But I spent a couple of days and I ended up with a pretty neat design where I've got a MySQL container, and I've got a PHP container that runs a little PHP site that we have, and I wrote a little container for webtrees, which runs some genealogy software. I put them all together, and now when I want to know what's happening on my server, I have the same command line tool that I have available at work.
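Tim doesn't share his actual manifests, but one piece of the decomposition he describes might look roughly like the sketch below; the names, image tag, and secret are all hypothetical:

```yaml
# MySQL as its own single-replica Deployment on a one-node cluster.
# The PHP site and webtrees would be similar Deployments, each fronted
# by a Service so the front ends can reach MySQL by name.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mysql
spec:
  replicas: 1
  selector:
    matchLabels:
      app: mysql
  template:
    metadata:
      labels:
        app: mysql
    spec:
      containers:
        - name: mysql
          image: mysql:5.7
          env:
            - name: MYSQL_ROOT_PASSWORD
              valueFrom:
                secretKeyRef:        # keeps the password out of the manifest
                  name: mysql-pass   # hypothetical, pre-created Secret
                  key: password
```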
Starting point is 01:01:07 And if I want to update it, it's really just one single command line. There's no more SSHing in and editing config files. It's a different way of working that I think works really well. That's interesting too, because anybody who's ever run WordPress knows that if you use MySQL, then that can often be the thing that falls down whenever your site gets a ton of traffic. So it's often somebody sees the page that says can't connect to MySQL server, or something like that.
Starting point is 01:01:31 So having that in its own container, and then obviously having that autoscale up or down, or restart or respawn if it fails, it totally makes sense to have that kind of architecture. While it's overkill, it makes sense. Technically, it's not overkill, right? I get away with a very small VM and a very small MySQL container. But if for some reason lots of people wanted to start looking at my family genealogy, then I would move it to a bigger VM on its own, and none of the front-end containers would even care. Let's talk about getting started then. I mean, we've talked quite a bit about how it works, the architecture, which, again, great example of it, because we followed along; and listeners, again, if you didn't go and check out the diagram, you should. But let's talk about getting started. What's the best way to get started? Is it simply, you know, a git pull? How does this work?
Starting point is 01:02:21 Well, I think Aparna and I are a little biased, but I think the easiest way to get started is to get a Google Cloud account and go to Google Container Engine. With about six clicks, you have a Kubernetes cluster up and running, and you can use the command line tools that you can download as part of our Cloud SDK. You don't need to look at the code, you don't need to git-checkout anything, you don't need to compile in order to use the thing. And, you know, that's the same environment that Pokémon's using. And I think that this is far and away the simplest way to do things. Now, in exchange for managed services, you give up some amount of control. So if you feel like you want to dig into it more, you want to go under the covers, you can just git-checkout the current tree.
Starting point is 01:03:06 And you can run one of the scripts that's in our repository called kube-up. The default target there is Google Compute Engine, but there are equivalents for other environments. Or you can go to a doc: Kelsey Hightower wrote this wonderful doc called Kubernetes the Hard Way, where he takes apart all of the things that our scripts do and lays them out for people to follow step by step by step. And you can take that and apply it to whatever environment you want to run it in. Yeah, I think the best way to get started is on Container Engine. And I think I already mentioned the one-node free tier. But the other way, maybe you're about to say this, Tim, is Minikube. Exactly. Minikube is our local development environment. So you can run sort of a mock cluster on your laptop, and you get the same command line tools, the same API server, literally all the same code that's running.
Starting point is 01:03:55 But it runs on your laptop and presents you with a virtual cluster. That tutorial you mentioned from Kelsey actually got talked about quite a bit on GoTime episode number 20. So if you're listening to this and you want to dive deep into some of the behind-the-scenes on that, Kelsey was on that show that day and covered that and lots of other fun stuff around Kubernetes, because he's obviously a huge fan. But it's called Kubernetes the Hard Way. It's on GitHub. We'll link it up in the show notes. Okay, so moving on from Getting Started: is that what we want to cover there? Is there a bit more on the Getting Started part? I've got to link out to Minikube as well, which is so funny, that it's "kube" there, but it's not "cube."
Starting point is 01:04:32 Minikube. Minikube. Minikube. Whatever. We're stuck on it. Yeah. Let's talk. You mentioned this is community-run. Okay, so this is a Google thing, you know, son of Borg, so to speak, but there is community behind it. How does the community transcend the Google-esque stuff of this? Like having Google fund it, back it, support it. How does community play into this?
Starting point is 01:04:39 Okay, so this is a Google thing, you know, son of Borg, so to speak, but there is community behind it. How does community transcend the Google-esque stuff of this? Like having Google fund it, back it, support it. How does community play into this? Well, you know, it started as a Google thing. We initially, you know, we gave it the seed engineers and the seed ideas, but we had engagement, heavy engagement from guys like Red Hat, you know, very, very early in the process. And that's been really instrumental to the project. The project from the day it was launched was intended to be governed by a non-Google body. So we donated the whole thing to the Cloud Native Compute Foundation,
Starting point is 01:05:24 which was created to foster technologies like Kubernetes to bring the cloud native world to fruition. So Google does not own Kubernetes. Google still has a lot of people working on Kubernetes, but we do not own it. And we don't get to unilaterally decide what happens to it. Instead, we have a community, we have a group of special interest groups that own different areas of the system, networking and storage and federation and scheduling. And they make decisions sort of independently with respect to their technical areas. And the whole thing is community-centric.
Starting point is 01:06:00 We don't make decisions without talking to the community. And net, as of right now, I think Google is less than 45% of the net contributions. Wow. We actually might have something in the future around the Cloud Native Computing Foundation. We work closely with parts of the Linux Foundation. So we've got an opportunity to talk a bit about that. And it's interesting to see that Kubernetes is a part of that. Yeah.
Starting point is 01:06:26 Yeah, we're excited to be joined by lots of other cool projects there, like Prometheus and rkt and containerd, that are really embracing the idea of this cloud native world, where things are dynamic, and the idea that you know concretely what everything is doing at any moment in time sort of fades away, and the system runs itself. It's also interesting to see that you care enough, or you're diplomatic enough, maybe, might be a better word, to hand this over to essentially the community, in that sense of the word, you know, like being a Linux Foundation sub-thing with the Cloud Native Computing Foundation. It shows that Google wants to play a role but not own.
Starting point is 01:07:11 I think this was part of the ubiquity argument too. We had to do this. If this remained a Google thing forever, there would be a lot of mistrust. You know, Google has something of a track record for changing directions midstream. And we wanted people to be comfortable knowing that they can bet their business on this. And that even if Google did walk away from it, that the thing still exists and has a life of its own.
Starting point is 01:07:36 And I'd say that part of it has been fantastically successful. The fact that we're the number one contributor, but we're not the majority. And in fact, the number two main contributor, I think, on the pie chart is unaffiliated or independent. So, yeah, I think this was just a requirement of being in this space at this time in the industry. Yeah, I would say this is not all altruistic, right? We actually believe that in order to reach our audience, again, the audience is all the users on premise and across clouds. Being open source is extremely important for that and developing a community that works with users on premise and bare metal, you know,
Starting point is 01:08:21 across the world is instrumental, right? Like the, the way that we develop the product, um, is not just from Google. It's through the community that is working actively with the users. And often it's with the users themselves. So, you know, box is a great example. eBay is a great example. New York times. These are all users that also contribute back to the project. And so that makes the product stronger. And that's not just, you know, that's not altruism on the part of Google. That's, I think that's, that's strategic. I like the two, that your perspective is about trust, which is super important, obviously, to put the thing that runs your thing, that that be the thing that is most trusted. You need to
Starting point is 01:09:03 have that full out, you know, 100% trust, beyond 100%, 110% trust. That's an interesting perspective, but this is a good place to leave it. This is a great show. History, how it works, architecture. We couldn't have asked for more
Starting point is 01:09:15 to come out of this show. So thank you, Aparna and Tim, for joining us today. Any final closing thoughts before we close this show out? Kubernetes is an open system. I always end every talk I do with the same slide. We're open. We're an open community. We want to hear from people. If you have an idea, don't think that you can't bring it
Starting point is 01:09:35 to us. Please come and share with us what problems you're trying to solve and we'll see what we can do. That's awesome. Well, thank you again so much for joining us today. That was an awesome show. Thanks. Thanks guys. The change log is produced by myself, Adam Stachowiak, and also Jared Santo. We're edited by Jonathan Youngblood. Our music is produced by the mysterious break master cylinder.
Starting point is 01:10:03 You can find more episodes like this one at changelog.com or by subscribing wherever you get your podcasts. Our bandwidth is provided by Fastly. Head to fastly.com to learn more. Of course, huge thanks to our sponsors,
Starting point is 01:10:19 Sentry, Hired, and also Datadog. Thanks for listening. We'll see you again next week.
