Programming Throwdown - 157: Kubernetes with Craig Box

Episode Date: May 8, 2023

There’s more than what meets the eye when it comes to Kubernetes, and Craig Box – ARMO’s VP of Open Source & Community – is one of several who have seen its many twists and turns since its inception. He talks with Jason and Patrick about Kubernetes’ origins in pop culture, utility in the modern workflow, and possible future in today’s episode.

00:01:31 Introductions
00:03:39 Craig’s early internet speed experience
00:07:46 An adventure towards Google
00:16:55 Project Seven
00:21:17 Mesos
00:26:42 The origin of Kubernetes
00:28:36 DS9’s influence on naming conventions
00:37:49 Getting more results with the same resources
00:47:13 IPv4
00:53:44 Craig’s thoughts on learning Kubernetes
01:06:59 Kubescape
01:18:12 Working at ARMO
01:23:16 Programming Throwdown on Youtube
01:23:55 Farewells

Resources mentioned in this episode:
Join the Programming Throwdown Patreon community today: https://www.patreon.com/programmingthrowdown?ty=h
Subscribe to the podcast on Youtube: https://www.youtube.com/@programmingthrowdown4793

Links:
Craig Box:
Substack: https://substack.com/profile/107796914-craig-box
Github: https://github.com/craigbox
Linkedin: https://www.linkedin.com/in/crbnz/
Twitter: https://twitter.com/craigbox
ARMO:
Website: https://www.armosec.io/
Linkedin: https://www.linkedin.com/company/armosec/
Others:
The Project Seven origin story: https://cloud.google.com/blog/products/containers-kubernetes/from-google-to-the-world-the-kubernetes-origin-story
7 of 9 on Memory Alpha: https://memory-alpha.fandom.com/wiki/Seven_of_Nine

More Throwdown? Check out this prior episode:
E135: Kubernetes with Aran Khanna: https://www.programmingthrowdown.com/2022/06/135-kubernetes-with-aran-khanna.html

If you’ve enjoyed this episode, you can listen to more on Programming Throwdown’s website: https://www.programmingthrowdown.com/
Reach out to us via email: programmingthrowdown@gmail.com
You can also follow Programming Throwdown on Facebook | Apple Podcasts | Spotify | Player.FM | Youtube
Join the discussion on our Discord
Help support Programming Throwdown through our Patreon

Transcript
Starting point is 00:00:00 Hey everybody, here to talk about a really, really important topic. We've talked about Docker in the past and containerization. But one thing that's really important is how do you run this container? How do you run it at scale? How do you get these containers to talk in concert with each other? How do you replicate a lot of these things? And so Kubernetes is something that Patrick and I both use day to day. I think I probably have to use it a lot more than Patrick does. It tends to be more of a cloud thing and less of an embedded thing. But it's extremely, extremely important being able to do all of that, orchestrate all of that, and know kind of how that works. There's also a large tie-in to these big cloud providers.
Starting point is 00:01:13 They do a lot of heavy lifting for you. And so that makes it kind of a bit of a steep learning curve. And we're extremely fortunate to have Craig Box, VP of Open Source and Community at ARMO, and one of the people who was there on day one of Kubernetes at Google, here on the show. So thank you so much, Craig, for coming on the show. Thank you both for having me. Cool. And so before we do a deep dive into Kubernetes, why don't you give us a quick background? What did you study in school? You know, what happened after that?
Starting point is 00:01:51 And what was sort of the path that led you to being on the initial kind of Kubernetes team? Yeah, so my dad bought home a VIC-20 when I was under five, I guess. And back in those days, obviously, it was if you want to play a game, you get a book out and type the game in. And that never really led to a love of programming for me is an interesting thing that I found is that as I moved on through computers and PCs and so on through my teenage years, I was always interested in making the computer do things and tying things together and running bulletin boards and all that
Starting point is 00:02:19 kind of thing. But I never really had a love for programming, for actually asking the computer to do new things. It was something that I studied to some degree at university. I did a computer science degree, but I always found that the amount of effort that it took to really learn and remember the things that you needed to know to get really good at something, it just wasn't really my strong suit. And I was much more interested in being in the outside world and seeing people and so on. And so I ended up working in IT consulting roles, doing a bunch of system administration stuff, getting early on into what I think would become DevOps, so using programming concepts and having that kind of background and applying it to systems administration and so on. Moved to Canada and worked at a software
Starting point is 00:02:59 company there for a couple of years doing installation and deployment. So where did you grow up then? Probably somewhere in Europe? No, no, somewhere in New Zealand, right? The other end of the world. Oh, okay. Wow. Very cool. Through a turn of events is where I end up again today, but in the down under. Now, I seem to remember, I mean, it's probably not true anymore, but I seem to remember that New Zealand, that Oceania, if I'm saying Oceania, but it has a really poor internet and so you have high latency and lag spikes and all of that. Is that true? Is that kind of like a whole different experience being on the internet? And what's it like today? Is it just homogenous now? The university that I went to, the University of Waikato, was the first place
Starting point is 00:03:43 in New Zealand to be connected to the internet, I believe we had a 24 or 4,800 BPS connection for the entire internet and country at one point. It has got substantially better, but fair enough to say it's far away. It's a long distance and there are cables now. There are decent fiber cables running from New Zealand to Australia and Hawaii and so on. But there definitely used to be a sort of a culture of like, what's mirrored locally. You can download stuff within New Zealand and that's free, but international traffic costs a lot of money. But as for today, it's basically we're chatting over the web today. We've got high def video and it's all flat rate. And so we've caught up, but it is in a lot of ways, New Zealand's distance from the rest of the world can, it's made it its own economy.
Starting point is 00:04:26 One thing I remember was my father worked for, he's, my father's recently just finished 50 years working, or he hasn't finished, he's still working, but he's been 50 years at the National Telco of New Zealand. And so I used to go around with him when I was young and he'd go and look at the little telephone exchanges and things. And there was a rack of equipment in one of them that I remember asking him about when I must have been under 10. But it was what we call over here EFTPOS, which is basically debit cards
Starting point is 00:04:51 and being able to do electronic payments and so on. And that was actually tested in New Zealand because they say New Zealand's got this population of people that's sort of a bit disconnected from everybody else, but they're really forward-looking and they're really interested in trying new things. So sometime in the 80s, that was when they were rolling that out they they tested that first
Starting point is 00:05:07 in new zealand just to see if the technology would work well that's super cool you know patrick and i worked at a place on on the east coast where um when you were there for 50 years you got a gold badge um was that was that actually gold like was there even like a laminate of gold right there it's just colored gold no i think it was just a color okay i asked my dad if they gave him a gold watch and they've given him a him and mom a weekend away so that's uh they're gonna do that sometime probably more useful after 50 years he's earned it yeah these days well like if you gave me an analog thing with hands and said what's the time on this i can't do that That's a skill that I never really mastered.
Starting point is 00:05:46 I can work it out. I can look at it and say, which hand is it or something. But our parents' generation, they'll look at it and say, it's quarter past 12 or whatever. It's an instant thing for them. But because we had digital clocks growing up, it was never really necessary, I found. Yeah. I mean, before the show, the pre-show, we were talking about our kids and and my uh my older oldest kid is uh studying time and they still use analog clocks to you know teach to teach time and so i had to go back and
Starting point is 00:06:11 kind of remember oh yeah when it's you know a quarter of the way through the clock that's 15 minutes and all that stuff i had i had one of those shower thought moments a while back i thought well if if humans had evolved with six fingers on each hand rather than five, how much easier would time and math have been? Because we've got all this stuff that's time and so on is divided by six and 12 and 360 and so on. And then everything to do with math is divided by 10 because that's how many fingers we've got. And if only those two things have been the same, what more could society have achieved? That's true. I wonder if the base 60 thing was arbitrary. I wonder what we'd have to do some epistemology to figure out what caused that. I have two different thoughts on that,
Starting point is 00:06:54 but we'll save them for another podcast. All right. Sounds good. Okay. So New Zealand, and then you went to Canada from New Zealand. And so your first IT job was in Canada. No, no. I worked in New Zealand for a few years. The thing about New Zealand is because it's such a small country, most of the people here work in a small business of some sort. There's not a lot of big companies over here. People who want to start big companies tend to go live overseas and not come back like I did to some degree. But they say that something like 95% of the population in New Zealand work for a company with 50 or fewer employees. And so there is an awful lot of small businesses, an awful lot of farming and agricultural and
Starting point is 00:07:35 tourism kind of stuff. Some tech these days, a few tech companies and startups, but it's definitely not a thing that people do. You do tend to go overseas if you want to go and work in the big industry. Got it. And so how did you make your way into Google? Was this like someone kind of reached out to you and said, hey, I know you're at IT Company X, Google's hiring. What did that look like? Yeah, it looked like a series of sort of, let's say, coincidences and so on. But I ended up working for canada for a couple
Starting point is 00:08:07 of years we moved there because we didn't want to go to england or new zealanders always go to england and so after two years in canada we're like right it's too cold here it snows we don't get hockey let's go to england and so i went over there worked for what was called the symbian foundation it's a play do you guys remember like the old Nokia phones and Symbian and S60 and so on? It was a very good operating system for low-powered devices that was overtaken by the fact that all of a sudden devices got powerful enough to run real code and you could just take Linux, Unix and so on and run it on a smartphone by around 2007, whenever the iPhone came out. And what Nokia had tried to do around that time was basically
Starting point is 00:08:45 open source the whole thing and put it into a company whose goal it was to share it with all the other vendors and to make it available to people almost as a non-profit. And that all went well until, of course, the iPhone and the Android came about. But at that job, I started doing a lot of cloud stuff. So I was getting all of our hosting infrastructure, setting that up in Amazon. All of our email and so on was with Google. We had Google search appliances. And so I was quite early on in doing public cloud things. And that led me to work at a consultancy for a few years, helping other people do cloud things. And we were partners with Google cloud at that point. So I knew a lot of the team there. And then after a few years in that role,
Starting point is 00:09:26 the Google business had sort of moved on to the point where it had a public cloud first of all it was the email the google apps and so on and then as they built out more cloud things it became sort of more in line with with what i was doing and it was sort of ready for me to join cool it makes sense so symbian so they open sourced everything. And just to follow that thread, what's the longest lasting thing to come out of that? Has someone taken something from that and put it into something else? Did parts of it make it into Android or anything that you know happened there? It was a strange sort of ahead of its time process, because I remember that it was a thing called the beagle bone which is like an
Starting point is 00:10:05 embedded board a little bit like a raspberry pi or a little bit like a the fpga development boards you get and so on but like the idea was that the release team there was going to try and take the symbian stuff and port it to a thing that everyone could play with and i didn't really get that at the time because it predated the raspberry pi and the whole industry of stuff that's popped up around that but it was really just how could we build the stuff so that it would be able to be used by Nokia and largely Asian manufacturers. There are a lot of people in Japan, Sony, Ericsson were still using Symbian at that time and so on. But then all those people started using Android effectively. So the entire market fell away. Nokia used it for a couple of years time.
Starting point is 00:10:49 Nokia sort of bought everything back in house. they didn't own the trademark anymore but they still for the last releases they did did their own stuff in-house and there isn't really any history to that or any legacy in some strange way it's odd because it just it exists it's still out there someone has mirrored the code that was released on github but i have no indication that it's useful today in a way that's perhaps different to things like the the scion pocket organizers that predated symbiont symbiont actually derived from the software that powered those and there is still a cult following of people who who like retro computer things who care about that but i guess that just the way mobile phones have evolved and so on, like you hear a story about someone buying a shrink-wrapped original iPhone
Starting point is 00:11:29 for $60,000 recently, but you're never going to hear of people going and using them for fun. People and the way that they'll go and play with their Commodores or Ataris or whatever, there's not really that for mobile phones. So it is sort of a bit of an evolutionary dead end. Yeah, that makes sense. I think the form factor, you know, it's incredibly convenient for when you're on the go, but for your leisure time, it's not that convenient.
Starting point is 00:11:52 So yeah, I think it totally makes sense. Maybe, you know, we're just not ready yet. You know, maybe 10, 20 years from now, there'll be like a huge group of folks who love like retro phones in the same way we love retro computers. So it might come. I'll hold out for that. Yeah, you've really got to think about the backend and the things that are required. I'm sure that's something that'll come up as we talk more about what propelled Kubernetes
Starting point is 00:12:16 success. All of those old computers I mentioned before are self-contained. You get a disk or you get a modern disk emulator system and you plug it in and the software is there. In most cases, it doesn't need to connect out to any kind of external service. There's nothing else you have to worry about. Whereas today, if you could get an old phone running, even if you get the old iPhones, some of them don't support the modern TLS standards required to connect to modern websites. Some of them don't have the APIs that are required to talk to the modern iCloud backend.
Starting point is 00:12:46 So you could get out your original iPhone, iPod Touch or whatever, but you probably can't use it as a web browser unless you go through some kind of proxy that's deliberately made to support that. It'll be even more true of the older devices. Yep, that makes sense. And then that's not even getting into the App Store and whether all of your apps now require a certain API.
Starting point is 00:13:07 And so I don't know how possible it is to even get an older version of an app. It's probably not possible or at least not easy. I will say where there is a will, there's a way. There was a device made by a Canadian company, which was sort of a cable modem attached or a cable TV network attached microcomputer in the early 80s. And they kind of resurfaced recently because someone had been hoarding hundreds of them, new old stock and boxes that they were selling now. And so all of the people who are interested in old computers are like, right, we're going to buy one of these. And so it turns out that they found some of the original people who built the backend service and were running it at the time.
Starting point is 00:13:48 And a museum in Toronto had done a bit of work to figure this out, or it was either a museum or university. But now there are people who are sort of reverse engineering everything required to make these things useful again and to get them going. So even these weird, obscure things, and I have to say, this is largely propelled by communities around places like YouTube. There are the YouTube people who are interested in old computers. They get these things, they unbox them, and then two or three of them, turns out, are interested enough in programming to find the right people, connect them together. And then all of a sudden, they move
Starting point is 00:14:18 on from being a museum piece to a useful thing. Not something you're going to use every day, but something we might talk about on a podcast every now and then. Yeah, that makes sense. Very cool. So Google is spinning up Google Cloud and you're kind of an early adopter there. And so I guess at some point, was it the kind of thing where you were such a power user that it got their attention or was it more of a personal connection? what kind of kind of led you into google i knew people who worked there and had worked there for years personally but then i also was as a working at a consulting company i was helping companies adopt google so i was helping customers get on board largely around email migration so you've
Starting point is 00:15:00 got someone who's had 80 000 users using mic Exchange, and there is some process and IT needs in order to get them migrated over to Google. And quite often there's a bit of programming and scripting required to migrate all their systems that were using those old APIs and so on. So the company that I was working at, I was leading the development arm. So it was, people were doing these workplace migrations and then they'd need to have utilities written. They'd need to figure out how they were going to integrate some new piece of software because the thing that they used to have with Exchange obviously doesn't work anymore if they're moving away from on-premises email and moving to cloud email.
Starting point is 00:15:38 So I worked in London at this time and I worked with the, there was probably only five or six people who were in the Google Cloud go-to-market team, the sales and sales engineering and so on. And there were a few more in other countries in Europe, but it was all still enough that we could probably have been on the same bus at the time. And I had, I don't remember if it was a recruiter reach out or anything, but I sort of knew that they were looking to hire and had some conversations with them and ended up finding a role there, which worked out well for me. Cool, that makes sense. And so, yeah, kind of walk us through,
Starting point is 00:16:10 you know, your experience with Kubernetes, you know, on day one. So what was that whole thing like? You know, what kind of led to that? And how did you get kind of roped into that? Yeah, so again, depends how you define day one. I got involved with Kubernetes when it was a internal Skunkworks project at Google.
Starting point is 00:16:28 And as I involved, so the first trip I did as a person working on the Google Cloud team in the UK, I went for a team meeting that was over in Seattle. And I remember thinking two things at this point. The first one was, we've got all these hexagon logos. Someone should really make a version of the game settlers of katan that uses all these cloud logos as hexagons and bring it out and so on no no one ever did that it would have been a great thing but the second thing was there were people around the
Starting point is 00:16:53 corridors talking about this thing called project seven this sort of container cluster idea thing that google was looking at releasing i think i should should look in on that so i spent some time looking in on that figuring out what was going to happen. And that was all development was all happening on the other side of the world for me. But the thing that we were doing to sort of help prepare customers and people to use it at that point
Starting point is 00:17:13 is as we see people, and the role I had was really talking to existing Google customers and figuring out how we could get them to do more with our products. And in large part, what we would need to do to make the products actually good enough for them to large part, what we would need to do to make
Starting point is 00:17:25 the products actually good enough for them to use, what things we would need to develop to make it suit their use case as we were building out the cloud. And so at that point, I was going to people and saying, hey, you should go and look at this Docker thing, but I can't really tell you why. Six months from now, hey, it's going to get really interesting. But it was really telling people that this was the pattern. This was sort of how Google would operate things internally. A lot of the thing that drew people to Google at that early time was companies who wanted to be like Google, especially around the email and communication thing. A lot of these companies saw Google as this great multicolored university campus, land of milk and honey or so on. It had what they perceived to be really high employee satisfaction and was this really cool
Starting point is 00:18:04 and innovative place. And a lot of these more state enterprise companies thought, well, hey, if we just change our email system to the same thing that Google uses, maybe we'll feel a little bit like Google and have a bit of that rub off on us. So technically I think it was sort of the same thing as like a lot of people thought, hey, we're going to use the same system that Google used. And I would not recommend that to people, obviously. There is a reason that these big organizations grow their own internal system that fits exactly the space that they have. But the thing that Google was doing was taking lessons that learned from that and trying to build something that was general and could be applied to everybody else. And that was really
Starting point is 00:18:38 when you hear people talk about Borg, which is the internal system at Google at the time, it wasn't just open source that thing and make it available to everyone it was build something new that met people where they were at the time so let me let's let's like i pause a little bit on this so i uh i vaguely remember borg so i remember there's uh something called borg control language it was like a dedicated language it was very similar to python i mean this is going back a decade so my knowledge is really fuzzy but yeah don't they like when you get assimilated they wipe your memory or something so you may not remember yeah borg was developed at the time when people had just obviously come off watching a very particular series of star trek
Starting point is 00:19:19 which had been on tv and so then when it's been in use for 10 or 15 years you've got all these new students coming in like what what the hell is this borg thing like i know what the software is but i don't get the joke yeah yeah exactly yeah and so i think yeah you'd specify everything in this borg control language and then it would go off and do that thing and so you know now that you know i'm kind of putting it all together it does it is similar so so was was kubernetes uh it was probably a total rewrite it wasn't the kind of thing where you forked it. No, Borg was written in internal dialect of C++ that Google used to its standards of a time and so on, and to fit a very Google-shaped problem.
Starting point is 00:19:56 It was an evolution of things that came together. A lot of the people who did that work and who worked on the subsequent system, Omega, which was sort of a second system attempt if you want to to rewrite it second systems don't go very well we've all read the book i hope and so a lot of that work basically went into making borg better overall so that's that's another interesting evolutionary dead end we can talk about but the some of those people were involved so you you hear names like um brian grant who was one of the leads of Omega, and Tim Hocken worked on the low level stuff in Borg, and Dawn Chen, for example, people worked on very early days in the Borg
Starting point is 00:20:29 system, were one group of people who were looking at how Google could solve the problem of making something like this available to cloud customers. And then on the other hand, you had the cloud team and people who had worked on various things in search and so on internally and moved on to the cloud project. And they were looking outside at the rise of Docker. They were saying, well, people are all of a sudden starting to do things a little bit like Google is on one hand. So they're using these container principles and so on, which were largely popularized, if not invented in part by Google to make Borg work.
Starting point is 00:21:04 And then there were also people on the outside who had built Borg-like things. People like yourselves who had been at Google gone somewhere else and said, hey, I missed that system, which made it really easy for me to deploy things at scale. I'm going to build a version of this. One of the most famous ones is Mesos, which was built by people who had interned at Google or had worked there and then studying at UC Berkeley, gone on to work at Twitter and so on. And they're like, well, we need something like this. And collectively, I think the Twitter people found the Berkeley project and said, all right, we're going to work together on this and build something that was
Starting point is 00:21:35 very similar. And so the team of people who were working on the cloud side were people like Joe Bader and Brendan Burns, and they were building proofs of concept and sort of sample projects and so on to say, all right, well, how could we do something based on Docker that worked in this kind of way? And really it was when those two teams came together, when those two groups of people said, all right, we can do things, we know how it was done at Google at scale, and more importantly, why it was done, then that can be made useful. And also I think the input of people who didn't ever use that system or people who knew what was going on outside. And like to say, Google at that point as a new entry into the public cloud infrastructure market, they really needed to make things
Starting point is 00:22:15 compatible with what people were using. And they couldn't just determine, here are the terms or dictate the terms if you want to say, you to use these apis and do things in this way that people like amazon could do at the time yeah that makes sense so these sort of great minds came together and formed project seven and at some point i guess was there like a meeting that you went in like a dark room shadowy figures you know altar in the center of the room and they they introduced your project seven no no the um as again you you two may have seen like the way you learn about things at google is you just find the design document and you follow along and and read what's going on and you join mailing lists and see what happens one thing i will say is that i was always very
Starting point is 00:23:02 separate from the teams at Google, especially in the cloud division who were doing all the engineering of this stuff. I was based in the UK for almost all of my career at Google. And the work that I did on this was largely like whenever people wanted to find out what was going on, how real customers wanted to do things or real use cases of this kind of stuff. But quite often the engineering teams would come out to Europe and they'd say, right, we want to talk to a bunch of customers. And I would know who I'd been working with and so on and connect these people together, bring a lot of feedback, do a lot of demonstrations and public events and so on and bring my own feedback. And that's sort of how that led me into developer relations and developer advocacy kind of roles. But then I ended
Starting point is 00:23:44 up, as we productized that, as we moved it from being this open source thing to being Google Kubernetes Engine or GKE, ended up in sort of go-to-market lead roles and helping out how we can productize some of this stuff and how we can really figure out what customers want and need and put together events. At KubeCons, for example, we used to do these customer advisory boards. We would get the really early adopters who were really keen on this and talk to them and figure out what we needed and then get the things which the next group of people would just find standard in a year's time and keep that process running yeah got it that makes sense so just so i understand the timeline so there there
Starting point is 00:24:20 is like docker compose did that come out out after Kubernetes or what's the story there? No, Docker Compose, I'm going to say it came out before Kubernetes. And Docker Compose is a way of saying, I want these three or four containers or however many to be co-located together on one single machine. So my application, my three-tier thing is made up of these three things, but they're in different containers. They all run together in one place. The thing that actually distributes them and runs them across multiple machines, there was a thing called Docker Swarm. And I have a suspicion that that came out at or around exactly the same time Kubernetes did, because in the early days, it was sort of announcements were made at Docker's conferences. There were three or four systems that
Starting point is 00:25:02 all came out around the same time some of them were in-house systems from various vendors spotify had one called helios for example that were again like hey we've got this great docker engine which allows us to put things together in containers now we need a way to manage the life cycle of them across multiple machines in the kind of way that we know that borg does this is before it was ever publicized or a big deal was made never to mention the thing by name outside for example but again lots of people are coming on through through google yeah i don't know if we're more worried about google's lawyers or star trek's lawyers at that point that's true got it cool okay that makes sense that kind of i've seen
Starting point is 00:25:41 docker compose i've used it occasionally but I've never written anything in it. It's more just if I'm looking at some open source GitHub project and they're using, I guess, Docker Swarm or Compose, whatever it is, but definitely I've spent the vast majority of time on Kubernetes. So yeah, I think maybe now's a good time to kind of explain to the audience, what is Kubernetes according to Craig?
Starting point is 00:26:04 What exactly is a kubernetes well yeah so kubernetes well for starters it's it's this weird greek word that no one can even agree on the right way to pronounce wait really but i didn't know that i mean it kind of makes sense yeah it kind of makes sense you know now that i looked at the phonetics so what is kubernetes the greek word actually is it a person's name or uh no it is this is going to blow your mind jason but it is if you think about the the word cybernetic or if you think about governance and governing something the word gubernatorial so these all come from the same greek word kubernetes which
Starting point is 00:26:45 means helmsman that means the person steering a boat so someone like you think of the command and control sense like governing something or or someone that is where the idea of cybernetics came from as well and that explains the uh the icon as well because the icon is if i remember correctly the icon for kubernetes is like a let's see if i can look it up it is like a a wheel like thing and yeah yeah i'm looking at it right now i mean this is terrible radio but take our word for it or go look it up the kubernetes icon is like a uh a seven-sided polygon is it's a It's the helmsman, like the helm of a ship, the steering wheel. No, that's the wheel, but I'm saying the blue part.
Starting point is 00:27:29 Heptagon. Okay. I didn't know that word. Heptagram would be the star. Yeah, so for folks out there listening, it's a heptagon, a blue heptagon with like a, yeah, exactly, a ship's wheel in white. Yeah. So the helm logo, the wheel, ship's wheel on the logo obviously refers back to the name. And the seven sides is the seven, the Project 7 codename of Kubernetes from before that.
Starting point is 00:28:00 Oh my gosh. How did I not put that together? So yeah. So actually the project seven where there are six other attempts because i mean there was borg there was omega where there are like four other attempts before they went to kubernetes do did you two watch star trek in the 80s and 90s i am not a trekkie patrick are you a trekkie yeah i did always understand the reference to borg but i missed that other folks missed it. Oh, wow. That was double. Oh, sorry. I never paid attention that people probably didn't know what it meant.
Starting point is 00:28:28 So that was probably an astute observation. It's not only the Borg thing. It's that, and I didn't watch it myself, but I think it was Deep Space Nine. They had this Borg character called Seven of Nine. Oh. So that's, that's where the seven thing comes from. The, the idea was, it was a, I think they pitched it, and I apologize for this, is a prettier version of Borg. Oh, I see. Got it.
Starting point is 00:28:51 Interesting. So now folks at home, the reason why Kubernetes logo is a heptagon is because of 7 of 9 from Deep Space Nine. That's amazing. Okay, so we got as far as the name. So sorry, I got us totally sidetracked there. So what is is it manages things that run in multiple places that are disconnected from each other.
Starting point is 00:29:28 And it works in the model where you submit to an API server what it is that you actually want to have the state of your cluster be. So it's a declarative system versus what you may have used before, which is imperative where you say, please make this thing happen. You say to the system, I would like this to be the outcome, and you leave it to decide. That was one of the things that really blew my mind about Kubernetes. And to this day, I wonder what happens when it can't diff something? What if you provided a YAML file?
Starting point is 00:30:04 Actually, just to backtrack a little bit so the way that this is actually implemented is you you provide a yaml which is similar to json you know one of these structured files you provide a yaml file to to the kubernetes api and say this is the state of the world and so it looks at the current state looks at your yaml file and the thing i was always wondering i'd love to get your take on it, is how does it know how to go from A to B? What if you just do something really, really different and it just can't seem to get you there? Well, if you think about the common use case again, so you have a Kubernetes cluster, which has an API server, and then it has a number of worker nodes. So it has
Starting point is 00:30:45 these places where it can ask computers to do things. You will give the API server an instruction that says, I want to run a web server. And you give that to the server and it says, all right, well, I know I've got this many machines because they will report to me regularly and say, here I am, and this is what I have running on me and so on you have asked for a web server you've specified that it needs to have two cpus and it needs to have 256 megs of ram so probably not going to be hard to find somewhere in the cluster to put this but you can get larger workloads and more of them and so on and then knowing what it knows about the state of the environment it'll say right i'm going to ask this node to start it over there and it'll send that instruction out and then
Starting point is 00:31:25 it will update the status to basically say I have scheduled this thing, I've asked it to run. It passes, actually puts an object there. When the node checks next time, it says there is new work available for me. I have been asked to do a thing. It will then start that running. If it can't start that running, it will update and say I couldn't do that. Perhaps you asked with a container image that you spelt wrong, and so it couldn't do it. It enters the state that it calls a crash loop back off, which is the most fun thing for anyone involved in Kubernetes to have to debug. But that's basically saying, I couldn't do this. I'm going to try again. And it waits a little while, and it tries again. And if it fails, then it waits a little while longer, and it tries again, and so on. so you can get into situations where things don't work but basically
Starting point is 00:32:07 it updates the status of the object to say i couldn't do that thing yeah yeah that makes sense yeah i think the you know the the schema of the yaml is is kind of like uh i'm sure there's some way where you can trick it or get it confused but but in general you're right i mean most of the time someone says here's a yaml and so it just creates a bunch of stuff and the next time around here's a yaml and like now i'm using two cpus instead of three cpus and so it's like a relatively straightforward diff yeah so there's three different parts to an api object in kubernetes so you have the metadata which is things like the name of the object, and then any labels that are used on it. Labels are important because that's sort of a way you do queries across things that can be more than one of. And then you have its spec and its status.
Starting point is 00:32:56 So the spec is what you actually want. And in the case of you want to run containers on an environment, it'll be wrapped up in a thing called a pod. A pod is the minimum unit of scheduling of stuff in Kubernetes. It can have one container in it, but it can have more as well. So you might want to say these two containers always run together. This is my web server, and this is the agent that processes its logs, or it gets the files to serve from Git, for example. Those two things are always required together. So the pod is the minimum unit of thing that you schedule. So for a pod object inside Kubernetes, you will say what the specification is, the spec,
Starting point is 00:33:31 and it might be that it has this particular container image and it has this much CPU and RAM and it has these security settings, for example. And then the status object should effectively say the same stuff, but it might turn out that you've made a change and that hasn't updated yet. And so if you make a change, it will do some magic diffing and so on and say, all right, well, I'm submitting a new object with the same name as the last one. What are the things I need to change?
Starting point is 00:33:57 I need to make a change to its labels. That's just an API server thing. I need to give it more CPU. Well, in that case, what I need to do is tear down the old one and instruct a new one to run and so on. So there are possibly some cases where it can get confused and it'll probably just mark it as, again, crash loop back off or something and put a log entry as to why it is. But it's quite uncommon because of the structure of only a few different things that matter inside each object that you can get it so confused that it wouldn't be able to do anything. I've never really heard of that yeah that makes sense one of the things that blew my mind with with kubernetes
Starting point is 00:34:30 or one of the things that was really surprising was i kept seeing m core everywhere i was like what is this finally i looked it up and it was milli cores you can actually say like i want you know this process to use you know uh you know one four 400th of a core or something. You can get very, very specific, which is really good because you are running these things at really large scale and you need to be able to coordinate and orchestrate. All of that speaks to the reason that Google wanted this kind of system and the reason that it thought Kubernetes would be relevant to people.
Starting point is 00:35:05 And this is something which I find very interesting. The reason that Google wanted to do this kind of thing is it's not running one instance of a web server. It's running 20 million. It's running whatever big number you can think of. They used to talk in public conversations about a small cell would be on the order of tens of thousands of machines. And you might deploy something like the backend for Google Calendar, and you're going to need tens of thousands of replicas of that. And you just think, well, if we assume the backend for serving any application takes a couple of hundred megs per user or something, and there's millions of people concurrently hitting it from every country in the world all the time, you need an awful lot of stuff for this kind of scale. And you also need to have that for anything new that launches on day one. Google never had the luxury of saying,
Starting point is 00:35:49 all right, well, I'm going to launch a thing quietly. And then unless it was on some kind of wait list, everyone in the world knew about it immediately and everyone would come and hit the things. You need to scale things up pretty quickly, but that's very expensive. Like you've got to run millions of replicas of things over the course of time. And so they wanted to be able to pack more things into the fewer number of machines. So it's not so much, I have 10,000 calendar backends and I need one machine each to run them. I need 10,000 machines. It's, hey, I can schedule them on a smaller number of machines or in the space that I have left over when these things aren't doing anything, I can run batch workloads and so on. So the real workload, the real reason that Google had for having Borg was
Starting point is 00:36:30 to get more out of the number of machines that they had using this API system that drove it all. And that was the thing that we thought people would want to use Kubernetes for. People would want to do more with their infrastructure. Famously, we talked about the reason that we launched it at the time was when the people who had been building Google Compute Engine, Craig McClackie, who was one of the PMs on that, who went on to launch Kubernetes, they would go to the Google backend team, basically,
Starting point is 00:36:58 and say, all right, hey, we're selling out of this compute stuff. People are buying it. We need some more, please. And they'd say, well, hang on. You're only using 5% of the CPU. Say, well, that's fine. We sell these CPUs to people.
Starting point is 00:37:10 They buy 100% of a CPU. They only use 5%. They're paying for the whole thing, but that's what they're using. And that was just so anathema to how Google thought about things. Like, we need to pack more, and you should be able to get 70%, 80% utilization out of that CPU, because you need to do that at that kind of scale. But it turned out that the thing that people actually couldn't do back then, and remember, this is around 2014, is they just couldn't deploy software. They couldn't reliably say,
Starting point is 00:37:34 hey, I want to run the thing and have 10,000 copies of it running and start and upgrade them and so on. Simply just having an API that enabled doing that, you could be as inefficient as you want in the first instance. That's a problem we'll solve later on. We just can't make our thing run. And that was the thing that really got people interested in Kubernetes, was it was just a really well thought out
Starting point is 00:37:54 declarative API for managing software at scale. The fact that you could then later on at scale say, I'm going to only use 400 millicores for this rather than a whole cpu or something that's a nice to have that people can get to later on yeah i mean one of the things that i remember a story i'm not gonna say any specifics here but uh there was a time where um you know we were running something there's a team running something at Google. And basically, every time someone made a request, it created a thread, like an OS level thread, and it never closed it. And so after maybe what 10,
Starting point is 00:38:35 20,000 people hit one of these endpoints, it would crash because the OS would just run out of handles. And it ran for years like that. like that because Borg or Kubernetes under the hood just said, okay, well, this one crashed, let's spin up another one. And it always had enough running that we didn't have to worry about that bug for a while. Yeah, there are many different ways to succeed. Yeah, and that is something that I think, you know, it ties into a conversation I was having a little while ago with somebody where you can make the case that what really
Starting point is 00:39:13 matters to your company are your KPIs, you know, your key metrics for your company. You know, yeah, I mean, as engineers, like it pains us to see like, you to see unclosed threads. It was eventually fixed. But the time in between when people were fighting other fires that were really making big gains in the company metrics, that was made possible because Kubernetes was able to throw them that life preserver you know and i think uh to your point it's like just out of sight out of mind of this these like massive replica sets uh allows you to really prioritize things that you wouldn't be able to otherwise yeah there's a lot to be said about like you can be inefficient but you need to be up. And when people started thinking about SRE concepts, which were publicized by Google
Starting point is 00:40:09 around the same time as Kubernetes, it enabled the industry to understand the difference between an SLO and an SLA. A lot of people in enterprise think, oh, I've got this SLA, which says basically, all right, well, the company will give me back some money if they're not online for the amount of time that they say they will be and normally measured in sort of percentage of uptime but the thing that actually matters is being able to say here's my objective i hope to be up for this amount of time and the sla is really only the monetary piece on top of it is to say right this is what i will give you if not and so then you need to know how you're going to measure those
Starting point is 00:40:41 things you need some sort of indicator and your internal teams can say, all right, well, hey, if we are meeting these goals, we might be burning money to do it, but we're meeting the goals and that's important. And it might be that how much it costs to fix that costs more than the amount we'd have to pay out in the course of an SLA breach, for example. So you need to factor these things in holistically and you need to look overall at the costs of running things, especially when they're in a sort of pay-as-you-go cloud environment, if you want to be a successful business. Yep. Yep. That makes sense. And yeah, to your point, people get so bogged down
Starting point is 00:41:14 in deploying software that they couldn't focus on the part of the business that makes them successful. I mean, yeah, before Kubernetes, I mean, just so many horror stories. Let me have a quick one. This is not from Google. This is from another company. This person told me that they had a Java jar file. And for people who don't know, a jar file is literally just a zip file
Starting point is 00:41:36 with Java bytecode in it. And the way that they were deploying their software was they were taking their Java bytecode, their.class files, and they were unzipping the jar file putting them in there and then re-zipping it on the production server um and so at no point did anybody know what the code on the server was because everyone was doing this concurrently and so yeah i mean this is what people were doing even in you know 2014 i mean i think uh you know it mean, I think we forget so easily with Docker and Kubernetes, we forget how hard it was to deploy things and what sort of wonky things people did when they didn't have any other way. That said, I run a website which is running PHP code from pre-year 2000 probably and largely hasn't changed in that time.
Starting point is 00:42:27 And I'm not going to tell you what it is because I'm sure it's full of security holes and you'd really be able to get offline very easily. But it still works. And there's a lot to be said about it. I'm never going to put it in a container and I'm never going to change anything about it because it's just a thing from the past
Starting point is 00:42:41 that pleases my heart that it's still there. And there's a lot to be said about things that were really easy. Like the whole idea of things like PHP made it possible just to FTP a bunch of stuff to a server and have it work. And simple is useful on one hand. And there's a whole bunch of complexity that's introduced. We could go on for hours about the trade-offs here. It's like this Kubernetes is complex and it's got all these bells and whistles and so on. And there are easier tools. And what is the right thing to do? The answer depends entirely on your use case. And it's an
Starting point is 00:43:09 investment question as well. It's like, do you want to be in the ecosystem where you can hire people who understand Kubernetes and can work on stuff from day one? Or do you think that by building something in-house for you that you perceive to be simpler, then you're going to be able to win in the long run? You have to make these, and they really do come down to being business decisions. Yep. Yep. That makes sense. One of the things that I think Kubernetes saves people a lot of time and ultimately money is the way that the different containers or the instances of the containers can talk to each other. So I mean So that was one of the things that really is very intuitive to me is in your YAML file, you name this replica set,
Starting point is 00:43:51 I guess is what it's called, or this process. You name this database. And then in your actual application code, you can say connect to data, like HTTP colon database. And so under the hood, Kubernetes knows, oh, database actually means this other part of your deployment. And I thought that the whole networking part of it was extremely well done for something that's really complicated.
Starting point is 00:44:16 This is another huge difference from Borg. The way that Borg worked is it ran a thing called the Borg name service, where everything that started up your 10,000 calendar instances would all register with the Borg name service, where everything that started up your 10,000 calendar instances would all register with the Borg name service. And then the load balancer or whatever it is that needs to know where these things are would have to have a client library that knew how to talk to that service. And then it would say, all right, where are the back ends for this?
Starting point is 00:44:39 And it would get back its 10,000 requests and so on. And it could, in theory, this I think was implemented a bit later on, but the system basically could say, not only here are 10,000 requests and so on. And it could, in theory, this I think was implemented a bit later on, but this system basically could say, not only here are 10,000 replicas, but here's how busy they all are. And so it could then be used to make load balancing decisions based on utilization, for example. And that worked, but it required you to write custom software for everything. The internet as a whole has a way of finding out where a service is running for better or for worse. It's called DNS. And so the difference in how Kubernetes operates is it basically has a programmable DNS server baked into it. You create an object, a different object inside Kubernetes called a service. And then we go back to those labels we talked about
Starting point is 00:45:19 before. We've got deployed labels on top of everything and say, this is a calendar, name equals calendar, version equals 2.0, samples equals dev or something. And so you can make a service that says target all of the things that are the calendar, or you can make one that says target all of the things that are the dev version of the calendar or the dev version at two or version three or so on. So you can target which those things are and select that group. And then you can associate a name with that. And so you might say, connect to dev calendar service, and then it will return you the IP addresses for that whole list of things. And then your software
Starting point is 00:45:54 can decide, does it run Robin between them or whatever it wants to do, but it just does that with DNS. And the benefit is that if you have software that has no idea it's running on Kubernetes, it just works. You say engine X is is my load balancer, balance between these services, like it knows how to do that. You didn't have to rewrite your front end, you didn't have to compile on any kind of Kubernetes name server thing to work, everything kind of worked the way you'd expect it to. Yep, yeah, super clever. I mean, one thing that on the flip side, one thing that at least I found as a user of Kubernetes really complicated was ingress. So, you know, Kubernetes, because it has its own DNS server, it's almost like its own, you know, intranet. And so then when, you know, you, Joe Schmo on the outside wants to talk to a Kubernetes
Starting point is 00:46:40 service, now you get into like cluster IP and node port and and load balancer and which do i pick and why and and oh it doesn't work and and you know why this is a problem it's because we haven't all moved to ipv6 yet yep that's right if it was ipv6 we would just that was not a serious suggestion well but i mean yeah i actually love ipv. I don't know enough about why we don't have it yet. I mean, I'd love to get your take on it. You probably know a lot more than I do, but it would solve a lot of problems. I think the answer is because IPv4 works and people are incentivized to keep it working. That said, Kubernetes now does support v6 largely and so do all the cloud vendors.
Starting point is 00:47:22 But I think that while v4 is good enough then it's going to keep being the answer but i will explain what i meant by that so first of all you mentioned this a bit like an internet it is a cloud in and of itself like it has apis it configures you can start things running it is to all intents and purposes as complex and does all the same things that a cloud vendor does that open stack didack did, that Amazon or Azure or Google do when you say create me an instance of things. So you have to think about it in terms of that level of complexity. And then you also have to think about it in terms of the complexity of how am I going to layer Kubernetes on top of a second cloud and so on. In order to solve these
Starting point is 00:48:02 naming problems and make everything work with DNS like we talked about, one of the very early decisions that was made with Kubernetes is that every pod, those things I mentioned before that are one or more containers together, every pod in a Kubernetes environment needs to have its own IP address. To give it a DNS record, you need to have an IP address, for example. So there are not enough internal IP addresses in some companies, I'm sure Google are included, to do things at that kind of scale. And there are not enough RFC 1922 addresses to not conflict when you have enterprises who are running lots of different clusters and so on. So you need to have these little islands of netted environment, if you will, like many
Starting point is 00:48:48 businesses already have. And to get from the outside internet to that, you need to have some sort of gateway. And that's effectively what an ingress was. It was the idea of being able to define a programmable gateway to say, someone on the internet is going to call into this. They're really only going to need to hit one or two IP addresses, like they were before in an enterprise situation, but then that needs to fan out and be aware of everything inside and be able to make calls into my NATed environment. So when you think ingress, for anyone in the old world,
Starting point is 00:49:15 you think NAT. It kind of works the same way. It would be great to say there was a world where all of those IP addresses inside Kubernetes were globally routable or were available everywhere. Some people may choose to run their Kubernetes environments like that. That opens you up to all of the problems that you have with other networking, where you think, all of a sudden I have to worry: all these things I was running internally are now possibly internet addressable, and you really have to think about security. You get a lot of security for free by running gateways and so on.
Starting point is 00:49:44 So it's possible. It would be more possible with v6, but it's something you really have to pay attention to. That's a real... Actually, I want to pause there for a second. You're totally right. I mean, even if we had IPv6, you know, for security reasons you'd almost certainly still have a NAT layer, and then you're back to the same problem. You've got to have a firewall layer at least. But I think v6 was designed in the 90s, back when the internet was still very academic and the idea of everyone running services on their own local machine and so on was well accepted. But we've very much pivoted to a world where you put your servers and your hosting in a server hosting environment. And anything I'm running on my local machine is more likely to be something I'm developing
Starting point is 00:50:26 or something that's insecure. And I don't like the idea of anyone on the internet being able to route to things I'm running locally. The trust situation, the threat model, has changed substantially since it was defined. And maybe that's one of the reasons why people think, well, hey, that's a huge advantage of v6,
Starting point is 00:50:42 which actually turns out not to be relevant in 2023. Yeah, that makes sense. Okay, yeah, so that makes sense. So basically, you know, you don't want to make every pod accessible from the outside, and so for the services that you do want accessible, you have to set up an ingress rule, and that's where you get into those different variants, NodePort and ClusterIP and all of that stuff. Yeah. And all of that basically relates to the machine, the pod, that the ingress is running on: how does it talk to things, how do you expose those things? And it might be that you have an external load balancer that's sort of unaware of the Kubernetes environment. And it just
Starting point is 00:51:20 has to say, here are the cloud IP addresses of some machines. Do I make that service available on a named port, sorry, a numbered port, on the machine that I can then route to? Or does the ingress run inside Kubernetes, where it can just use the objects? It was harder early on; over time, the ingresses and the load balancers that power them and so on have become generally more aware, and the cloud providers are able to expose the internal Kubernetes IP addresses to their load balancers. So these were things you had to worry about a lot in the early days. They were just, how do I route things and deal with the fact there might be four or five different things on my physical server machine, four or five different Kubernetes pods, who all think they're running a service on port 80? And only one of them could run on port 80 of the underlying machine as far as the external network was concerned. Got it. Yeah, that makes sense.
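As a rough illustration of the gateway idea, here is a hedged sketch of an Ingress object in the same Python client; the host name and backing service are hypothetical, and the exact behavior depends on which ingress controller the cluster runs:

```python
# A hedged sketch: route outside traffic for one host name to an
# internal service. Assumes an ingress controller is installed.
from kubernetes import client, config

config.load_kube_config()

ingress = client.V1Ingress(
    metadata=client.V1ObjectMeta(name="calendar-ingress"),
    spec=client.V1IngressSpec(
        rules=[
            client.V1IngressRule(
                host="calendar.example.com",  # hypothetical host
                http=client.V1HTTPIngressRuleValue(
                    paths=[
                        client.V1HTTPIngressPath(
                            path="/",
                            path_type="Prefix",
                            # Fan out from the one public entry point to
                            # the service inside the NATed environment.
                            backend=client.V1IngressBackend(
                                service=client.V1IngressServiceBackend(
                                    name="dev-calendar",
                                    port=client.V1ServiceBackendPort(number=80),
                                )
                            ),
                        )
                    ]
                ),
            )
        ]
    ),
)
client.NetworkingV1Api().create_namespaced_ingress(namespace="default", body=ingress)
```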
Starting point is 00:52:09 Cool. So if somebody wants to, you know, get started, let's say, maybe we'll use your example: I have a PHP site and I have a Postgres or MySQL backend, and I've decided I want to take the plunge and move this onto Kubernetes. What do you recommend for people? How do they learn Kubernetes, and then how do they get started quickly? It pains me to say that I don't know. And I think the answer might be, I don't know anymore,
Starting point is 00:52:46 because, like, one of the things that I did in the early days at Google was help contribute to producing courses on how to learn Kubernetes. And they basically start by saying, here's a pod and here's how you run it, and here's how you expose it, and so on. And they leave a lot out at the beginning and at the end. The bit they leave out at the beginning is, how do I actually get my stuff into a container? If I have a PHP application today, I upload the files or I edit them on the server and so on. They are there.
Starting point is 00:53:16 They're on the same machine as the web server, but they don't come packaged with it, because I just use the server's package management system to install PHP and Apache or Nginx or whatever. So then you need to think, well, how am I going to get the engine? And then how am I going to download the files that need to be with that to serve it? And how am I going to group those things together and then deploy them?
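To make that packaging step concrete, a hedged sketch with the Docker SDK for Python (pip install docker); it assumes a Dockerfile already exists in the app directory, and every path and tag here is made up:

```python
# A hedged sketch: bundle the engine and the site's files into one
# image, then smoke-test it locally before deploying anywhere.
import docker

client = docker.from_env()

# Build an image from ./my-php-site, which is assumed to contain a
# Dockerfile that installs PHP plus the application files.
image, build_logs = client.images.build(path="./my-php-site", tag="my-php-site:1.0")

# Run it locally, mapping container port 80 to host port 8080.
client.containers.run("my-php-site:1.0", ports={"80/tcp": 8080}, detach=True)
```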
And so the first thing I would sort of suggest to people is: is this a thing you need to do? Like, are there services for this? You can use Google Cloud Storage and Amazon S3 and so on to do static hosting really easily.
Starting point is 00:53:52 Hosting files without running a server to do it yourself is largely a solved problem. And there are also functions, and containers-as-a-service solutions, and so on, such that if you just have a container that serves stuff that you want to upload, they can handle all the scaling of things for you. So one thing I would say is, if you have that sort of desire to do a thing and learn it, that's great, and if you know that your problem is going to scale and need this kind of thing, that's great too. But the first thing I would say is, figure out if you have a Kubernetes-shaped problem, because you don't need to use it for everything. Can you double-click on that? So if there's a service that takes your container and runs it at scale for you, where's the gap there between that and Kubernetes? So there's a value gap. First of all, the vendor who does that will
Starting point is 00:54:35 charge you more for that, because they are solving a problem for you. And then you lose a bunch of control. You don't know where it runs. You can't necessarily say, I would like this container, or this pod and that one, to run on the same physical node, because I value the fact that they are close together as far as latency is concerned, and so on. So you get some trade-offs, but then it's sort of a convenience. It's like Heroku versus running on your own server or anything like that. You can operate in a shared environment. You can get some economies of scale, but you lose the control.
Starting point is 00:55:08 So what is that service called on GCP, the one that runs your container? Yeah, Google Cloud Run. Google Cloud Run is sort of like a second version, if you will. There was a thing called Google Cloud Functions, first of all, which was basically upload some code.
Starting point is 00:55:23 And that in itself was, you may remember, App Engine. There's been an evolution from upload code and it'll start a server and run it for you, to upload code and it will sort of build a container and run it for you somewhere, to upload a container and then it will run it for you wherever. Got it. Okay, so we have kind of a set of stepping stones here. So, you know, and again, you're totally right. People should not put everything into Docker and Kubernetes by rote. Right. I mean, there should be, you know, a reason.
Starting point is 00:55:54 But let's say it's an academic exercise and someone's kind of motivated to do it. And the first step is, you know, get your thing running in Docker: basically, stop your web service, docker run your container, and make sure it does basically the same thing. That's maybe, you know, step zero here. And then step one would be run on Google Cloud Run. Yeah, let's back up a little, if you would. When you talk about any kind of migration, you quite often talk about lifting and shifting a thing and then improving it to make it work. So in our example before, we've got a Postgres backend and the PHP web service or something. So let's look at
Starting point is 00:56:37 the database. Like, you can just say, all right, I'm going to stop my apt-get-installed Postgres, I'm going to shut that down, and then I'm going to docker run Postgres instead. So now, all of a sudden, I just have to worry about a slightly different life cycle and how I deal with the data and so on. But largely I'm still doing the same thing that I did before. So that's the lift and shift part, especially if you're moving from, say, on-premises to cloud while you're doing this.
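As a sketch of that lift-and-shift move, again with the Docker SDK for Python; the image tag, password, and host path are placeholders, not recommendations:

```python
# A hedged sketch: run Postgres as a container where a package-managed
# install used to be, keeping the data on a host path for now.
import docker

client = docker.from_env()

client.containers.run(
    "postgres:15",
    name="mydb",
    environment={"POSTGRES_PASSWORD": "change-me"},  # placeholder secret
    ports={"5432/tcp": 5432},  # expose on the host, like the old install
    volumes={"/srv/pgdata": {"bind": "/var/lib/postgresql/data", "mode": "rw"}},
    detach=True,
)
```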
Then the next piece is, all right, well, could I just use some sort of magic back-end hosted database service? You still end up with a database that's running on a single machine, or you're clustering it so it runs with replicas on many different machines. But there are now services that
Starting point is 00:57:15 basically have magic distributed back-ends and can speak the Postgres protocol and so on. So you may eventually decide that the management of applications on Kubernetes is hard enough, and that the management of data, where you have to worry about things like, this is a disk image, and it may be big, and it may need to be reattached to a different machine if the scheduler says, hey, this node's coming down, I'm going to run somewhere else, is something you'd rather hand off. We didn't really touch on the sort of redundancy features and high availability and so on, but you need to be able to handle your workload moving around, and that gets more complicated when you're dealing with statically attached data. Kubernetes has principles for dealing with this,
Starting point is 00:57:52 but you may decide, again for business reasons, that I just don't want to worry about that. If you are running at a big enough scale, or you want to run on your own premises and don't have that kind of cloud service, and so on, then yes, you will want to investigate that. But I don't think there is a simple answer; there is obviously a here's-how-we-do-it-in-a-lab-environment answer, but there's not one answer, and it's so frustrating. We used to have the same problem. People would say to Google, like, I want to store some data. Great, we've got, and again, I don't work at Google anymore, so please excuse me if I get this wrong, but we've got Bigtable and Spanner and Cloud SQL and Cloud Storage and Firestore and whatever. It's like, well, you've got to go through a decision tree and say, well,
Starting point is 00:58:30 what kind of data is it? How frequently is it going to be accessed? How frequently is it going to be rolled? Do people want to query it by column or by row, and so on? You've got to think about these things. Whereas for a lot of people who are just dealing with fun stuff at home, it's like, well, MySQL can deal with all of that as long as you've got under 100 megs of data. Don't worry about it. Good problems to have. Yeah, good call out. Actually, I want to double-click a little bit on that. You touched on something there with the persistence.
Starting point is 00:58:53 You touched on something there with the persistence. I think a Docker container, just for the audience, is read-only. It's not like while Docker is running your container, you can go and modify the container itself. It's like you created this frozen snapshot. And so if you need to persist anything, it has to go somewhere else. And if you're a web server, that's usually a database.
Starting point is 00:59:19 But if you're a database, that's not an option. So yeah, how does that work? Yeah, so what you said is true enough. It's true in the sense that, like a VM image, the image you instantiate is read-only. But once you've instantiated it, you've basically copied it to a disk and so on. But you can attach writable volumes to a running container. And so the way that you handle this in Kubernetes, basically, is you say, I need to have a certain amount of disk,
Starting point is 00:59:46 and the abstractions that Kubernetes provides say, like, I would like to have a 20 gigabyte volume or a two terabyte volume or whatever. And then, because Kubernetes knows what it's running on, it's configured to run on top of your cloud or your OpenStack or VMware or whatever environment, it calls out to the underlying system and says, give me two terabytes of whatever your flavor of disk is. And it also has storage classes that you can configure to say it can be SSD-backed or it
Starting point is 01:00:15 can be spinning disk or whatever. And then it will attach that volume to the machine, because remember, underneath this all is still VMs or computers, basically. So it does whatever network attachment tool it has, or you might say, I want a local disk, and this thing is, like, useful for cache data, but it's not a network disk, and that's there straight away. And then it maps that through so that it is accessible at whatever path in your container. So you just have to say, as the person who's written the app, write to /data, and then you trust that Kubernetes will have mapped something there for you.
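A minimal sketch of that request for disk, using the Python client; the size and storage class name are illustrative assumptions, since the available classes depend on how the cluster is configured:

```python
# A hedged sketch: claim a 20 GiB volume and let Kubernetes call out to
# whatever storage backend it is running on top of.
from kubernetes import client, config

config.load_kube_config()

pvc = client.V1PersistentVolumeClaim(
    metadata=client.V1ObjectMeta(name="calendar-data"),
    spec=client.V1PersistentVolumeClaimSpec(
        access_modes=["ReadWriteOnce"],
        storage_class_name="ssd",  # hypothetical class: SSD vs. spinning disk
        resources=client.V1ResourceRequirements(requests={"storage": "20Gi"}),
    ),
)
client.CoreV1Api().create_namespaced_persistent_volume_claim(
    namespace="default", body=pvc
)
# A pod that mounts this claim at /data can just write to /data and
# trust that something durable is mapped there.
```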
How does that work with replicas? Like, let's say I have three replicas and they're all going up and coming down at the same time. How does that work? So what you want to do is to have the data persist,
Starting point is 01:01:00 but you also now have to have... like, it's not just enough to say this data persists. You have replicas one, two, and three, and you actually have disks one, two, and three, and they have to stay associated with each other. You can't just move the things around; it'll confuse things. So this was something that got added early on in Kubernetes. You talked before about replica sets and so on. That's a way of saying, I just want n copies of the same thing running wherever. If you want to say, this one is important and it's ID number zero, and the next one is ID number one and it's the first replica, and so on, there is an object called a stateful set, where you can basically say, all the things in this have
Starting point is 01:01:37 an order, and they have disks which will be assigned in that order. And if things get moved around, this disk and that particular pod will always stay together and will move together. And if it is shut down and gets rescheduled somewhere else, the system underneath it will say to the disk controller, I need that named volume, not just any two terabyte volume; I need that one, because that's the one that's got my data on it. Makes sense.
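A rough sketch of that ordering-plus-named-disks idea, via the Python client; the image, sizes, and names are illustrative assumptions:

```python
# A hedged sketch: a three-replica StatefulSet where each pod gets its
# own named volume (data-db-0, data-db-1, ...) that follows it around.
from kubernetes import client, config

config.load_kube_config()

stateful_set = client.V1StatefulSet(
    metadata=client.V1ObjectMeta(name="db"),
    spec=client.V1StatefulSetSpec(
        service_name="db",  # service that gives db-0, db-1, ... stable names
        replicas=3,
        selector=client.V1LabelSelector(match_labels={"app": "db"}),
        template=client.V1PodTemplateSpec(
            metadata=client.V1ObjectMeta(labels={"app": "db"}),
            spec=client.V1PodSpec(
                containers=[
                    client.V1Container(
                        name="postgres",
                        image="postgres:15",
                        volume_mounts=[
                            client.V1VolumeMount(
                                name="data",
                                mount_path="/var/lib/postgresql/data",
                            )
                        ],
                    )
                ]
            ),
        ),
        # One claim per replica: pod db-1 is always reunited with its
        # own "data" volume, even after being rescheduled.
        volume_claim_templates=[
            client.V1PersistentVolumeClaim(
                metadata=client.V1ObjectMeta(name="data"),
                spec=client.V1PersistentVolumeClaimSpec(
                    access_modes=["ReadWriteOnce"],
                    resources=client.V1ResourceRequirements(
                        requests={"storage": "20Gi"}
                    ),
                ),
            )
        ],
    ),
)
client.AppsV1Api().create_namespaced_stateful_set(namespace="default", body=stateful_set)
```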
Yeah, that's a huge amount of complexity. I mean, with Docker locally, you know, you can mount paths on your local machine, and that gets you pretty far. But it's the same thing: if your machine dies, you're out of luck. With something like Kubernetes, you're protected from that. And we haven't even gone on to talk about security as well. There are so many different knobs here that you can turn to do things, but it's a toolkit that you can use. And some people have published some patterns, as in, here's a good way to manage databases. You might hear people talk about a thing called an operator
Starting point is 01:02:38 in Kubernetes. And that basically is a piece of software that knows how to manage another piece of software. And so, for example, let's talk about Postgres again. You might say, I want a Postgres database. And then you create a custom Kubernetes object that the operator has provided for you, which says, give me a database of this type, with this many replicas, and so on. And then it knows how to translate that into instructions to create this stateful set, and to set up permissions, and so on, and deal with that. And it also adds application-specific things, like you might want to know how to back up an application. If you just say, hey, back up, to this particular operator, well, how you back up
Starting point is 01:03:15 Postgres is different to how you back up MySQL or etcd or anything. So the same kinds of patterns, especially around databases, exist and they're out there. But then again, there are also five or six different people who make Postgres operators, so you have a world of choice and there's not always a golden path. Cool, makes sense. So is an operator like a macro for Kubernetes? Is that the right analogy, or is it something else? We've talked a lot about controllers, in the sense that you upload an object to Kubernetes, and then the controller is the thing, for a replica set or whatever, that takes that object and actually does the thing. An operator is the combination of a custom object and a custom controller that relate
Starting point is 01:03:56 to a certain thing. So you could sort of think about it as a macro in the really old sense of word processor macros or so on, but it is a convenience thing that's sort of custom. The built-in controllers operate containers; the operators operate databases, or other things that you might want to stamp out more than one of. Cool. Yeah, that makes sense.
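As a hedged sketch of what talking to an operator can look like: the API group, kind, and fields below are invented for illustration, since each real Postgres operator defines its own custom resource schema:

```python
# A hypothetical custom object: the operator's controller watches for
# these and translates them into StatefulSets, permissions, and so on.
from kubernetes import client, config

config.load_kube_config()

database = {
    "apiVersion": "example.io/v1",  # hypothetical API group/version
    "kind": "PostgresCluster",      # hypothetical kind
    "metadata": {"name": "calendar-db"},
    "spec": {"replicas": 3, "storage": "20Gi"},
}
client.CustomObjectsApi().create_namespaced_custom_object(
    group="example.io",
    version="v1",
    namespace="default",
    plural="postgresclusters",
    body=database,
)
```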
Oh, you know, another thing that I found useful when I was learning about Kubernetes was the local Kubernetes.
Starting point is 01:04:20 There's Minikube and Kind and these other things where you can run sort of a very lightweight Kubernetes on your own machine. What's your take on those? Is that a good pattern for people to follow when they're getting started? Yeah. Again, there are lots of different ways you can do development kind of work. You can say,
Starting point is 01:04:47 I'm going to run a local Kubernetes cluster and test everything against that, and then do integration somewhere else. Or you can say, I'm going to run a development Kubernetes cluster for my team, and every member of my team gets their own slice of that cluster to do work on. It is pretty easy to do a combination of those things. If you are using or paying for the Docker Enterprise, sorry, the Docker Desktop product, they have built-in Kubernetes. You've got Minikube and so on, which are VM-based things that can run locally. They give you effectively a single-node Kubernetes environment, but that's enough to test a lot of things and to say, what does it look like? Especially if you're taking an application that wasn't Kubernetes-aware and putting it
Starting point is 01:05:27 in this environment, it is a good start. It's not going to deal with the fact that, moving from a monolith or local application to a distributed system, all of a sudden things that were function calls that were guaranteed to succeed are now RPCs that might fail or might time out or anything like that. There are lots of things that you have to think about that even running Kubernetes locally isn't going to replicate, but it's a useful tool in the tool chest. Yeah, it makes sense.
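One nice property of those local clusters is that the same client code works against them; a tiny sketch, assuming a Minikube context already exists in your kubeconfig:

```python
# A hedged sketch: point the same Python client at a local cluster.
from kubernetes import client, config

config.load_kube_config(context="minikube")  # select the local context

# Sanity check: a single-node local cluster should list exactly one node.
for node in client.CoreV1Api().list_node().items:
    print(node.metadata.name)
```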
And you touched on this a little bit earlier; I wanted to go back to it: securing Kubernetes. You have a NAT, so that's providing
Starting point is 01:06:01 you some level of network security. But what are the things that folks should have on their radar when they're standing things up? Yeah, the analysts, and the people who make money by scaring people, say that misconfigurations are one of the biggest causes of outages, one of the biggest causes of security breaches, and so on. There are so many different settings and things
Starting point is 01:06:24 you can configure inside Kubernetes. Generally, Kubernetes isn't secure by default. Like, you can say, run this thing, and an agent that is root on the machine then starts the thing running, so you can be root on the machine if you don't say otherwise. So you should say, I need to run this thing as a lower-security user. And a lot of people will say, I'm not going to do that, because I'm lazy, and it works if I'm root and it doesn't if I'm not. And that's true of every kind of software delivery.
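That lower-security-user knob is a per-container setting; a small sketch with the Python client, where the image and UID are hypothetical:

```python
# A hedged sketch: declare that this container must not run as root.
from kubernetes import client

container = client.V1Container(
    name="web",
    image="example/web-app:1.0",  # hypothetical image built for non-root
    security_context=client.V1SecurityContext(
        run_as_non_root=True,  # refuse to start if the image wants root
        run_as_user=1000,      # an arbitrary unprivileged UID
        allow_privilege_escalation=False,
    ),
)
# Drop this container into any pod or deployment spec as usual.
```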
So the first thing you need to do is be aware of what you're running, and the best way to be aware of it is to have software do that for you. I work now on an open source project called
Starting point is 01:07:00 Kubescape, which has just joined the CNCF sandbox. It's a tool that basically can validate the state of your cluster, or can validate the things that you're going to deploy to your cluster, and lint them against a bunch of different rules, which you can define based on external guidelines from the NSA and various other government organizations that publish things to say, here are some best practices for how you should manage your Kubernetes environment. And then you can validate and say, hey, that matters to me, that doesn't, and so on. That handles the easy case of the configuration of things. Now we also have to deal with the fact that we are running software which in large part
Starting point is 01:07:37 comes from third-party sources. You're not generally building your own Postgres container; you're taking it from someone else. And it's so much easier to run software than it used to be. You now have all the same problems that you had, but they're scaled up. What are you going to do with vulnerabilities? What are you going to do with the fact that this particular...
Starting point is 01:07:57 well, it's easy to keep them all at the same version, but if you have a CVE now, you might have 10,000 containers that can be exploited. And so then you also need Kubescape's functionality for handling vulnerabilities, and looking at what's on each machine and what needs to be remediated. And we're doing a bunch of work at the moment in that area to make it possible to tell people... like, all of a sudden you do any kind of security scan on a machine, and it's going to say everything is broken, there are 10,000 vulnerabilities on this machine or something, but it turns out maybe only a few of them are really important to deal with straight away.
Starting point is 01:08:30 And some of them are exploitable, or some of them are in a code path that gets run frequently. So we're putting a lot of work at the moment into how we can prioritize what things you need to fix. So you do need to be aware; that's true of any environment, but it's a lot easier to do it programmatically and know what's going on in an environment like Kubernetes. Yeah, that makes sense. One thing, and I want to make sure I kind of get this right: somehow I ended up spinning up a service where people could access it from the outside. And that wasn't my intention at all. I wanted only employees to
Starting point is 01:09:05 access it. You are one of today's 95% misconfigurations. And yeah, the thing that impressed me was not that I messed up Kubernetes, that was very obvious. But the thing that impressed me was that the IT or cloud platform folks at my company reached out to me in maybe like five minutes or something. I mean, it was unbelievable, like ambulance-level turnaround time. And they're like, hey, you have this service? And I was like, yeah, pretty proud of it. Like, oh, yeah? So it turns out, you know, anyone can look at your website right now. I was like, oh, okay, let's turn that off. And I was just so impressed with how quickly... you know, so I started asking questions,
Starting point is 01:09:45 and they were saying that they have, you know, basically they have this monitoring system, and it goes to this database, and then there's a trigger, and, you know, it DMs them on Slack and then they DM me on Slack. The thing was just phenomenal. And it also kind of showed how important security is, how you could do it right, and the value there. My company, Armo, builds such a system on top of the Kubescape engine. But the specific thing that I think makes Kubernetes more of a concern in this kind of environment is, if you think about deploying your software on machines in the old days, like if someone had found a way to get a shell on your machine 20 years ago when you were
Starting point is 01:10:27 hosting something, it's unlikely they would be able to influence any other machine on the network. Whereas today, if you're in a machine that runs Kubernetes, it's very easy to say, right, I've got onto this machine. Chances are that somewhere on this machine there are some network credentials which I'm able to use, like the ones used to call into my cloud provider to pull an object or something. If those credentials are not adequately scoped, then I can instruct the cloud provider to spin up more machines and start mining Bitcoin. Or, if the Kubernetes API objects are not correctly configured, I can say, connect to the
Starting point is 01:11:00 Kubernetes server, tell it to start running 10,000 pods to start mining Bitcoin. There are so many ways that a simple breach, someone can read my web server, can now turn into some other kind of problem, because we have this automation. And it's the fact that all this automation is here that means you have to pay more attention to securing these environments than you did when they were less of a concern. Yep. Yep. That totally makes sense. Yeah. I mean, that's an attack that I hadn't even thought about. But you're right. I mean, next thing you know,
Starting point is 01:11:29 you're spending like 100K a month or something on cloud so that someone else can get a bunch of Bitcoin. It's always Bitcoin. Not Monero or anything. It's always Bitcoin. I think that Monero is the diet Bitcoin, as far as I'm concerned.
Starting point is 01:11:45 Yeah, that's the only other one. Oh no, there's Ethereum. I don't know as much about that. It's such a shame, because cryptocurrency has such practical use cases and can make banking available to so many people
Starting point is 01:11:58 for whom it was not otherwise available. But we've just ended up burning a whole bunch of electricity, pushing up the price of graphics cards, and enabling crime. And it's hard for regulators to see past all of those things to the good in this. And I don't know how we can separate those two things and make a good out of it. Yeah. I feel like I can say this now, it's like the body has been cold long enough for me to say this, but I always felt like, you know, Web 2.0 was awesome. It's like all this interactive media, all this cool stuff, you know, and you're seeing, like, underground
Starting point is 01:12:31 people coming out from the underground and going viral and everything. And then, in my opinion, money kind of ruined it. And now it's like, oh, you know, I have to make my show title in all caps so that I get more clicks, or, oh, I have to be on this platform or that platform. You're constantly getting emails like, you need to be on TikTok, you know. And so it's just become very commercial. And so I always thought Web 3.0 was like, let's take the thing that ruined Web 2.0 and let's start there. Let's, like, build a castle around it. It's like, let's start with the big money again. I could fill another podcast with opinions
Starting point is 01:13:12 on this and so on, but in deference to your listeners, at this point I'm just going to make the face, the oh-my-god-I'm-a-YouTube-thumbnail face, which they won't be able to see. Great radio here. That's the trouble with podcasts. Yep, yep. But, uh, yeah, the mining Bitcoin thing is hilarious. It's a tragedy, though. It can get extremely expensive. Yeah, it's made it possible to profit from cybercrime in a way that wasn't hugely relevant years ago. Well, I was just going to say, on the other side of the fence, what happens when someone gets a hundred thousand dollar bill? I mean, I don't know if you're at liberty to talk about this, but I was always curious. You know, someone gets exploited in this way, and it's someone, you know, like us. We don't have a
Starting point is 01:13:54 hundred thousand dollars sitting around, and then Google comes, like, trying to collect. What actually happens there, if you can say? Again, I don't work at Google. I did for some time. I wasn't involved in anything like this explicitly, but I have seen a lot of stories with cloud vendors, not just Google; all of them basically say, if it's an honest mistake, or if there's something that could be tracked down to a bug, or something that probably shouldn't have happened and it was detected soon enough, then in a lot of cases the vendor will write the bill off. And that's generally a good calculation as well. If you as an individual find that someone has exploited you and done a certain thing, and you don't feel you did anything wrong, and you have a big enough
Starting point is 01:14:32 megaphone on the internet, you can probably make enough noise to put people off that vendor, such that the vendor will say, I'm going to eat that loss myself. That makes sense. I bet they also have like certain pattern recognition things. So if they see, you know, 10,000 pods mining Bitcoin, they probably can even reach out and tell you something fishy is going on. I mean, I wouldn't count on it. I wouldn't make that your security plan, but they probably have something like that. Yeah. And again, there are some distinctions that may have applied at Google. I'm not 100% sure, but you don't necessarily want a vendor to know what you are running in your VMs.
Starting point is 01:15:11 There is a trust boundary and so on. You can do whatever and know that people aren't looking in it. There are lots of reasons, good and bad, why that is the case. But in general, if you're running something on a cloud, the vendor knows how much CPU you're using, but they shouldn't be able to see what is running, unless of course you do something like install an agent which exports logs or whatever, and you make a choice to do that. So some information is available, and they can use that. And again, pattern recognition: perhaps all of a sudden nodes start sending out network connections that they weren't before. That is
Starting point is 01:15:45 something where you may not necessarily want the provider to be looking at what is in those connections, but they could ping you and say, hey, all of a sudden your machines are spraying out traffic that they weren't before; that's the thing you should go and look at. So somewhere in the middle is the answer. You do have to have some responsibility there yourself. Yeah, that makes sense. So we talked about Armo and, you know, Kubescape. So definitely folks should check this out. So in terms of Armo, the product is there, you know, we do have a lot of folks who are working professionals, who are Kubernetes users, who I think could benefit greatly from this. We also have a lot
Starting point is 01:16:20 of students who are learning, and they want to, you know, for them, it's like job simulator 2023. And so is there, like, a free tier at Armo? You know, what are some of the options for them? Yeah, so Kubescape is an open source project. It's the thing you run inside your cluster, and it generates reports and so on. And that's entirely open source. We do see a lot of people learning, and students and so on, coming and wanting to get involved and hack on the project as well, which is fantastic. The back-end service, which visualizes things and shows you, here's the state as it was yesterday and today and tomorrow and so on, that is the Armo platform. That is free for up to 10 worker nodes,
Starting point is 01:16:59 and that's free forever. So if you're running at a small scale, or experimenting, or so on, we're more than happy for you to connect up there and visualize and see what's going on. And it also lets you see the configurations of access control inside your clusters and so on. And then, obviously, the bigger you get and so on, we'll talk to you about pricing plans. But overall, the Kubescape project
Starting point is 01:17:23 is trying to build out security tooling, and we want to be a sort of end-to-end platform that helps you secure your Kubernetes environments. We saw at Armo that a number of security vendors were doing stuff in this space, but it was really just sort of an afterthought; it wasn't open source, or it wasn't the key business. And the difference with what we're doing is we're committed to making everything we have open source. We want this to be a platform that's adopted just like many other add-ons for the Kubernetes ecosystem are today. And if we get to a point where it turns out that you're happy enough running just the open source things, we get a lot of people who are running it in their CI pipelines, or as a GitHub action or something; they never touch
Starting point is 01:18:02 the service that you can choose to pay for. And that's great too. We really want this to be accessible and available to everybody. Cool. That makes sense. And what about the company? What is something kind of interesting that sets Armo apart from other places to work? It could be certain outings you do, it could be a travel policy, it could be your mascot. You know, what is something that sets you apart? Yeah, well, almost everyone is in Israel and I'm in New Zealand. I think that sort of sets us apart enough to some degree. Yeah. Wait, how did that happen? How that happened was, I did have an intention to be back in England, and a family situation has sort of made that unlikely at this point, this year at least. But overall, it was a case that they're building a remote company, and this is sort of
Starting point is 01:18:50 a way to demonstrate that. Like, they have had a core of people, and when you say Israel and cybersecurity in the same sentence, everyone sort of builds their own opinion in their head. But there's a lot of work gone on in terms of training people up in that part of the world to be experts in this space, but then obviously interacting with the Kubernetes ecosystem. Like, I understand security, but it wasn't my professional background.
Starting point is 01:19:13 My background was in Kubernetes, and my background was in DevOps and so on. And those are the people that they're targeting, and I was the best person to help them with that. And it wasn't a problem for them that I was going to be working remotely and attending events and doing online stuff and so on. And they've been really great about that. Cool. And so we do actually have a lot of listeners in Israel, which is awesome. That's right. But for folks who aren't in Israel, are there more remote opportunities? You know, is there a careers page that they should check out, and what does that look like? And also, are there internships for students, and what does
Starting point is 01:19:49 that look like? Kind of two separate questions. There is a careers page. I am hiring at the moment, and may still be at the time you're listening to this podcast, but I don't know whether or not the job description is up. But I'm looking for another developer advocate, or someone for our team who's writing and telling people about interesting stuff in the Kubernetes ecosystem. So find me in the show notes and ping me if that's something that you're interested in. In terms of internships and so on, we have been, as a CNCF project, participating in a thing called the LFX Mentorship Project, where the Linux Foundation basically pays, three or four times a year, a stipend to a bunch of students around the world to contribute to open source projects. It's a little bit like the Google Summer of Code, if you've heard of that. We're looking at possibly
Starting point is 01:20:34 participating in that too. But that's a way we basically give a chance to people to get involved in a paid capacity in an open source project and build up their skills working with the community, working with the maintainers of a project and so on. We are currently working with three different mentees and we're getting some really good results and hopefully we're helping them in their journey as they work towards becoming professional programmers. Very cool. That is awesome.
Starting point is 01:21:00 Yeah. I mean, I'm fascinated with Kubernetes and security. I think that, you know, that incident that happened to me and the fact that, you know, they turned it around so quickly at my company showed me, you know, how dangerous it is, also how important it is. I think it's a great area. Folks out there, if you are graduating soon, or if you're looking for internships, other opportunities, definitely check out the show notes. We'll have all the details in that. And yeah, I just wanted to say thank you, Craig. This is an amazing interview. You kind of covered really Kubernetes start to finish. We talked about how people can get started. There is a lot of complexity. We didn't have time to dive into like Helm and all these other things, but there's a rabbit hole there. And maybe there's a future show there. But we gave folks the right ingredients to get started. And I can tell you, I kind of wish these tools were there. I always did distributed machine learning. My whole PhD was on that. And
Starting point is 01:21:57 I really wish that these tools were there because I was manually copying C++ binaries around and using Slurm and MPI and all these painful things. Unzipping jar files. Unzipping jar files, yep. So definitely, I think it behooves people to really learn Docker and Kubernetes as early as possible. Just like it behooves people to learn source control. And so, yeah, definitely, if you have any questions about anything you heard, don't hesitate to email us, programmingthrowdown at gmail.com. We can also kind of pass your question around. Otherwise, you could also reach Craig on Twitter.
Starting point is 01:22:39 You can follow Craig and see what he's talking about on Twitter, twitter.com slash Craigbox. And we'll put all of that in the show notes. I used to host a very popular podcast on Kubernetes. And since leaving Google, I have pivoted to writing a mediocre newsletter, which has my thoughts on the Kubernetes ecosystem every now and then. And until such time as I am more permanently
Starting point is 01:23:00 behind the mic again, I will encourage people to catch up with the Kubernetes ecosystem and news in the industry by reading the Substack, which you'll find a link to on my Twitter page as well. Great. And we'll put that link in the show notes as well to save people a click if they want to go straight there. One last kind of administrative thing. Ironically, we are actually uploading all the episodes to YouTube. I talked about this on my Twitter. The reason we're doing it is to get captions. YouTube is amazing at giving captions in a zillion different languages.
Starting point is 01:23:34 I'm constantly getting emails, you know, hey, do you have captions? Do you have captions in Russian? Do you have captions in Spanish? And Transistor, which is our hosting company, recently offered it, or maybe they've always had it, but I recently found the YouTube integration button. I hit the big red button. Now all the episodes are up there on YouTube. Definitely give it a listen, especially if you're wanting to read what we're talking about. Thank you again, Craig. It was an awesome time. I really appreciate you coming on the show, and I really look forward to seeing how things go at Armo.
Starting point is 01:24:05 So thank you so much. Thank you both again very much. It's been an absolute pleasure. Music by Eric Barndollar. Programming Throwdown is distributed under a Creative Commons Attribution-ShareAlike 2.0 license. You're free to share, copy, distribute, and transmit the work, and to remix and adapt the work, but you must provide attribution to Patrick and me, and share alike in kind.
