The Peterman Pod - The Co-Creator of Kubernetes On Convincing Google, Building It, and Scaling for LLMs

Starting point is 00:00:00 There's going to be an open source one. Do you want it to be ours or do you want it to be someone else's? This is Brendan Burns, the co-creator of Kubernetes, and I asked him about the stories behind building it. The hardest part actually of the project was actually articulating that. How long did that initial MVP take to build? I don't know, a little under a week maybe. He had interesting career advice grounded in his career story. I believe you can hide order 10% of your effort from your management.

Starting point is 00:00:29 Is there any upper bound where Kubernetes just cannot handle that load? Every time you change an order of magnitude, the problem moves. Here's the full episode. I mean, let's start with Kubernetes because that's super interesting. I don't fully understand the business motivation. Like, let's say I was your director or something like that, and you came to me with this and you said, hey, let's do this for everyone. I don't fully understand what would be in that strategy doc or that, what would you say in

Starting point is 00:01:03 a meeting that would say, here's the impact for Google if we invest in building this for the industry. Yeah, it's funny because like the hardest part actually of the project, I would say, in those early days was actually articulating that. And I think it was really clear in our heads, but like figuring out how to convince people was tricky. And, you know, I think there was a variety of different ways that we articulated why it was important. One of them was related to the MapReduce white paper. So MapReduce, at the time, especially like Hadoop and Big Data were a big deal. I think that, you know, other things have kind of replaced them at this point. But like, MapReduce was a big deal and the big data revolution or whatever they called it. And, you know,

Starting point is 00:01:56 Google had written the original white paper. But Hadoop was an open source project that Google had nothing to do with and got no credit for. And they just read the white paper and they re-implemented it. And it's not the same. It's similar, but it's not the same, right? Because anytime you re-implement something, it's not the same. And so part of the argument was like, look, like, we have this cloud and we want to be influencing the technological landscape. If all we do is kick out white papers, we're not.

Starting point is 00:02:29 Like, if it doesn't run, if it's not something that people can run, they're not, we're not going to be in the driver's seat. And so that was one of the, one of the arguments. I think, you know, the other couple arguments were like, why containers, why not, you know, why people are using VMs, why containers? And, you know, a lot of that was talking about, like, look, the demands of writing software. I mean, we know internally, we know from doing this internally that the demands of writing reliable software necessitate having systems that are sort of like autopilots for your, you know, for your application. And we know that this is something that as software becomes more and more critical to more and

Starting point is 00:03:11 more businesses, this is something that they're just going to have to have. And so that's sort of the why containers part. And then, you know, I think that the third part was the like, why open source, right? And in some sense, that's like the most interesting conversation because people are like, wow, this would be, you know, you've convinced me, right? Like you've convinced me we should build it. We should make it available to the world. It's something that they're going to find useful.

Starting point is 00:03:34 But wouldn't it be so much better if they could only use it on our platform? And you're like, yeah, that's absolutely the case. But you can't win if you make it only on your platform. Why is that? Well, because there's other platforms out there, right? And so if you make it an exclusive, then the people who, for other reasons are on other clouds or on premise, they're shut out. And so they're going to just go build an alternative. It's sort of like the Linux, you know, the reason Linux won, right? It's because

Starting point is 00:04:13 Linux could go anywhere. The whole reason that open tech and open ecosystems win is because you, the majority of people are going to be not on your platform. Like if you're, if you're not the leader, and we, you know, GCP was not the leader, then the majority of people are not going to be on your platform. And so if you make it such that the majority of people can't use your thing, they're just going to ignore you. And then they're going to go build their own, right? Whereas if you go and build it and you build it for everybody, but you make sure that it's awesome on your platform, then you have a chance of attracting people, attracting more people to your platform than otherwise. And then, I mean, also in some ways, it's just like the aesthetic of the time also, right? It's like if everybody's using Linux, and everybody's using Docker, and everybody's using these programming languages that are all open, like if everything else is open source, you don't want to be the thing that's not.

Starting point is 00:05:06 Right? And there's only a few places where, like, where that hasn't been true historically in technology, where you could be different and still succeed, and you have to be so differentiated that, and I don't think that we were different, that differentiated. You have to be so differentiated such that people are like, oh, actually, I want that thing so bad,

Starting point is 00:05:25 I don't care that it's closed. So at the time, just thinking what was the competitive landscape, I guess if I, if I remember it was, I mean, AWS was dominant. They were there first and doing very well. GCP at that time, probably an up-and-coming company or I guess offering. And so my understanding then is that the idea is let's pull market share away by giving kind of open source distributing Kubernetes to more and more developers, and then they'll be more open to kind of migrating or using GCP because you'll make a... They'll pay attention, right? They'll pay attention. And also like you change the dialogue too, right? Like, like, taillight chasing is hard. Right. Like if someone else has built VMs and everybody's using VMs and all you're doing is saying like, well, we're building the same thing that they have only maybe it's a little bit better because we do something else over here or, you know, whatever. Like, that's a hard market strategy to articulate. But if you create a brand new playing field where you, you know,

Starting point is 00:06:25 you are the thought leader, suddenly, like, people are listening to you. Even if they're not using it on your platform, they're listening to you. And that gives you way more voice. You get a lot more voice in the market. It changes the narrative. It changes who people are listening to. And so that control of the story is an important aspect of how, I think, how you break through that dynamic. Or you try to break through that dynamic.

Starting point is 00:06:51 Obviously, like, you know, they're still in third at this point. So, like, didn't work out. But I mean, I wouldn't say it didn't work out. I think it worked out in general, but it's still hard. Like overcoming those kinds of market dynamics is hard. And, you know, I think the other thing that happened is everybody in the cloud consolidated around it. And so now Kubernetes is just sort of a utility everywhere.

Starting point is 00:07:12 You mentioned that, I guess, that perception benefit of being the dominant offering, which, I mean, if you look at what happened in hindsight, it makes a lot of sense. I mean, that is what happened, and it's a wonderful benefit. But I guess when you were looking forward and you were talking to leadership, were you cognizant of those benefits and saying, we need to do this because we're going to kind of become the dominant offering and it's going to have all of these optics benefits? I mean, I think we absolutely wanted to make sure that we were front and center

Starting point is 00:07:47 in terms of thought leadership. And we definitely articulated that, right? Absolutely, right. Like being in a thought leadership position is valuable. It's interesting because I feel like it's hard to quantify. I mean, if I was in that meeting and we're trying to make a call, how much is that worth? Yeah. Well, I think you have to also realize though at the time, like it was pretty cheap, right?

Starting point is 00:08:09 It was like eight or nine engineers. And we kind of, and this is in some sense. I mean, it's both a blessing and a curse. Like part of the reason why we articulated and argued for having such a distinct brand, where the Kubernetes brand was separated from the Google brand, was that it kind of gave us freedom to fail also. It was like, hey, if these eight people go off and, you know, do something and it turns out to be stupid, like, we'll just kill it off. And it won't have, like, it won't, you know, it won't, it won't damage the broader perception of the cloud.

Starting point is 00:08:43 And so I think there's that benefit of the open source part of it, too. It helped with adoption, right? Like it helped us, and especially as we went to like the Linux Foundation and things like that and truly established like an independent entity, it helped ensure that, you know, people like Red Hat or Azure or AWS

Starting point is 00:09:00 could take a bet on Kubernetes and feel confident in that bet. But it also was an insurance policy against failure. And also, to be honest, like, it simplified a lot of things too because, you know, we were competing against startups, right? Like at the time Docker is a startup,

Starting point is 00:09:15 they can be way more agile than a big company. And so by virtue of sort of being a separate entity, we could be a little bit more agile also. I mean, the earliest conception of this project was you and two others kind of hacking something together. Yeah, it was a demo. I mean, it was sort of a demo almost. It was like, look at what we can do

Starting point is 00:09:35 if we just smash a bunch of existing open source tech together. What did that demo do? I mean, it was basically like a basic kubectl. It was like, hey, here's a container I built. I mean, at the time, you had to explain Docker to people. You were like, hey, here's Docker. I used it to, like, build this container image. And then you could run it and deploy it and see that it had gotten distributed across a bunch of machines.

Starting point is 00:10:01 And that you could load balance to it because you'd hit a single endpoint and it would go, I'm Replica 1. And then you'd hit reload and it'd go, I'm Replica 3. And, you know, so it like showed that it was replicated. And then basic health checking. So like if you killed it, it would come back, and a V1 to V2 upgrade. That was about it. How long did that initial MVP take to build? I wrote it in, I don't know, a little under a week maybe, something like that.
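The demo described above (one endpoint in front of several replicas, basic health checking, and a version upgrade) can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the original prototype: the names `Replica` and `ToyService` are hypothetical, everything runs in a single process, and round-robin is assumed as the load-balancing policy purely to keep the output predictable.

```python
from dataclasses import dataclass
from itertools import count


@dataclass
class Replica:
    replica_id: int
    version: str
    healthy: bool = True


class ToyService:
    """Hypothetical single-process stand-in for the demo: one endpoint, N replicas."""

    def __init__(self, desired: int, version: str) -> None:
        self._ids = count(1)       # replica IDs are never reused
        self.desired = desired     # how many replicas we want running
        self.version = version     # version that newly started replicas get
        self.replicas = [self._start() for _ in range(desired)]
        self._turn = 0

    def _start(self) -> Replica:
        return Replica(next(self._ids), self.version)

    def request(self) -> str:
        # Round-robin over healthy replicas (an assumed policy for this sketch).
        healthy = [r for r in self.replicas if r.healthy]
        if not healthy:
            raise RuntimeError("no healthy replicas")
        replica = healthy[self._turn % len(healthy)]
        self._turn += 1
        return f"I'm Replica {replica.replica_id} ({replica.version})"

    def health_check(self) -> None:
        # Drop dead replicas and start replacements until the desired count is met.
        self.replicas = [r for r in self.replicas if r.healthy]
        while len(self.replicas) < self.desired:
            self.replicas.append(self._start())

    def rolling_upgrade(self, new_version: str) -> None:
        # Replace replicas one at a time so the service never drops to zero.
        self.version = new_version
        for i in range(len(self.replicas)):
            self.replicas[i] = self._start()
```

Marking a replica unhealthy and then calling `health_check()` brings the count back to `desired`, which is the "if you killed it, it would come back" behavior in miniature.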

Starting point is 00:10:31 I mean, like, I don't work on the weekends, so maybe five days, four days, five days. And did you drop all of your existing work? Because I'm imagining you had existing project work and this is kind of extra credit stuff that you were working on. Yeah, well, I mean, I wouldn't say I dropped it, but like on that kind of time scale, like, you can kind of like slack on it a little bit, you know, like, like you could be sick for a week. I mean, I guess the thing is like, you could be sick for a week, you know? And I'm not saying that that's not what we did. That that's not what I did. But like, but there was enough flexibility in the system that like you could hack it together. And I mean, believe me, it was hacked together, right? Like, we took every possible shortcut. And I think one of the things I've been good at historically is integrating other open source projects together, seeing how you can take stuff off the shelf and put it together. And so, you know, a lot of the nuts and bolts were pieces that we could take from other open source projects and kind of combine together with glue and glue code to, you know, to give the feel of it. So that helps too. I think a lot of software engineers, when they hear this kind of story, they think, oh, I have my existing responsibilities and I can't necessarily go off and build this thing, even though I think it's a great idea. Do you have any advice for someone who has that opinion?

Starting point is 00:11:59 Yeah. Well, I mean, I think that what I would say there's two things. I have two answers to that. One is advice that I've always given to every single person that's ever worked for me, which is I believe you can hide order 10% of your effort from your management, right? So like, you know, you have slack. You have the ability to slack no matter what, right? And, you know, as you get a bigger and bigger org, actually what you can do with that 10% actually increases. And a lot of really influential good ideas that I've had have come out of that. I mean, it's another, I mean, it's kind of a flip way of saying, I want to empower people locally to make local decisions that they think are optimal for the business without having to consult up the chain, without having to ask permission, right?

Starting point is 00:12:53 And you tell people, when you tell them that, you're like, by the way, you're also going to make a bunch of bad decisions and you're going to waste a bunch of time. And so, like, you have to be comfortable with this idea of like, I'm going to try some ideas. Some of them are going to fail. Some of them are going to succeed. When I look back retrospectively, the ones that have failed were effectively a waste of time. You know, and it might be the difference between an exceeds expectations and a meets expectations, right? Like you don't want to drop below meets expectations, but like it might be the difference between

Starting point is 00:13:23 an exceeds expectations and a meets expectations. And you have to be comfortable with the notion that you're going to bet five times and the payout from one of them hitting is going to be way better than the grinding to get that exceeds every single time. And, you know, I mean, I, I think there's equally valid paths where you don't do that. And I think you have to be the right kind of person who is willing to take that kind of chance. So, and then that's not everybody. And that's okay.

Starting point is 00:13:54 And I think the other side of it that I, but I always remind everybody also, though, is like, you know, people say things like that to me sometimes. And I'm like, so do you play Call of Duty? Do you watch Netflix? Do you watch YouTube? You know, because like I'm pretty sure there's probably 10, 15, 20 hours in your week at least when you're doing something that's not work, right?

Starting point is 00:14:16 And I can tell you in that time period, I wasn't doing anything except for this and work, right? And a little bit of family, like, and sleeping, right? And so, like, sometimes it's also about saying, like, well, what are you willing to give up, you know, to have the space to do that? And, you know, I'm not a big, like, I'm not a big, like, work all night kind of person, but, like, it does mean, like, maybe not going to watch YouTube for a while. Maybe not going to watch, you know, sporting events for a while. That makes sense. And I mean,

Starting point is 00:14:45 on that second point, in this case, I mean, it was such a, the returns of this project were exponential, obscene. I mean, if you put in two times the time for a year, but you, you get 20 times the impact. So it just kind of makes sense in terms of investment of time. Well, and also I think, I mean, I found personally that like, the, it was addictive, right? I mean, I think I benefited from two things. One is I really like to write code, right? Like, I, I enjoy it as an activity.

Starting point is 00:15:24 And so, like, if I'm choosing between Netflix and coding, I'm actually pretty happy just coding, you know? So that's a benefit for me. And then for me, anyway, like, once people start using it and they're excited about the project and they're putting issues on GitHub and all of this kind of stuff, like I'm just addicted to it. Like I just want to close that issue.

Starting point is 00:15:45 I want to help that person out. I want to like, I'm going to go till I'm falling asleep on my laptop. And that's just because that's, I enjoy that. Like, that's why I'm in the industry. So even in the moment,

Starting point is 00:16:00 I wasn't, I mean, I was definitely not thinking about like, oh, here's this payout that I'm going to get for the rest of my career. I was definitely in the like, wow,

Starting point is 00:16:09 I just want to keep this thing going. I want to keep this, you know, I want to keep this rush going. Because it sounds like it took a while for you guys to get buy-in from leadership. I think there was a solid six months of going from a very hacky prototype to something that like legitimately we thought somebody could take and use. And laying down the right kind of groundwork for that, you know, there's a lot of little details that you have to get right along the way. And it's always nice also because, you know, a lot of, a lot of the people who we brought in in the early days had built similar systems before.

Starting point is 00:16:52 And so they were having this opportunity, this kind of clean, it's rare in your life as an engineer to get a clean room opportunity to rebuild something that you have ideas about how it could be better. It's like getting a second chance, you know? And so that was also, I think, really attractive to people because it's suddenly like, oh, wow, like this is a clean room. We don't have any users right now. So, like, we don't have to be fixing bugs because some, you know, big company who pays us a lot of money is asking for something or whatever. We've got this clean room time. And we've all spent a lot of time thinking about what the system could look like. And so now we get to, like, go and just build the thing that we imagined.

Starting point is 00:17:33 You mentioned that first point on hiding maybe 10% of your bandwidth from management. And I mean, that's super interesting to me. What does that look like in practice? Well, I mean, I think that what it means is that you should always have sort of a side project that you think is relevant. Right? Like, you should always have something that is, that nobody told you to do, but that you think is important that you're working on. I see. It's kind of like, I remember Google, I don't know if they still do this, but at the time there was a 20% time.

Starting point is 00:18:07 Yeah, it's similar to that kind of idea. Yeah. Yeah, exactly. Okay, so when you say hide, you don't mean hide. You say manage expectations so that your manager is also okay with you working on another project. Oh, no, I actually do mean hide, right? Like, don't ask permission, right? Like, sooner or later you'll show it to them. But like, it's pretty, like, you need a solid, I don't know, a couple months or whatever to get to like a place where it's something you could show to somebody. Right. And so like it's all about saying like, I'm not going to ask permission. I'm going to go build something that I think is important and useful. And then obviously when it comes time to launch it or whatever or put it out there, like then you do have to ask permission. Right. And so then yeah, you say like, hey, I built this thing.

Starting point is 00:18:55 But you've had that time to like get it from, because I mean, I don't know. I feel like it's hard to articulate the value of something, like in a doc or in a PowerPoint. It's way more effective if it's like a running thing that you can like somebody can interact with. Right. So getting that time to basically build it up into something that's real and could ship. Because also like in some sense, your manager is always assessing like, well, should you spend your time on that or should you spend your time on this? Right. And by building it, you kind of like force their hand. Because it's no longer, like, should you spend time building this or should you spend time building this?

Starting point is 00:19:36 It's like, I've already built this. Like, do you want to ship it? And that's, in some ways, a much easier decision. Like, I don't know about easier, but like, it's like, it's not an either or. It suddenly becomes just sort of about, like, is your idea good? Right? Because the work is done. I mean, in the happy path, I feel like it's a great idea.

Starting point is 00:19:55 You launch this thing, has impact. It's great. But what about in the case that you work on this thing and you didn't tell anyone about it? and then it, no one cares when you launch it or it's not as good as we thought. Yeah. And then like that's the flip side, right? Like, you have to be comfortable. I mean, as I said, like, you have to be comfortable with that idea that, you know,

Starting point is 00:20:17 you're going to waste some time. And maybe it'll be wasted time in the sense of like, wow, I could have been like watching Netflix or, you know, whatever. And I, instead I wrote a bunch of code that nobody liked. It could be wasting time in the sense of like, wow, I could have gotten promoted. I could have done enough work to get promoted. And I didn't because I thought this was the great idea that was going to get me over the hump, but it wasn't.

Starting point is 00:20:42 And I think you just have to be comfortable with that. Like, you know, it's taking, I mean, it is taking a risk. It's not unlike in some sense, like doing a startup or something like that. It's taking a risk. So, you know, I think, and you can't assume, I think sometimes people go into any of these sorts of things. And they're like, oh, I will have an idea and it will be amazing and it will hockey stick. And like, I think if that's your mindset

Starting point is 00:21:06 when you go in, you're probably setting yourself up for disappointment. You know, you have to go in with that mindset of like, I think this is good. I'm going to try, but I'm okay if I fail, right? Like, I know that I'm making an explicit choice here. I also imagine at some point at the highest levels of engineering ladders, you, you need to take that risk to get promoted to higher levels. For instance, like, if you're a staff engineer or senior staff engineer and you ask your manager, how do I get promoted? They'll often tell you you need to figure out what that project is because I don't, I can't just hand this to you because it's starting to become more ambiguous. Oh, yeah, for sure. Yeah, I could totally see that this becomes a necessity

Starting point is 00:21:50 at some point. If you want to kind of grow in, I mean, this project also got you promoted as well at Google, right, from staff at some point? Yeah, I know. I mean, certainly my career absolutely benefited from the success there. Absolutely, right? But, and I think you're right that like there's also this aspect of at a certain point, like, you're just expected to be the person who knows enough to come up with the really good ideas. And that's just the expectation. Like, it's no longer about like, can you execute on the ideas that other people give you? You know, and that's a big part of it as well. And I think there's also like disproper. I mean, the other thing I tell people sometimes when they're thinking about getting promoted is if you create the idea yourself entirely,

Starting point is 00:22:34 it's like blindingly obvious that it was you who did it. If you succeed and have impact in something that is a bigger project or someone else's idea, you can still have a very successful career, but it's a little bit harder for it to like be directly attributable to you. Right? But again, I mean, I think that, again, it's a roll of the dice, right? And so, like, at some level, there probably are people out there who've tried over and over and over again and just have never had the right idea, right? Or just had the right idea at the wrong time. I mean, I think one of the other things that is interesting about innovation that is disruptive is that it's a

Starting point is 00:23:16 combination of being the person who has the idea and being in a time in which the idea can take off. And so you could have the idea, but it could just be the wrong time. And it won't do the same. And it won't go in the same direction. That point on promotions, I think, I mean, if you create the scope, not only is it obvious that the credit should go to you, but it also feels kind of permissionless.

Starting point is 00:23:45 Like you don't need to wait for, I guess, management or someone to give you the opportunity. You can kind of create it. And so you have a lot more control in that process. One thing on the business strategy, I guess before we leave that for Kubernetes, is at that time, Borg, to me, felt like a competitive advantage for Google, like some secret infrastructure sauce. I would have thought people would be kind of worried about giving away any part of that to the industry. So what was the thinking there? How did you convince people that, hey, it's okay? Yeah, I mean, I think that there was a little bit of that. And I think, you know, sort of to be a little bit, make it into a little bit of a joke or whatever, I sort of, one of the things I said to people was, you know, it's not like you Men in Black flash people as they leave Google, right?

Starting point is 00:24:36 Like, you know, it's not like everybody comes to Google and it just stays there forever, right? And so, and in fact, as we talk to people at Facebook, as we talk to people at Twitter, we talk to people at other, you know, scale out tech companies. Like, they were all building this stuff. It wasn't really a secret. And there was also, I mean, like, Mesos at the time was, you know, not the same, but similar. And like, you could just see that there was going to be an open solution. And so in some sense, part of the argument was like, look, there's going to be an open source solution. Do you want it to be one that we can influence or not?

Starting point is 00:25:09 It's not like, do you want there to be an open source one or not? It's there's going to be an open source one. Do you want it to be ours or do you want it to be someone else's? And it's just reframing the choice, right? And making it clear that people understand that that is the choice, right? That you don't get to choose the proprietary option because it's just not viable. When you were building that MVP for the orchestrator, well, how'd you decide, because there were no customers or anything like that. So how'd you decide this is the minimum set of features that we need for this before we launch it?

Starting point is 00:25:42 Sure. Well, I mean, I think absolutely, you know, we benefited from the fact that there were three of us working on it, right? So Craig was a great product manager and Joe was a great engineer and fantastic at API design. And I could write code fast, basically, I think, is sort of the, if I had to sort of stereotype all of us, you know, that's the, like, Craig was the product business guy. And Joe is the like, I know how to design, really good at design kind of person. And I was basically like, I can hack prototypes like there's no tomorrow. And I think we reflected a lot about our own experiences. And also we had seen the pain of people deploying into traditional VM infrastructure.

Starting point is 00:26:33 And so we had that kind of knowledge of the pain that people are going through. And then, you know, at the time, there were people like Netflix who were talking about immutable infrastructure. And they were kind of advancing some of the similar concepts. And so there was also kind of like a broader movement happening that we were taking part in. And so, you know, obviously there were no, there were literally no customers. But in some sense, there were customers. They just weren't customers yet. So it's not like creating something brand new, I guess, in some sense.

Starting point is 00:27:08 Like, like I feel, I really don't feel like Kubernetes was something that was brand new. I feel like it was a coalescing of a lot of ideas that were kind of circulating in the industry at the time. And it just became an anchor point and a really good expression of those ideas. So when you talk about, I guess, you wrote a lot of code quickly. Did you write most of the code, I guess, for this initial MVP? Yeah, I don't know what the number is, but high 80s percentage, maybe more of the original code.

Starting point is 00:27:42 And I think I'm still a number. I mean, I haven't contributed significantly to Kubernetes in a while, and I'm still, I think, number five on the overall contributor list on the GitHub commits. And I was number one for a long time. I mean, after writing that much code for Kubernetes, which part of this system was the hardest to build? I think I'm going to say, like, because I don't think any of the specific code was that hard.

Starting point is 00:28:08 I think that the hard part was the decision that we made early on that it was going to be a really loosely coupled system. And so it's very, which is great for resiliency. Like we made this decision around very loose coupling, a lot of independent actors taking actions. There's all these control loops running all over the place, which is really good for resiliency. But when things go wrong,

Starting point is 00:28:38 it's really hard to figure out why it went wrong. Because you've got 15 different processes that are all having to work together to achieve an outcome. And so you can see that the outcome wasn't achieved. But then you're like, okay, but like what happened? And now I have to sift through a bunch of different logs and a bunch of different operations of executables and sort of reconstruct in time what happened. And especially early on, we didn't have very good, we didn't have very much consistency around logging, we didn't have very good consistency about events and things like that. And so I think the hardest part, I mean, this is the hardest part of anything, I guess,

Starting point is 00:29:18 but it was when something went wrong, figuring out like why it went wrong. Because your logs are all distributed everywhere. Yeah, and everything's out of time sync. And I mean, and hopefully you logged the right things, but a lot of the time early on, like, you didn't log it. Like, you know, and because it's an interaction effect, it's hard to reproduce. Oftentimes it's hard to reproduce the problem. Like if the problem reproduces easily,

Starting point is 00:29:43 then it's pretty easy to fix, even if you don't have the logs, because you just go and add the logs, and then you do it again and you see what happens. But for problems that are transient, because it's just a race condition between two or three different things happening, you can go in and add the extra logs,

Starting point is 00:30:01 but then you have to figure out how to make it happen. Right? And that can be pretty tricky. And also, I mean, I'll say the other thing is like we were all learning Go. So maybe the other thing was like there's some gotchas in Golang. And we were all kind of like learning all the gotchas on the fly. I would have thought there's something that's controlling everything, right? Like there's some leader or maybe some nodes that are looking at one of them to kind of coordinate everything.
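The loosely coupled design discussed above, with independent control loops each nudging the system toward a desired state, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Kubernetes code: `ClusterState`, `reconcile_once`, and `run_until_converged` are hypothetical names, and a real system would run many such loops as separate processes, which is where the debugging difficulty comes from.

```python
from dataclasses import dataclass


@dataclass
class ClusterState:
    desired_replicas: int       # what the user asked for
    running_replicas: int = 0   # what is actually observed


def reconcile_once(state: ClusterState) -> str:
    """One pass of a control loop: compare desired vs. observed, take one step."""
    if state.running_replicas < state.desired_replicas:
        state.running_replicas += 1
        return "started replica"
    if state.running_replicas > state.desired_replicas:
        state.running_replicas -= 1
        return "stopped replica"
    return "in sync"


def run_until_converged(state: ClusterState, max_passes: int = 100) -> list[str]:
    """Repeat the loop until nothing is left to do, recording each event."""
    events = []
    for _ in range(max_passes):
        event = reconcile_once(state)
        events.append(event)
        if event == "in sync":
            break
    return events
```

Because the loop only ever looks at the gap between desired and observed state, it recovers from a crash the same way it handles initial startup: set `running_replicas` lower to simulate a failure and run it again. The returned event list is the kind of record that, in a real multi-process system, ends up scattered across many logs.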

Starting point is 00:30:30 Like leader election, if the leader goes down, I would have thought would be some really challenging problem. Yeah, I mean, well, I think the reason it wasn't that big a deal was because we relied pretty exclusively on etcd to do that for us. Oh, so etcd is another framework or open source. Yeah, etcd is an open source. I mean, it's kind of really part of Kubernetes now, but at the time it was a, it's a Raft-based consensus system, key-value store.

Starting point is 00:31:09 And so it was a pre-existing piece of software that CoreOS had written that implemented the Raft protocol. Because at the time, because Paxos was the original for this. And Paxos is really hard to implement because the algorithm is really complicated. And nobody understands it. Well, probably somebody understands it, but not a lot of people understand it. So then, but it's provable, right? And then right around that timeframe, people,

Starting point is 00:31:36 people had come up with Raft, which is a provably correct consensus algorithm, but it's way easier to implement. etcd implemented the Raft protocol and then gave you this consensus reliable store so that you could do, it had multiple replicas and it would do the consensus there. And it doesn't do leader election exactly, but it allows, but it gives you enough primitives that doing leader election is relatively straightforward. And I guess I would also say that, I mean, two things about that. One is we decided to force all of the access through an API server. So nobody had, and this was actually something I pushed really hard initially.
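
As a rough illustration of why leader election becomes "relatively straightforward" on top of a consistent key-value store, here is a minimal Python sketch. It assumes a hypothetical in-memory `ToyStore` with an atomic create-if-absent operation and a time-to-live on keys. It is not the etcd client API, and the key name `/leader`, the candidate names, and the lease length are made up for the example.

```python
import threading


class ToyStore:
    """Hypothetical stand-in for a consistent key-value store (not the etcd API)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._data = {}  # key -> (value, expiry time)

    def put_if_absent(self, key, value, ttl, now):
        """Atomically create the key unless a live (unexpired) entry exists."""
        with self._lock:
            entry = self._data.get(key)
            if entry is not None and entry[1] > now:
                return False  # someone else holds a live lease
            self._data[key] = (value, now + ttl)
            return True

    def refresh(self, key, value, ttl, now):
        """Extend the lease, but only if the caller still owns the key."""
        with self._lock:
            entry = self._data.get(key)
            if entry is None or entry[0] != value or entry[1] <= now:
                return False
            self._data[key] = (value, now + ttl)
            return True

    def get(self, key, now):
        """Return the current live value for the key, or None if absent or expired."""
        with self._lock:
            entry = self._data.get(key)
            if entry is None or entry[1] <= now:
                return None
            return entry[0]


def try_lead(store, candidate, now, ttl=10):
    """One election round: keep leadership if held, otherwise try to claim it."""
    if store.refresh("/leader", candidate, ttl, now):
        return True
    return store.put_if_absent("/leader", candidate, ttl, now)


store = ToyStore()
a_leads = try_lead(store, "scheduler-a", now=0)   # a claims the empty key
b_leads = try_lead(store, "scheduler-b", now=1)   # b loses: a's lease is still live
b_later = try_lead(store, "scheduler-b", now=20)  # a stopped renewing, so b takes over
```

Time is passed in explicitly so the example is deterministic. The point of the sketch is only that one atomic "create if absent" plus an expiring lease is enough to pick a single leader and to replace it when it stops renewing.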

Starting point is 00:32:26 But we, I think the general agreement, but I definitely pushed for it, was that nobody got to use store. Everything had to be a remote store. Like nobody got to write stuff to disk themselves. Every piece of the system had to use the API server and had to use the etcd behind the API server as the way that it did any sort of persistence. What's the main benefit of that? The main benefit of that is that everybody gets to restart all the time and they just

Starting point is 00:32:53 come up and they work. So you don't have to worry about corruptions. You don't have to worry about schema changes. You don't have to worry about any of the, like everybody was effectively stateless except for the database. Like there's the etcd consensus algorithm database. And that's the only place where there was state. And so as a result, the whole system was just a lot easier to make stable. The downside of it is it leads to this like loosely

Starting point is 00:33:24 coupled, like loose coupling, right, where it's a bunch of independent loops mediating everything through this storage layer, and which made the debugging part harder. So those are sort of the trade-offs is like if you have a complete log of like, I mean, you know, if you think of it as a state machine, like, it's much easier to understand where you are and where you got to if you're in a state machine. But state machines are a nightmare to make reliable. And so like they're easy to debug, but they're hard to make stable. Whereas the system we built was like designed to be stable, but hard to debug. The trouble with the state machine is a state machine says the world looks like this. And unless you get it exactly right, sometimes the world looks like something

Starting point is 00:34:12 you didn't imagine. And at that point, you're kind of screwed because like your state machine doesn't know what to do, right? Whereas because we had these control loops that were based on a desired state and a current state and trying to drive the current state to the desired state, like no matter where you woke up and found yourself, you kind of knew where you were supposed to drive to. And that, And that's the stable part. That's the stable part of it is that like it didn't really matter what where the system got itself. It was always trying to drive towards the desired state, you know, inspired honestly a lot

Starting point is 00:34:50 by like control theory from robotics actually. Oh, like PID. Yeah, the same same idea, right? Like, you know, you can imagine if you tried to write a PID controller to balance a beam with a bunch of if else loops, like it just doesn't work very well. And like this kind of reminds me, because I was reading in some of the design, like a large, I guess it's a feature of Kubernetes,

Starting point is 00:35:11 is that it's declarative rather than imperative. So you kind of just say, I want this to be true. I don't care how you get there instead of saying, just do X, Y, and Z. I'm curious, like, the pros and cons of that design decision. Well, yeah, I mean, and that was something that was happening broadly in the industry. Like, that's a part of the whole, like,

Starting point is 00:35:32 infrastructure as code movement that was happening at the time. So we're not the only ones who said that, but we definitely embraced it. I think the benefit is, you know, you have clarity about the way you want the world to work, right? It's not like if you execute a bunch of instructions, start this, run that, do this. Like, you've done a bunch of stuff in pursuit of some objective, but you never wrote down what the objective was. Right? There's no record of, like, what you were trying to achieve. I'm trying to create a website? Well, you didn't write that down. You just took a bunch of steps, right? With a declarative approach, you actually write it down. You say, like, I'm trying to create a reliable website and here's what a

Starting point is 00:36:18 reliable website looks like. Hey, system, could you take the steps to get there? Right? And so you have that record. And it obviously makes it easier for things to be self-healing because if you've written it down, now I know where I'm supposed to go back. Like, if I get perturbed from that state, if something fails or something restarts, well, I know where I'm supposed to go back to. And similarly, like, it has side benefits of, like, once I write it down, well, I can apply code review to it. I can apply unit tests to it. Like, there's a lot of, like, the mechanics of how we do software development that apply once you write down that declaration. So, like, a lot of those are the benefits. I think the downside is probably just complexity, right?
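
The self-healing property described above can be sketched in a few lines of Python. This is an illustrative toy, not Kubernetes code: the `desired` and `observed` dictionaries, the resource names, and the `reconcile` and `apply_actions` functions are all invented here, and a real controller would talk to an API server rather than mutate a dict.

```python
def reconcile(desired, observed):
    """Compare the written-down goal with what exists and return the steps that close the gap."""
    actions = []
    for name, spec in sorted(desired.items()):
        if name not in observed:
            actions.append(("create", name, spec))
        elif observed[name] != spec:
            actions.append(("update", name, spec))
    for name in sorted(observed):
        if name not in desired:
            actions.append(("delete", name, None))
    return actions


def apply_actions(actions, observed):
    """Carry out the actions against the (toy) world."""
    for verb, name, spec in actions:
        if verb == "delete":
            observed.pop(name)
        else:
            observed[name] = spec


# The declaration is the record of intent; it can be reviewed and tested like code.
desired = {"web": {"image": "site:v2", "port": 80}}

# The world has drifted: wrong image, plus a leftover nobody asked for.
observed = {"web": {"image": "site:v1", "port": 80}, "debug-shell": {"image": "tools"}}

first_pass = reconcile(desired, observed)
apply_actions(first_pass, observed)
second_pass = reconcile(desired, observed)  # empty once the world matches the declaration
```

Because `reconcile` only looks at the declaration and the current state, it gives the same answer no matter how the world got perturbed, which is the property that makes restarts and failures cheap to recover from.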

Starting point is 00:37:06 You know, in comparison to going click, click through a wizard or whatever, you know, in a GUI, learning, and, you know, everybody complains about the YAML and I have to learn all this stuff and, you know, like it does introduce a learning curve. Now, I think fortunately at this point there's enough education material out there that it's not, and GenAI too, for that matter, that it's not that bad a learning curve. But certainly in comparison to what people have done before, that's probably the biggest downside. But I don't know. I think that the upsides are like up here and the downsides are like way down. There's not a lot of downside. I see. Yeah, I could also see that being helpful for,

Starting point is 00:37:47 I guess if you want to optimize anything under the hood, because you're just making a promise to people that this is going to happen. But if you want to do it in a more efficient way or something like that, then I guess it just gives you all the power to do so. Yeah, well, I mean, and that does make things like machines failing a lot easier, right? Because people don't say, run this on this machine. They just say, I want three of this to be somewhere. And if a machine fails, well, it just moves somewhere else, right? Because the application isn't tied. Because I can't, in some ways, I don't know your intent, right? If you log into a machine and you start a process on that machine, is it because you wanted a process?

Starting point is 00:38:27 Or is it because you wanted a process on that machine? I don't know. You didn't tell me. And so if that machine fails, what should I do? Well, I don't know, right? But if I know you said, hey, I just want three replicas, well, then I know it doesn't matter that it's on that machine. It could be on a different machine.
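
To make the "three replicas, somewhere" point concrete, here is a small Python sketch. It assumes a made-up placement rule (put each missing replica on the healthy node that currently holds the fewest replicas). The real Kubernetes scheduler is far more involved, and the function name and node names are hypothetical.

```python
def place_replicas(want, placement, healthy_nodes):
    """Return a new placement with `want` replicas, all on healthy nodes.

    placement maps replica id -> node name. Replicas on failed nodes are
    dropped and re-placed on the least-loaded healthy node.
    """
    if not healthy_nodes:
        raise ValueError("no healthy nodes available")
    # Keep only the replicas we still want whose node is still healthy.
    kept = {
        rid: node
        for rid, node in placement.items()
        if node in healthy_nodes and rid < want
    }
    load = {node: 0 for node in healthy_nodes}
    for node in kept.values():
        load[node] += 1
    # Fill the gap; the user never said *which* machine, so any healthy one will do.
    for rid in range(want):
        if rid not in kept:
            target = min(sorted(load), key=lambda n: load[n])
            kept[rid] = target
            load[target] += 1
    return kept


nodes = ["node-a", "node-b", "node-c"]
before = place_replicas(3, {}, nodes)
# node-b fails; the intent (3 replicas) is unchanged, so its replica moves elsewhere.
after = place_replicas(3, before, ["node-a", "node-c"])
```

The input never names a machine, only a count, so losing `node-b` is not an error: the function simply finds another home for the replica that was there.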

Starting point is 00:38:45 You'd be just as happy. I want to shift a little bit to kind of when Kubernetes was scaling. And it sounds like a large part of this was getting buy-in from other companies and other people. And so, you know, how did you get the buy-in from, I know OpenShift was an important part of, yeah, Red Hat, and other companies that joined on? Like, how did you sell those companies that Kubernetes is what you want to use? Well, I think for a lot of them, especially in the early days, it was kind of that quote around undifferentiated heavy lifting, right? Like, they had some other objective, like OpenShift was trying to build a platform as a service. Um, or,

Starting point is 00:39:27 or, you know, they were, you know, a lot of our early users who were also contributors, you know, they were trying to build some sort of reliable web service or something like that, right? And so it was like, well, we're going to have to build this thing anyway. Why don't we all build it together? And we don't really care because we don't think that's our value. Like, we don't think our value is tied up in that layer. So we'll go contribute to your thing because we're going to get more value out of the collective than out of trying to do it ourselves.

Starting point is 00:39:54 And so for a lot of the early partners, that was a big part of the argument was like, hey, we'll let you in. And part of that is making sure that they understand that they're going to like that they're going to be equal partners. Right. Where it's not like, because it's one thing to take a dependency on something, but then you're kind of like taking dependency on someone else's roadmap. And so it was really important also to say, hey, like, you can come take a dependency on us, but also we'll give you a seat at the design table. So when you need new features, you can contribute those features.

Starting point is 00:40:31 And here's what we're trying to achieve. And it matches up with your roadmap and, you know, that kind of stuff. So I think that's how we approached it. And then over time, you know, people became more and more interested in being part of it because there was a growing ecosystem. So when you look at like networking providers or storage providers, you know, as their users were starting to become Kubernetes users,

Starting point is 00:40:53 users, they were motivated to make sure that their networking system worked well with Kubernetes or their storage system worked well with Kubernetes and things like that. So that was sort of a secondary layer of partner discussions that we had. Right. And that's downstream of becoming the dominant player. And I guess that's validating the open source strategy, which is you become dominant. Everyone's kind of got to integrate with you and all of that. How did you prevent Google from dominating in the roadmap?

Starting point is 00:41:22 or I guess controlling what Kubernetes would be, given that it started at Google, funded largely by Google. Yeah, I mean, I think that was really important. And I think it was a critical part of gaining adoption, right? And again, becoming the industry standard was giving it independence. And I think, you know, there's two pieces to that.

Starting point is 00:41:44 The first is getting it to a foundation. So the creation, it was only a year in that we created the Cloud Native Computing Foundation, that we donated all of Kubernetes to the Cloud Native Computing Foundation. And so getting the project, the logos, all of the legal stuff, trademarks, all of that stuff, into an independent software foundation with the Linux Foundation was critical, right? Because it's hard to partner if, you know, somebody else has trademarks on the Kubernetes logo or whatever. And then I think the other piece that was important that came a little bit later was writing down the governance rules.

Starting point is 00:42:28 So, you know, for the first few years, Kubernetes didn't have any real governance rules written down. It was a mistake, I would say, right? Like we didn't realize how it was really, it was something we should have done earlier, but we didn't. And so we sat down in 2016 to write the governance rules. And I think all of us were aligned on this idea that we didn't want any one company to be able to take control of the community. And we really built the community and the rules of governance to be democratic. You know, we'd never, I mean, I think that's an aesthetic from Craig and Joe and I. We never set out to be like a benevolent-dictator-for-life style project.

Starting point is 00:43:16 We always set out to be a distributed, you know, distributed ownership, democratic kind of project. And we codified a lot of that into the governance docs that, you know, continue to this day. So I think both those things together really helped make sure that it was an industry standard, and not any one particular company standard. But also I think were critical to its success. I think they're duals of each other, right?

Starting point is 00:43:44 Like you can't have one without the other. People wouldn't have come on if they didn't see that it was governed well and open. Yeah, I mean, because obviously, like you were thinking about adopting it or you're thinking about, you know, putting it in your service. Like the thing you're worried about is like, whose roadmap am I betting on? When you said governance, does that, I mean, is that literally like when I think of government, like there's a constitution somewhere? That's literally what we wrote. Did you write that yourself or is that something that like lawyers do? No, no, we wrote it ourselves.

Starting point is 00:44:15 Yeah. And the span of about, it was a couple three, couple three fairly intense meetings amongst like six or seven of us. We got together and just kind of talked it through and looked at a bunch of other communities and kind of like what had worked, what had not worked, what were we worried about? What were we trying to achieve? Some of it was codifying stuff that already existed. So we had some loose organization stuff that already existed in sort of a de facto way, but didn't exist in an explicit way. Some of it was, you know, we created this steering committee that had never existed before, right?

Starting point is 00:45:03 And we just basically, and we're lucky, I think, that we were able to gather so the people came together. We called it the bootstrap committee. We were lucky in the sense that we had enough people who kind of were not, who the entire community would look at as being leaders. And we weren't, we weren't fighting with each other. You know, we weren't infighting. So we were all kind of aligned. And we kind of got everybody. So we didn't have to be like, oh, you know, we grabbed this side and not that side. We kind of, we grabbed, we were able and it was like seven people. It was like seven people. I think seven or eight people, we were able to pull together a group of people that really represented everybody and that everybody kind of all respected each other and respected each other

Starting point is 00:45:50 as leaders in the space. And a lot of credit there, I think, goes to, I mean, everybody who is involved deserves a lot of credit, but Sarah Novotny, who is the, our community leader at the time, deserves, I think, a ton of credit for bringing that thing together. When you look back on Kubernetes, with an open source project, there's obviously the read aspect, which is everyone can use and duplicate this code and execute it. But there's also, I guess, the writing part, which is people making contributions. What percent of the contributions actually come from the community and what percent is actually just the main stakeholder companies just putting in their code?

Starting point is 00:46:33 Yeah, I don't have the specific numbers for Kubernetes, but my experience in open source says it's like 80, 90 percent the core contributors and like less than 10 percent to other people. It's hard, I think, in general. It's really hard to get people to contribute. Part of it is companies, honestly, right? Like, you know, companies like Microsoft, we make a commitment to contributing to open source. And so, you know, at a leadership level, we've decided that this is something that we want

Starting point is 00:47:05 to invest in. And so we're willing to have teams of people who specialize in working in upstream open source projects. But for a lot of users of Kubernetes, they're a retailer or they're a banking industry or they're like, tech isn't their core thing that they're doing. Tech is a means to an end to deliver an app for their user. And in that world, it's pretty hard to justify, well, I'm going to take 10% of my people and I'm just going to do upstream open source contributions. Right? And especially if the leadership is like not a technical leadership and so they didn't necessarily grow up in those communities. And if you grow up in finance, it's hard to explain like

Starting point is 00:47:45 what's the value of contributing to the, I mean, the value of taking the open source is very clear, right? It's free. But the value of contributing back, it's harder to explain. And/or legally, like we ran into people, even today, it's getting better I think, but like even today I've run into people who say, like, we would really love to contribute. Our engineering leadership is aligned that we would want to contribute. But our legal team is worried that if we contribute, we'll be liable if we introduce bugs. Right. So, like, if...

Starting point is 00:48:17 Would someone sue them? Yeah, I think that's what they're worried about. I don't... It doesn't hold water legally. And I think the Linux Foundation can give you plenty of, like, case law and things like that to show why it doesn't hold water. But sometimes that's enough to block someone from contributing.

Starting point is 00:48:37 I never thought someone would get sued for adding a bug. I mean, everyone adds bugs on accident. Well, but I mean, but on the other hand, like if you write a proprietary piece of software and you sell it to somebody and it has a bug and it causes your house to burn down, like you can imagine you're going to sue the people, right? So like, it is sort of a legit worry at some level of like if I wrote the JPEG, open source JPEG library that ended up in the smoke detector that caused the house. You can sort of imagine the chain of logic that gets you there.

Starting point is 00:49:08 I don't think it's true. I don't think it would hold up. I think a lot of the licenses, you know, a lot of the open source licenses include indemnification language that basically says, if you use this, you're using it under your own risk and like you can't sue us if it burns down your house. But I think that that's a worry for, I mean, I've heard, not I don't think, I've heard from people that their companies do have that worry.

Starting point is 00:49:34 And again, in some sense, it's because they're like, well, what's the value? If I see this potential risk and I don't necessarily see the value. And I mean, and also, like, again, if it's not a core thing, if you're not a tech company, you know, is that developer really capable of, like, arguing with legal, arguing with legal about what you can and can't do? Probably not, right? Like, they're probably just going to fade away, right? So, you know, there's that aspect, too.

Starting point is 00:50:01 I remember, this is many years ago, I read this blog post that OpenAI put out before, I think OpenAI was kind of huge. And it says, here's how we scaled Kubernetes to 7,500 nodes or something crazy. On Azure. Yeah. Yeah. And so I want to know, you know, there's these new workloads coming in for AI. There's training, which is this huge, I guess, all at once workload.

Starting point is 00:50:30 and then there's inference, which is latency sensitive, and you kind of need it to come out instantly. How has Kubernetes adjusted over the years to handle these kinds of workloads? Yeah, I mean, I remember when, you know, like we couldn't really handle more than about 100 nodes. So it's definitely been a lot of optimization in the core systems,

Starting point is 00:50:55 and there's places where the APIs were pretty noisy. And we needed to reduce the noise level or we needed to extract components into another component so that you could scale. etcd in particular actually is one of the, could be the main bottleneck to that kind of scale. And so figuring out how to run etcd really well is a core part of figuring out how to run Kubernetes really well at scale. I mean, I don't think it's that different than like learning how to run a database or anything else like that at scale. Like large scale is just weird, and you just have to, you know, run it, see where it breaks, figure out how to fix it, rinse and repeat, you know. And I do think what's interesting is that while, you know, AI training as an example is a really large cluster or large scale kind of thing, you know, I think by virtue of being in the cloud, a lot of our users actually have much, much smaller clusters, but lots of them. So hundreds or thousands of clusters where each cluster itself is a little bit smaller.

Starting point is 00:52:03 And I think that's not something we anticipated because we came from a world of like physical data centers where, you know, you only want one because like you don't want to have to set it up a bunch of times, right? You just want to set one up for the entire data center, call it a day. But because the cloud, because AKS, you know, you press a button, it pops up in two minutes, right? Like it's really easy to get yourself a cluster. So people create lots of clusters. And so I think we've also invested a lot in the Kubernetes community and in Azure as well on managing lots of clusters. How do I manage clusters at scale? I think one of the jokes we sort of like, we spent a lot of time talking about containers as replacing snowflake servers.

Starting point is 00:52:47 Not Snowflake the company, but like, you know, specially handcrafted servers. And now we just have a bunch of snowflake clusters. So the VMs all look the same, but the clusters are all weird. So, like, we have to provide people with tools to make sure that the monitoring software is the same on all of them. And that, you know, all of the Kubernetes versions are the same. And, like, you know, all the admin users are the same and, like, all this kind of stuff. Right. So that's another aspect of scale out that I think we didn't anticipate that we had to go and build,

Starting point is 00:53:17 which is a number of clusters as opposed to size of cluster. I always hear in the news that, I mean, the anticipated scale is even higher than today's unprecedented scale. And I see people are purchasing GPUs like crazy. I'm curious, is there any upper bound where Kubernetes just cannot handle that load? Like, let's say you 10X from where it is today. Is it going to break down at some point and you need something more custom? Well, I mean, I think it all comes down. It all comes back to the storage layer. Because everything, again, because there was this design decision that everything routes around the storage layer. Everything else is basically horizontally scalable. So you have more API requests

Starting point is 00:54:10 coming in from more nodes. Well, you just need more API servers. You know, you want to do scheduling faster. Well, you need to just have more schedulers. And so everything else, more or less, you can just horizontally scale out. It's the storage layer that is the bottleneck. And that's where the work comes. And so you want to say, go 10x up? Well, you're going to have to probably figure out if you can make etcd scale that way or if you need to replace etcd with something else that has the same characteristics

Starting point is 00:54:42 but can operate at scale. So I don't think there's anything like inherent in the design that would prevent it. But obviously, you know, there's a famous quote that every time you change an order of magnitude, the problem moves. And so I think that's really true is every time you increase by an order of magnitude, what you thought was the main problem is going to become easy and then like the problem moves somewhere else. So you were network constrained, now you're CPU constrained. You were memory constrained, now you're network constrained. Yeah, that'll be cool to watch how, because it seems like everyone wants to. Yeah, I think it's certainly, it's certainly the case that people continue to try and push

Starting point is 00:55:18 the limits of scale. And but I think like anything else, like when there's motivation, people go and figure it out, as long as there's not something inherent in the design. The last part of this conversation, I just wanted to reflect over your career a little bit. Maybe ask you a few questions about things. And you mentioned that you had a PhD in robotics. And I hear a lot of people say they don't recommend PhD. Some people do.

Starting point is 00:55:46 I'm curious what your take on on getting a PhD is. Yeah, that's probably like if I had to have a top 10 questions or top five questions that people ask me. That's definitely in the top five, top 10 questions. And I guess I'll answer it with two different stories. One story is that at one point in my career, I ran into a guy, same company, this guy who I went to undergrad with. And he'd gone off and done startups and done the tech industry thing and ended up in the same company that I was working at. And I'd gone off and done my PhD and come back into the industry.

Starting point is 00:56:24 And we were at the exact same level. We graduated the exact same year, same degree, and we were at the exact same level in this company. And so I guess that's one way of me saying like, eh? You know, like it probably doesn't matter. It probably doesn't matter one way or the other. But I'll also turn it around and say, A, I had a lot of fun. Right.

Starting point is 00:56:45 So like I had a lot of fun doing a PhD in robotics. So that's worth it to me. And then two, I think I learned a lot about how to, I think from the PhD and my PhD advisor, I learned a lot about how to write and put my ideas forth in both written and presentation form that you don't necessarily learn in the industry. And I think that benefited me. I think it benefited my ability to, you know, argue. We talked about that six-month period where we were arguing for like why,

Starting point is 00:57:18 we should be allowed to open source this thing. I think the skills I learned in terms of presentation and in terms of writing benefited me during that time and have continued to benefit me. And then I think when I went out as a professor for a couple of years, teaching CS 101 and having to explain stuff to students who didn't really know anything about computers,

Starting point is 00:57:45 I think really helped me organize the, the initial parts of the Kubernetes project so that somebody could learn about Kubernetes, because people were coming in and being like, what is a container? What is orchestration? How do I do this? There's a lot of like just teaching that you had to do. And I think that experience as a professor thinking about how do I teach students something really helped me do a good job with, you know, teaching Kubernetes to people. And so I think those things were really beneficial. And so I guess there's, there's the two different arguments, which is one is, like, it doesn't matter. The other is I learned a ton of stuff that I think was pretty useful to my career.

Starting point is 00:58:23 And I had some fun. Earlier, you said top questions that people ask you, and I'm kind of curious, what's the top question? I think the one, the other one that I get a lot is, how do I know what I should learn? Like, a lot of, especially when I talk to the interns or the first couple of years. you know, a lot of the questions are revolving around like, AI seems really hot right now, but I'm really interested in systems. Like, should I, like, go learn AI because it's hot,

Starting point is 00:58:52 or should I learn systems because I think systems are interesting? I actually kind of don't care what you learn. I care that you're learning. And so the most important thing is to find something that you're excited about and energized about because, you know, you'll do that instead of watching YouTube. You know, so if you're not excited about AI, like, well, you're probably not going to do a very good job learning AI, which means that you're kind of going to waste your time. But if you're really excited about systems, you'll probably put a lot of passion and

Starting point is 00:59:23 energy into it. And we still need systems engineers. So, you know, I think that's the, that's kind of a pretty popular question. I think there's a lot of, I sense anyway, a lot of like fear of making the wrong decision. And I always tell everybody, like, there was no plan. I've never had a plan for my career. Like, never, ever, ever. Like, I've always just chased after things that I thought were useful and were fun and interesting. And, you know, obviously, like, that can work out badly for people. I'm sure it's good to have a plan, probably, for some people. But, like, I also want to make sure people understand that, like, when you look back, sometimes the things you think were mistakes or dead ends, like actually were critical things that taught you stuff.

Starting point is 01:00:14 And so, like, worrying about did I choose the wrong thing, am I going to choose the wrong thing? Like, as long as you're learning, you're probably doing okay. I mean, what you're describing, it reminds me of, I don't know if you've seen that Steve Jobs' commencement speech, but he literally says exactly that. You wrote down somewhere. I don't fully remember where you wrote this down, but I have this in my notes. It says, the inevitable trajectory of software is death. And I just can't imagine Kubernetes dying.

Starting point is 01:00:44 But how do you see that happening? And what do you think about that if it did? I mean, I definitely stick by that statement. Although I think that the sentence before I said that was you really should never fall in love with your software because the inevitable trajectory of software is death. I mean, which means don't stick with it. Don't stick with it past when it's dying, right? I mean, like you should always be willing to throw away stuff.

Starting point is 01:01:15 Just don't stick with it just because you wrote it. You should always be willing to throw it away. But like obviously, I think if you look historically across the industry, it's true too, right? Like, like, and quite frankly, like even within Kubernetes, like, the source code that I wrote has been rewritten a number of times. over the 10 plus years history of the project. So what does it look like? I mean, I think it looks like something coming along that achieves similar things,

Starting point is 01:01:45 but easier with more, you know, with less complexity and more utility. And I think that, you know, I can imagine what that looks like. I think some of these natural language stuff, if you could actually really get it to be an interface that worked 100% of the time, like obviously it's way easier to come in and say,

Starting point is 01:02:06 I would like a reliable web service than it is to say YAML, YAML, YAML, YAML, you know. I think sometimes, I think it's sort of two different trajectories. Like sometimes the trajectory is, it goes away. Sometimes it just becomes so hidden that nobody sees it, right? Like underneath Linux, there's,

Starting point is 01:02:24 I mean, excuse me, underneath Kubernetes, there's Linux. And underneath Linux, there's a processor. But, you know, people don't pay much attention to that. And there's a lot of attention now on AI, and underneath a lot of the AI is Linux, but it could be that people are focused so much on the AI that they forget about the Kubernetes part.

Starting point is 01:02:41 And I think that's happening already, honestly. Like, I feel like if I look at the volume of changes and things like that, I think we've sort of plateaued in terms of like the amount of change that's driving through the system. Stuff needed to support AI is kind of like the exception to that category. But, you know,

Starting point is 01:03:02 I'd be shocked. I mean, I guess to put it this way. Let me take the long view and say, in a hundred years, is Kubernetes still going to be running? I'd be pretty surprised. Right? No way.

Starting point is 01:03:14 It's hard to imagine, right, that that would be true. I mean, I don't know. We haven't had computing systems for long enough to maybe know for certain. And there are things that we still use. I mean, there is some stuff that we still use from back then.

Starting point is 01:03:28 Plugs are still the same shape-ish, stuff like that. So maybe. But even something like x86, like if you had asked me six years ago and said, is the x86 processor going away, I'd say, well, maybe, I mean, obviously on mobile it did, but in the server, maybe not. But now two things have happened. One is all the processing is on GPU now. And two, Arm64 is now a pretty important platform on the server for energy usage and other reasons, right? And so it's pretty dangerous to predict the future because it has a tendency to show up

Starting point is 01:04:06 sooner than you predict. Or longer than you predict too, right? Self-driving cars. I've heard self-driving cars were five years away for like the last 15 years. Yeah. Me too. I don't know if you read books for career's sake, but if you do, is there a book that impacted your career the most? Well, I mean, I would say early on the book that impacted my career the most was the Gang of Four book. Was it software engineering designs and patterns or whatever? Like it's a software engineering book. I see it. It's Design Patterns: Elements of Reusable Object-Oriented Software. Yeah, there you go. So, like, early on, that was a very influential book. It's like a late 90s or mid-90s kind of book. There's a much more recent book

Starting point is 01:04:50 called Leadership on the Line that, as I've become sort of a large org leader, I really like. And then there's this, what's this book? It's called, I think, Five Dysfunctions of Teams, I think. That's a really good book, too, from a how-teams-operate perspective. If I'm understanding, if you're an engineer, check out that first book. If you're a manager or a leader, check out those second two books. Yeah, that's probably about right. Yeah, I think that's right.

Starting point is 01:05:17 And, you know, it's an evolution over time, right? So, like, maybe you'll do both. Last question for you is, if you could go back to yourself when you had just graduated college and give yourself some advice, what would you say? Keep better notes. You know, I feel like there's a great MBA thesis or a great, like, book in the whole Kubernetes journey and beyond. And, like, I just don't have enough notes to do that, to write it down.

Starting point is 01:05:48 You know, we went through a lot, like, a lot of different stuff happened, and I remember some of it, and I don't remember a lot of it. And it would have been nicer if I'd kept better notes, I feel like. Right. Well, you got all the code there. Maybe an LLM can parse it or something. You know, it's not so much about the code part. It's all of the stuff you were talking about, like all the partner discussions and all of the interpersonal stuff and, you know, all that kind of stuff. And, like, I remember a lot of it, but I don't remember all of it. Cool. Well, thank you so much for your time, Brendan. I really appreciate it. Yeah, for sure. Thank you. Thank you for listening to the podcast. It's a passion project of mine that I really enjoy building. Another passion project that I've been working on kind of in secret is building an ergonomic keyboard that I wish existed, and I finally have a prototype, so I'd love to show you what we've built.

Starting point is 01:06:38 It's ultra low profile and ergonomic and I couldn't find anything like it on the market. So that's why we built it. I'll put a link to the keyboard in the description. You can take a look and learn more about the project there. We could definitely use your support. Also, if you have any feedback for me about the show, I'd love to hear it. Comments on YouTube have led to guests coming on like Ilya Grigorik and David Fowler. I wasn't aware of them until someone dropped a comment.

Starting point is 01:07:03 Also, feedback in the comments helped me learn to reduce the number of cliffhangers in the intros. So your comments definitely make a difference. Please keep letting me know what you'd like to see more of in the show, and I'll see you in the next episode.
