Two's Complement - Running Programs

Episode Date: September 12, 2025

Matt and Ben discuss running in production; from running processes in screen to battling systemd configuration files. Ben sketches out daemonization rituals while Matt channels Tolkien to explain proc...ess hierarchies. Our hosts discover that Ansible playbooks are just bash scripts with better PR, and everyone still Googles journalctl syntax.

Transcript
Starting point is 00:00:00 I'm Matt Godbolt. And I'm Ben Rady. And this is Two's Complement, a programming podcast. So we planned comprehensively, as always. And today's topic is going to be signals and processes. Yeah.
Starting point is 00:00:32 And that is the sum extent of our planning. We said those words out loud. And I said, yes, and hit record. And then we continue talking about it during the intro. Uh-huh. And we're here. So why is that top of mind for you? Is there a reason why you are worrying about this right now?
Starting point is 00:00:51 There's a reason that I'm worried about this right now, which is that I'm always worried about this, because I see part of my job as a software engineer is making sure that the software that I write actually runs and does what it's supposed to do. I know that there are lots of places in the world where as a software engineer, you're expected to write code,
Starting point is 00:01:14 and then there's another group or team or organization or outsourced company that is responsible for actually taking that software and running it on computers and making sure that it continues to run on those computers and that it delivers the value that it is intended to do.
Starting point is 00:01:35 And in some cases, those things are, like, very separated. Right. You might just make a PR to a function. You change a function, your tests pass, you check it in, and then you have literally no idea how it ends up serving people's requests or whatever it is your company does, yeah. Right, right.
Starting point is 00:01:53 And then on the other end of that spectrum, I think you can have situations and I have definitely been in these myself where it is like, no, we're building this for the very first time. There's no infrastructure team. There's you. And you are going to compile your code.
Starting point is 00:02:11 You are going to SCP your code onto a server somewhere. And then you are going to run a screen and then exec that program in the screen. And now you can post in a Slack channel or some other place: hey, we've deployed to production. And by production, you mean, yes, the only reason it didn't quit is because I'm still
Starting point is 00:02:33 running it in a screen. Yes, exactly. I do control A, control D in the screen, and now our production environment is... Everything's fine. Yes. Everything's fine. What's your logging strategy? Oh, we log back in and we reattach.
Starting point is 00:02:45 You check the screen. See what happens. Yeah. Okay. That does work, but I can understand why, yeah, you might want something a little more. Yes. Well, and those are the two ends of the spectrum, I think, if we are going to simplify it down to a spectrum. And, you know, I think that you can, in your career,
Starting point is 00:03:07 and I have done a lot of this as a software engineer, you can kind of hop to the left, I don't know, side of that spectrum and say, all right, well, okay, obviously I don't want to run it in a screen. What else could I do? And then you start learning about, like, systemd and things like runit and supervisord and things like that. Old-school nohup, right? Right. And then log out and then you're done, right. Exactly.
Starting point is 00:03:36 And then of course, you know, you start moving into distributed environments, to cloud. You learn about, you know, Kubernetes and Elastic... what does ECS stand for? I forget. Container store? No, container service. Container service. Yeah. Yeah.
Starting point is 00:03:53 Or you've got, what's the HashiCorp thing, Nomad. Nomad, similar things. All these things which are like orchestration setups that say, hey, you just tell me through some mechanism what you would like to have running and I'll find a place to run them and run them in a particular controlled way. And then that part of the deployment and running is taken out of your hands. It's done by a framework.
Starting point is 00:04:17 But presumably, yeah, go on. But all these things are accomplishing what is fundamentally the same goal, which is: I have produced software and I want it to run on a computer, or maybe multiple computers. Maybe not multiple computers. It's like, oh, this needs to run on, like, exactly one. Or exactly one, right. Like, something is consuming a key and there better only be one of them at a time or bad things are going to happen, right? Like, yeah. So I think all of that is kind of encompassed in this topic of, like, I'm trying to run a program and how do I actually make sure that
Starting point is 00:04:56 that is happening the way that I want. And I think that we could even structure this from sort of the bottom up, right? So we started with screen, and I'm just running screen and now I've got a process and it's executed. Even screen is one level too far from: I literally run the process and it's there and I'm watching it. And I'm watching it. I'll Ctrl-C it, which, you know, is also valid.
Starting point is 00:05:20 But it gives us a sort of starting point. Like, what happens when you fire up a process and why is that not okay? Yeah, right, right. Yeah, that's great. I love this. Okay, so when you do that, you're like, all right, my plan for deployment now is I'm going to SSH onto the production server or EC2 instance or whatever you got, and I'm going to SCP my bits up there. Yeah, let's not get into packaging and deployment. That's even more public. Let's leave it at that. Some magical process happens and you have the bits that you need on that machine. Yes, I have my executable bits on the machine and then I'm just going to run it. Well, now what you have is a process that's a child of an sshd process. Probably a child of the shell that you ran it on. Oh, yeah, no, you're absolutely right. So if you've got the tree, it's like, okay,
Starting point is 00:06:08 it's a child of Bash, and then Bash is going to be a child of sshd, and then that's going to be a child of the parent SSH server, and then that's probably going to be a child of init, right? Which nowadays is probably systemd. Yeah, Thorin, son of Thrain, son of... It's going to be your program, son of Bash, son of sshd child process, son of sshd parent process. Right, right.
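The "son of Bash, son of sshd" chain described above can be printed for any running process. Here is a minimal Python sketch, assuming a Linux machine where `/proc/<pid>/stat` is readable; the helper names `parent_of` and `ancestry` are made up for illustration.

```python
import os


def parent_of(pid):
    """Return (name, parent_pid) for a PID by parsing /proc/<pid>/stat (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        stat = f.read()
    # The process name sits in parentheses and may itself contain spaces or ')',
    # so cut around the first '(' and the last ')'.
    name = stat[stat.index("(") + 1 : stat.rindex(")")]
    fields = stat[stat.rindex(")") + 2 :].split()
    return name, int(fields[1])  # fields[0] is the state, fields[1] is the PPID


def ancestry(pid):
    """Walk from pid up through its parents, returning [(pid, name), ...]."""
    chain = []
    while pid > 0:
        try:
            name, ppid = parent_of(pid)
        except OSError:
            break  # the process vanished, or /proc will not let us look
        chain.append((pid, name))
        pid = ppid
    return chain


if __name__ == "__main__":
    print(", son of ".join(f"{name}({pid})" for pid, name in ancestry(os.getpid())))
```

Run from an SSH session, the chain would typically pass through the shell and the sshd processes before reaching PID 1, though the exact names depend on the machine.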
Starting point is 00:06:40 So yeah, got it. Yes, that makes sense. All right. So if you, so if you naively, or maybe like, not naively, but you just sort of like, just have enough knowledge to be dangerous, you're like, oh, I've got the ampersand operator in Bash, that I could put at the end of that, because it's like, okay, cool, the production server is running on my laptop. And if I put my laptop to sleep, or, you know, the SSH session, the client is on my laptop. The server is a server. But it's like, all right,
Starting point is 00:07:05 I started it on the server. Now I want to go home. I need to close my laptop lid and I need to leave. Well, what exactly is going to happen if I close this lid? Like, I don't want it to stop, right? So you're like, okay. Well, let's talk about what happens in that situation. Yeah, okay. Absolutely clear. Right. So let me read this back to you. So you're saying, yeah, you're running, as described, the production binary, having SSHed into a machine, and you've closed your laptop lid. All right. So assuming you just close your laptop lid and nothing shut down nicely, it just literally suspended. I don't actually know exactly what your laptop will do in this situation. But let's just assume it disappears off the network instantaneously, which is also completely
Starting point is 00:07:45 reasonable if you go into, like, a tunnel on the train on the way home, that kind of thing. Right. Then eventually the TCP connection between your computer and the SSH daemon on the remote end will time out. There'll be a keepalive that's missing, probably, or some other heartbeat mechanism will go down, and the SSH daemon will say, hey, that person's gone now, it's time to clean up their session. It will, I think, kill the Bash process, and the Bash process then will kill all of the children that it knows about, something like that. Yeah, so this is kind of it, right? So what is?
Starting point is 00:08:23 Yeah, signals and processes, right? Yeah. I mean, I know that the result will be: my program will die. Exactly how that dies, I'm not 100% sure, but that's what would happen. Eventually, maybe five or six minutes later, when the SSH daemon times out your connection and says this person's not there anymore, it kills the process tree through some mechanism, and then, yeah, you get a phone call
Starting point is 00:08:44 as you've got onto the train telling you that the production system is down, please fix it. Exactly, exactly right. And this is maybe where we troll our listener into posting the right answer on the internet to this, because I would suspect what probably happens is that the SSH daemon kills, like, the process group.
Starting point is 00:09:08 Of course, yeah, because Bash becomes a process group controller, or whatever the name is. A leader. Process group leader, that's right. There's, yeah, there's my Stevens book... I haven't got it here, no. But yeah. Okay, it's probably going to send a SIGTERM to that process group, and so every process in the process group is going to receive that term signal and then hopefully gracefully shut down. I don't know if it follows it up with, like, a SIGKILL at some point or not. Maybe it does, maybe it doesn't. I'm not exactly sure. No, no, but that would seem reasonable, yeah, so that you don't end up with loads of processes that just decided not to kill themselves. And frankly, I think Bash will probably do the right
Starting point is 00:09:52 thing, yeah, in that circumstance. So day one, we try to deploy like this. We close our laptop lid, we go home, we get the unfortunate call, and then we rush home and we open the laptop lid back up and then we rerun the process. All right, well, I can't do that. So an enterprising person might say, okay, what I'm going to do is I'm going to use the Bash ampersand command because I know that that will put a process into the background, right? And so I'm going to do that next time.
Starting point is 00:10:21 I'm going to run. I'm going to do my deploy. I'm going to put an ampersand on the end, right? And then I'm going to, like, now it's running in the background and now I shouldn't have to worry about this. Although if I were to do that with, like, a process like we were just talking about,
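Signalling a whole process group, as guessed at above, can be tried out locally. The sketch below is an illustration in Python, not a claim about what sshd or Bash actually do internally. The hosts suggest SIGTERM; the signal normally associated with a terminal hanging up is SIGHUP, which is why that is the default here. The function name `signal_group` and the sleeping child are assumptions for the demo.

```python
import os
import signal
import subprocess
import sys


def signal_group(sig=signal.SIGHUP):
    """Start a child in its own session/process group, signal the group, return the exit status."""
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(60)"],
        start_new_session=True,  # the child calls setsid(), so it leads a fresh group
    )
    pgid = os.getpgid(child.pid)
    os.killpg(pgid, sig)  # delivered to every process in that group
    # Popen reports "killed by signal N" as a negative return code.
    return child.wait()


if __name__ == "__main__":
    print("child exit status:", signal_group())
```

A process that wants to shut down gracefully would install a handler for the signal instead of relying on the default action, which is simply to terminate.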
Starting point is 00:10:35 the very first thing I would notice is that my shell prompt comes back and then immediately loads of junk from my log file is now appearing over top of what I'm running. So even before we get into processes, there's like a pragmatic thing. So what I would probably do is redirect output to,
Starting point is 00:10:51 you know, ~/log.txt and then put the ampersand on the end. Right. So good. And now it's in the background. And I think we're great. And I, you know, I'm tailing that log file for a bit. And that's safe because that's a separate process. Yeah.
Starting point is 00:11:09 And now I close the laptop lid and get on the plane, a plane, train, whatever. Any mode of transport. What happens now? Right. Well, I think what happens, and I think this because I've had this burn me from time to time, is that yes, you'd redirected standard out, but you did not redirect standard error. And so there is actually still... the daemon has a file handle that it thinks it needs to be writing to, back to your thing. And so you put this in the background and you do this again and it breaks again. It does exactly the same thing all over again. Well, I think there's more than one reason. So yes. First of all, standard error isn't going anywhere useful.
Starting point is 00:11:45 Right. The second thing here is that although it is in the background, it's still a child of Bash. So it's got you coming both ways. And maybe thirdly, standard input is still potentially connected to the console, the terminal, something. I'm waving my hands a lot here because I'm less sure about it. But I certainly know that if you try and read from the console, you'll get one of the even more esoteric signals about, like, hey, yeah, you can't. You're not connected to it right now. And then you'll get stopped. And so you'll see in Bash, stopped, input required, or something weird like that.
Starting point is 00:12:25 So all of those would defeat you and you end up with a dead process. Right. Right. So this is where you started investigating all of the various options that you can pass to SSH when you run this because you're like, I'm going to make a script. I'm going to make a script that works, and I'm just going to run the script, and it's going to do my deploy, and then I'm going to trust that it works. And you start learning that, okay, well, I need to do the option that, like, doesn't read from standard in, because I don't want the standard in problem. And then I've got to make sure that I redirect standard out and standard error. You're saying this thing in the background.
Starting point is 00:12:58 Just to be clear, these are options you pass to SSH or to Bash? SSH. Oh, I see. So now we're not going to run Bash at all. We're just going to run the executable directly. And, well, so I'm thinking of the world where it's like you do a thing, you, like, copy the bits up to the machine, uh-huh, and then you have, like, a separate SSH call where you're passing the command that you want to run as an argument, right? So you're no
Starting point is 00:13:24 longer running an interactive session you're just going to right that makes sense okay then that takes bash out of the equation which helps us a bit in this context although there is a there is still another Bashian solution that I think I see people go for, which is you type disown in Bash, which says, push this thing and make it not a child of this process anymore. And that probably,
Starting point is 00:13:45 probably might solve the problem most of the time. Yes. Except you've left a big like rake in the grass for that because there are other processes on the system that might wish to get rid of that apparently now orphaned process. So
Starting point is 00:14:01 that's what nohup is for. It deals with the hangup, and there's some other things that it does. And then there's daemonization and other bits and pieces, which I'm sure we'll get to in a second. But let's put that to one side and let's go down the rabbit hole that you've described, which is that, like, I'm now going to run SSH on my computer. Yeah. And rather than just SSH, I'm going to pass it /path/to/my/executable with all the redirects and things set, and try and run it from a server and have it live on the remote machine with all of the pipes and things, stdin, stderr and stdout, all connected to sensible places.
Starting point is 00:14:38 So yes, yes. Go ahead. Sorry, that's where I cut you off. So you do that. And then you should, I believe, be able to SSH in separately and do, like, a pstree and see that the parent of this process is now one, because it is disconnected from what it was doing before. Right. From the process group that it was in before. And at that point, you maybe have something where you can close your laptop and have it hang out.
Starting point is 00:15:17 Now, hopefully you sent your logs somewhere sensible and you don't fill up the disk with logs. You can pipe it into syslog, which is something that I do when I'm trying to punt on this problem entirely. You know what? There's already a log rotation system on this machine, and it's called syslog. So I'm just going to pipe all my logs into that. Quite possibly you already have log aggregation set up for that
Starting point is 00:15:39 so that you can go and read it on, like, a website and all that kind of thing as well. Maybe you do. I mean, but yeah, if you're considering that option, you probably don't, because you probably don't have any other infrastructure to lean on.
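The three failure modes discussed above (standard output, standard error, and standard input still tied to the terminal, plus membership of the shell's session) can all be addressed when the process is launched. A minimal Python sketch follows; `launch_detached` and the log path are hypothetical names, and this shows the general idea rather than the exact SSH flags Ben has in mind.

```python
import subprocess
import sys


def launch_detached(cmd, log_path):
    """Start cmd with no ties to the caller's terminal and return the Popen handle."""
    log = open(log_path, "ab")
    try:
        return subprocess.Popen(
            cmd,
            stdin=subprocess.DEVNULL,   # reads return EOF instead of touching the terminal
            stdout=log,                 # standard out goes to the log file...
            stderr=subprocess.STDOUT,   # ...and standard error follows it
            start_new_session=True,     # setsid(): leave the shell's session and group
        )
    finally:
        log.close()  # the child keeps its own copy of the file descriptor


if __name__ == "__main__":
    proc = launch_detached(
        [sys.executable, "-c", "print('hello from the background')"], "service.log"
    )
    print("started pid", proc.pid)
```

Once the launching process exits, the detached process is reparented, which is when a tool like `pstree` would show it hanging off PID 1 (or a per-user reaper on some systems).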
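Piping a program's output into syslog can be sketched in a few lines of Python using the standard `syslog` module (Unix only). The names `forward_output` and the `myservice` identifier are placeholders, and the `sink` parameter exists only so the forwarding can be tried without a syslog daemon.

```python
import subprocess
import sys
import syslog


def forward_output(cmd, sink=None, ident="myservice"):
    """Run cmd and hand each line of its combined output to sink (syslog by default)."""
    if sink is None:
        syslog.openlog(ident)  # tag messages so they can be found later

        def sink(line):
            syslog.syslog(syslog.LOG_INFO, line)

    proc = subprocess.Popen(
        cmd,
        stdin=subprocess.DEVNULL,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # fold standard error into the same stream
        text=True,
    )
    for line in proc.stdout:
        sink(line.rstrip("\n"))
    return proc.wait()


if __name__ == "__main__":
    forward_output([sys.executable, "-c", "print('service started')"])
```

The attraction is the one Ben describes: rotation and retention become the logging system's problem rather than the program's.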
Starting point is 00:15:50 Right. Right. Yeah. Yeah. Okay. So that seems reasonable. Yeah. Yeah.
Starting point is 00:15:55 So what do you do after this? So you do this, you finally can go home now. You can shut your laptop and go home. And you're like, surely we can make this better than this. What do we do next? Yeah, right. So I still have... Making the systemd job is one, um, well, see, I was thinking another thing.
Starting point is 00:16:15 So there is a process called... "process" is a terribly overloaded term. There is a sequence of things you can do on a POSIX system to become a daemon. Mm-hmm. Special incantation, you've got to sacrifice something. That's correct, yes, there's a pentagram involved. And not a Damon, also, because Matt Damon is the only Damon. So, aside here: as you recall, one of the first folks at the company you still work at was also called Matt, and was not me, and we were discussing various long-lived processes that we were designing a system to use, and the obvious name was the Matt Daemon.
Starting point is 00:16:59 To be pronounced Matt Damon, obviously. Right, right. But we never did it. Anyway, daemonization is, let's not get into politics. Becoming a daemon, as I understand it, is a multi-step process. The first thing you need to do is fork, which gives you a new process, a shiny new process. Then you call something called setsid, which says, I would like to become the session leader for this new process that I've just created,
Starting point is 00:17:30 because only a process group... and I'm doing this from memory, so listener, please, this is not necessarily correct. So take this as a massive pile of me hallucinating all of this. We are. Yeah, let's go. Yes. So you fork. The child process then does setsid to become a process leader in its own new group.
Starting point is 00:17:54 And then if I remember rightly, you have to fork again to then dissociate yourself from any last tendrils that that previous process had. And now you're running and you are completely in the clear. It's something like that. It's some weird sequence of events, which means that you have lost all connection with the previous process. And so when you run some, like, system process and you pass it with dash dash D, or dash... sorry, then it immediately returns and disappears, apparently. Like, hey, did it do anything,
Starting point is 00:18:26 but you run ps and it's still running. That's the kind of process that it's been through. And you know, you can type jobs and it won't be there. It's, like, completely lost from you. And probably, I didn't realize with the thing you were just talking about, and the penny is dropping now, some of the flags that you were talking about finding for SSH to set it up correctly might be the ones that effectively have the same side effect. But I, having just written something that is a daemon for the, if you go back to the systemd conversation we were having last time, something became a daemon and I went through that process.
Starting point is 00:19:00 So it's somewhat top of mind. And even though I had a daemonization thing, you can choose, I think, in systemd, which we're going to get to, to say either systemd runs the process and does that for it in its own container, or it's expecting it to run in that particular way. And so it can babysit different types of processes, if I remember rightly. Okay, let's go back to what you said about systemd, because that sounds like a useful thing to know about.
Starting point is 00:19:28 Right. So just to put the problem in context, systemd is a solution to a problem. What's the problem? Well, so here's the problem. You've written your script. See the last conversation we had about it as to what solution it might be. What problem it is?
Starting point is 00:19:46 What problem are we creating by solving another problem, right? Yeah. I think, actually, is that a thing... I feel like I've said this before on the podcast, I don't remember. The difference between computer science and software engineering, do we know this one? Computer science is solving problems with computers. Software engineering is solving the problems that you create when solving problems with computers. And yes, that checks out. The math checks out for that, yeah. And so what problems are we both solving and creating by using systemd? Well, so you write your Bash script, it deploys your thing, you shut your laptop, and then
Starting point is 00:20:23 you wait five minutes, you open it back up, and then you SSH back in and it's still running, and you're like, all right, I think I maybe believe that this is going to work. And you go home, and the next day you come in and it's still running. Cool. And then three days later it crashes, and you're like, you know what would have been super cool is, instead of me getting a phone call in the middle of the night because it crashed, if it had just restarted. Well, I mean, wouldn't it be cool if it hadn't crashed would be the first thought you'd have, but at three in the morning, you probably just want to go, oh, for God's sake, just restart the thing. Just restart it, please.
Starting point is 00:20:52 I'll fix it tomorrow, but can we please not call me, because I have to SSH back in and rerun the script again or whatever. Right, right, right. So you're like, I just, I want this to restart. And then you Google and you're like, well, maybe I should run this in systemd, right? And so you wind up making a whole systemd job definition. And you, I forget where you put it? You put it in /etc something, right?
Starting point is 00:21:18 Or is it, yeah. So there's, I don't even remember now. So, I mean, my understanding is in the beginning there was init. And init is effectively the first thing that the kernel executes as a user process. And it then decides what to do. And back in the mists of time, there were, like, run levels and it was all, like, clever directory structures and things like that. And it just fired up the right sequence of daemon processes, one of which would be,
Starting point is 00:21:45 you know, an sshd so you could log into the machine, or a getty that would actually let you type on the console to get into the machine. And that was it. And then after that, you're off to the races. And systemd is the new init. And instead of it being a set of essentially shell scripts that get run to fire things up in the right order, again, I'm probably skating over and missing loads of bits of context here. But it's a sort of a more principled approach where you have units that say, like, I would like this thing to run, please. I would like this to be true under these circumstances. And it depends on these other things that also need to be either running or at least
Starting point is 00:22:22 have started before me. And so instead of having essentially numbered directories with, you know, 40. Yeah, rc.d or rc1, rc2, something. Yeah, those were the run levels, I think, which was slightly different because it's single-user mode versus multi-user mode. But this is more like, hey, what sequence do I need to run things in and shut them down in, in order for my system to come up. And systemd does that kind of the right way by actually tracking dependencies, which again was expensive and caused me problems in our last
Starting point is 00:22:53 conversation, but is the right approach and the correct thing to do. And so that's what systemd is. It's, like, the overarching orchestrator of a computer and all of the processes that are running on it. And so yes, to make something run in systemd, you put a file in the right magical place. You issue the correct incantation to systemd to go and notice that that file is there. And then what? I'm looking at you because I thought you might have just done this and you could answer the question. Well, you need to reload systemd. Yeah, there's, like, daemon control reload or something. That's the magical incantation that says, hey, systemd, look through your configuration files, something has changed. Yes. Please do the needful
Starting point is 00:23:42 now. And then it should start up. And then you're using something like journalctl to look at the logs of the thing to make sure that it started. Which is, I think for most people, when Linux systems particularly moved from init to systemd, the biggest frying pan to the side of the head was: where are all my chuffing logs? They used to be in /var/log, whatever, and that's burnt into my mind. They are text files in /var/log/blah, and systemd stopped that. And now there are a few logs in /var/log, but nowadays you have to interact with it through... and it has a binary log file format, as I understand it, behind the hood, and you have to learn journal control, journalctl, which I still haven't learned, and I still Google the same thing over and over and over again and type in the thing that it tells me to do, which is, note to
Starting point is 00:24:30 self, don't do... taking a note there, don't do that. Make a cheat sheet for it, stick it to my monitor like all the other cheat sheets I have. Yeah, so that was, but that was, like, what broke most people, I think, because I didn't have to interact with adding and removing daemons from my system. That's what, you know, my package management system did.
Starting point is 00:24:48 But whenever something went wrong, I'm like, where the hell's the log file? Anyway, so, it's a magical program called journalctl. Okay. I feel like this is, like, I want to go to the next level now. It's like, okay, cool.
Starting point is 00:25:02 We're going to run this on, like, two computers because we discovered... Let's finish the thought. The reason it crashes. It got OOM-killed. Right, right, right. Let's just finish the thought there. So very concretely, you would install the binary to a known good location, which you probably
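The fork, setsid, fork-again sequence Matt recalls from memory can be sketched in Python. This variant is an assumption made for the sake of a runnable demo: the calling process stays alive and only waits for the short-lived intermediate child, whereas a classic daemon would simply exit at that point. `spawn_daemon` is a made-up name.

```python
import os


def spawn_daemon(target):
    """Run target() in a detached grandchild; return once the hand-off is done."""
    pid = os.fork()
    if pid > 0:
        os.waitpid(pid, 0)  # reap the intermediate child so it is not left a zombie
        return
    # First child from here on: it must never return into the caller's code.
    try:
        os.setsid()          # new session and process group, no controlling terminal
        if os.fork() > 0:
            os._exit(0)      # intermediate child exits, orphaning the grandchild
        # Grandchild: not a session leader, so it cannot reacquire a terminal.
        os.chdir("/")
        devnull = os.open(os.devnull, os.O_RDWR)
        for fd in (0, 1, 2):  # point stdin, stdout and stderr at /dev/null
            os.dup2(devnull, fd)
        target()
    finally:
        os._exit(0)


if __name__ == "__main__":
    import time

    spawn_daemon(lambda: time.sleep(30))
    print("parent carries on; look for the sleeper with ps")
```

The second fork is the part people tend to forget: because the grandchild is not the session leader, opening a terminal device later cannot make that terminal its controlling terminal.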
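The "just restart the thing" wish is, at its core, a loop around the program. The following Python sketch illustrates that idea only; it is not how systemd implements its restart policies, and `supervise`, `max_restarts` and `delay` are invented names.

```python
import subprocess
import sys
import time


def supervise(cmd, max_restarts=3, delay=1.0):
    """Run cmd, restarting it after a non-zero exit, at most max_restarts times.

    Returns (number_of_starts, last_exit_code).
    """
    starts = 0
    while True:
        starts += 1
        exit_code = subprocess.call(cmd)
        if exit_code == 0 or starts > max_restarts:
            return starts, exit_code
        time.sleep(delay)  # back off a little before trying again


if __name__ == "__main__":
    crashing = [sys.executable, "-c", "import sys; sys.exit(1)"]
    print(supervise(crashing, max_restarts=2, delay=0.1))
```

A real supervisor also has to decide what happens when the supervisor itself dies or the machine reboots, which is a large part of why handing the job to the init system is attractive.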
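For the cheat sheet Matt keeps meaning to write, a few `journalctl` options cover most day-to-day use: `-u` selects a unit, `-n` limits the number of lines, `--since` sets a start time, and `-f` follows new entries. The small Python helper below just assembles such a command line; `journalctl_cmd` and the unit name `myservice` are hypothetical.

```python
import subprocess


def journalctl_cmd(unit, lines=50, follow=False, since=None):
    """Build an argument list for viewing one unit's logs with journalctl."""
    cmd = ["journalctl", "--no-pager", "-u", unit, "-n", str(lines)]
    if since:
        cmd += ["--since", since]  # accepts forms such as "1 hour ago" or "today"
    if follow:
        cmd.append("-f")           # keep printing new entries as they arrive
    return cmd


if __name__ == "__main__":
    # Only works on a machine that runs systemd and has a unit with this name.
    subprocess.run(journalctl_cmd("myservice", lines=20, since="1 hour ago"))
```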
Starting point is 00:25:17 were anyway. It wasn't just your home directory, hopefully. Yeah. Pick a user that you're going to run it as. Yes, that's true. Might be root, might not. Yeah, let's hope it avoids being root if it can. Yeah.
Starting point is 00:25:28 But then, yeah, you make a little text file that sort of, it looks like TOML-ish to me, that systemd config-ish file, that says, hey, I need these things. I provide these things, which you often don't have to do. This is how I'm going to be started up. This script needs to run before I run. This script needs to run after I run. There's a few, like, customization points you've got like that.
Starting point is 00:25:51 And you can say what you're wanted by as well. So in this instance, you probably say, I'm wanted by multi-user.target, which is like a magical sort of target that says, hey, when it becomes a multi-user system, the fifth, whatever, run level five, then this is, I am saying that I am wanted by it, which is a way of you kind of going the other way around from the usual dependency, saying, it depends on me.
Starting point is 00:26:14 Right, you're joining the dependency tree there. Yeah. So now when you start, when you reboot the machine, your service will come back up. And then you can have some policies about retrying, restarting it, maximum number of times to restart,
Starting point is 00:26:27 how often to wait between, how long to wait between them, those kinds of things. And then effectively, it runs itself after that. So that's what we do. Yeah. Yeah.
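The unit file described above is an INI-style text file, so it can be generated with Python's standard `configparser`. This is a sketch under assumptions: the description, binary path and user name are placeholders, and the set of directives is just a plausible minimum (`ExecStart`, `User`, `Restart`, `RestartSec`, `WantedBy`), not a recommendation for any particular service.

```python
import configparser
import io


def render_unit(description, exec_start, user):
    """Return the text of a minimal systemd service unit."""
    unit = configparser.ConfigParser(interpolation=None)
    unit.optionxform = str  # systemd directive names are case-sensitive
    unit["Unit"] = {"Description": description, "After": "network.target"}
    unit["Service"] = {
        "ExecStart": exec_start,
        "User": user,             # avoid running as root where possible
        "Restart": "on-failure",  # restart after a crash...
        "RestartSec": "5",        # ...waiting five seconds between attempts
    }
    unit["Install"] = {"WantedBy": "multi-user.target"}
    out = io.StringIO()
    unit.write(out, space_around_delimiters=False)
    return out.getvalue()


if __name__ == "__main__":
    # Hypothetical service; the result would typically be saved as
    # /etc/systemd/system/myservice.service.
    print(render_unit("My service", "/usr/local/bin/myservice", "myservice"))
```

After writing the file, `systemctl daemon-reload` makes systemd notice it, and `systemctl enable --now myservice` starts it and hooks it into `multi-user.target`.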
Starting point is 00:26:35 And so your installation process is: copy the binary bits up and make sure that this systemd configuration is there. And then obviously, if you want to restart it, there are processes for restarting, service restart and all that kind of good stuff. Yeah, systemctl. Yeah, I still use service, space, service name, restart. There's almost certainly a hundred ways to do it. Honestly, I still want to go /var/run/blah, or whatever the whole thing was.
Starting point is 00:27:01 I actually don't know what the command is, but it just comes out of my fingers when I need to say, make that thing run again. But yeah, service space, name of things, space restart is now what I've learned to do. But okay. Okay. So that's where we are. Right. So now we're good, right?
Starting point is 00:27:15 We know that the process is being appropriately managed by a piece of software that's designed to start it up at the right time and keep it running. It also has some handling for like if it does output to stand it out, it'll go to a well-defined log place inside this journal control thing. If it crashes, it will restart it. If you reboot the machine, it'll come. back up with it if you set that to be so everything is wonderful so what's next right so what's next is that you discover that the thing just crashes every four or five days uh because it's running out a memory because it needs to run on more than one computer it is too big so you have to now run it on multiple computers and you're assuming whatever work you've ruled out the there's a memory leak
Starting point is 00:28:01 type yes it's not a memory leak yeah it's just too much beta yeah too much yeah too much yeah So what do we do now, then? So now we need to run it on multiple computers. And so, like, one thing you might reach for here is Ancible? I was going to say, is probably duplicating the line in the SCP-E-space SSH machine, service power restart, and just do four-host in. For host-in, host list, yes. And just do the exact same thing.
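The "for host in host list" v0 deploy can be sketched as a loop that copies the binary and restarts the service on each machine. The host names, paths and service name below are placeholders, and the helper only builds the `scp` and `ssh` command lines so that it can be inspected before anything is run.

```python
import subprocess


def deploy_commands(hosts, binary, dest, service):
    """Return the scp and ssh command lines for a copy-then-restart deploy."""
    commands = []
    for host in hosts:
        commands.append(["scp", binary, f"{host}:{dest}"])
        commands.append(["ssh", host, "sudo", "systemctl", "restart", service])
    return commands


def deploy(hosts, binary, dest, service):
    """Run the deploy one host at a time, stopping at the first failure."""
    for cmd in deploy_commands(hosts, binary, dest, service):
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical inventory; printed rather than executed.
    for cmd in deploy_commands(
        ["app1.example.com", "app2.example.com"],
        "./myservice",
        "/usr/local/bin/myservice",
        "myservice",
    ):
        print(" ".join(cmd))
```

Because the hosts are handled strictly in sequence, the run time grows with the length of the list, which is the three-minutes-and-getting-worse problem that comes up next.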
Starting point is 00:28:30 I would do, right, at least to start with, right? That's the V-0 of anything. Yes. Well, okay, let's deploy it to the two computers I know about. right now and just do the same thing on both of them and then that is probably what I would do and then I would have the thing where I would try to deploy it and there'd be some package or some configure oh we got to increase the size of the maximum size of the receive buffers on the network and so now I've got to like go and change that configuration I got to change it and I've already
Starting point is 00:28:57 scaled this out to like 10 computers now like every month for the last you know 10 months I've been just adding another having another host the list of hosts yeah and that's And now it takes like, you know, three minutes just to iterate through all of them. And I'm like, oh, and I have to remember to log in and set all these settings every time I had a new host and it's getting worse and worse and worse. Okay. So we've now gone firmly outside of signals and processes. And now this is like the setting up of the machine here is what you're talking about, which is well valid. And if you think of, you know, the system, sorry, the system D configuration unit file, whatever we just said, as being part of this machine configuration, then.
Starting point is 00:29:36 And it does make sense to talk about some of the other things that you might need that machine to have set up, like packages. And as you say, system setting. So let's segue into that. Let's do it. Yeah, okay. So you've decided that now, okay, I need to retire this BAScript. It's served me well, but it's time to move on to something a little bit where I don't have to like build all the stuff myself and make sure that it works and troubleshoot it all. So I'm going to try to use ancible, let's just say.
Starting point is 00:30:08 What is Ansible, and what makes something able to be "ansed", which is presumably what it means? Well, first you have to have pants, and you can have ants in your pants. That would be pants. And then, Pansible. That's going to be the fork of Ansible. Okay. Ansible. So Ansible is honestly a tool that I have only used sometimes.
Starting point is 00:30:30 It is not, I sort of, like, wind up making the jump from, like, the shell script to, like, Terraform. That's usually what I do. I'm like, all right, I'm going to go and I'm going to have something like Nomad manage these, or I'm managing the cloud. So at that point, you jump straight out into sort of an orchestration environment
Starting point is 00:30:51 as opposed to, you know, I'm controlling individual machines. Because that's the other thing in here, that host list and provisioning of those machines. We're assuming that these machines exist and you haven't got to, like, make them appear in EC2. But let's go through what Ansible is. But real high level, Ansible is: you write a playbook, and I think that playbook is pretty much in YAML, and it's got, like, the steps that you want to perform. And there's, like, a lot of sort of baked-in things, like: oh, I need to copy this artifact from this place to this place? Cool. I need to create a configuration file here? Cool. I need to restart systemd? Cool. It can do all those things for you. And there's lots of baked-in tools in Ansible to sort of do the typical system management things. You can install packages, you can create users,
Starting point is 00:31:34 you can, you know, because, like, hopefully, like you said, we weren't running this thing as root, so we had a dedicated user for it. When I'm setting up a new machine, I need to make that user. I need to make sure they don't have a password, that they have the right SSH keys, you know, all those kinds of wonderful things. So you have some, you know, script or some playbook that you run, you know, as root, because it needs to be able to do all these things. But then it sort of sets up the environment, and then for subsequent deploys and things, the program can run as a user, and it doesn't need to. Got it. Right. That makes sense. So it is essentially a canonical, canonic, that word, of the steps that you need to do. The
Starting point is 00:32:18 playbook, I mean, that's a good name for it, right? Like, it replaces the playbook, which is the, you know, the Google Doc that you have that says: remember, when you create a new machine, here's the 25 steps you have to do. And you kind of roll your eyes and do them. And it's like, well, let's automate this. And it does it in a principled way, with a bunch of support functionality that lets you do, like, "add user" rather than having to go through whatever steps you'd actually have to take to add the user, which I forget these days.
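Editor's sketch of the "playbook replaces the 25-step Google Doc" idea. This is a toy model in Python, not Ansible's format (real playbooks are YAML) and not its API. The step names, the `svc` user, the file path, and the dictionary standing in for a machine are all hypothetical.

```python
def create_user(machine):
    # Stand-in for "make the dedicated service user".
    machine.setdefault("users", set()).add("svc")


def copy_artifact(machine):
    # Stand-in for "copy this artifact from this place to this place".
    machine.setdefault("files", {})["/opt/app/app.bin"] = "v1"


def restart_service(machine):
    # Stand-in for "restart the service"; here we only count restarts.
    machine["restarts"] = machine.get("restarts", 0) + 1


# The checklist, written down as data: an ordered list of named steps.
PLAYBOOK = [
    ("create service user", create_user),
    ("copy artifact", copy_artifact),
    ("restart service", restart_service),
]


def run_playbook(steps, machine):
    """Apply each step to the machine in order and return the names that ran."""
    log = []
    for name, action in steps:
        action(machine)
        log.append(name)
    return log
```

Calling `run_playbook(PLAYBOOK, {})` walks the same list a person would otherwise follow by hand, which is the point being made: the checklist becomes something a program executes and a reviewer can read.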
Starting point is 00:32:45 Okay, that makes sense to me. I think one of the things that I have had difficulty in getting my head around when looking at these sets of tools, and only because you've mentioned Terraform, one thing I like about something like Terraform is that you kind of describe the end state. Yeah. And Terraform's responsible for getting whatever the current state is to the end state. Yeah. So whereas with things like Ansible, as I understand it, you have to be very careful to either be idempotent, so you can run the same thing twice
Starting point is 00:33:22 and it doesn't re-add another user if there is one already called that thing, or you just have to not run that step again. You know, like, hey, once we add that user, don't try and do it again. And then you kind of go, like, well, now I want to change the user to have a different, you know, full name or a different shell or whatever.
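Editor's sketch of the difference between a step that cannot safely be run twice and an idempotent one. The user table is an in-memory dictionary, and the function names `add_user` and `ensure_user` are hypothetical; nothing here touches a real system or calls a real tool.

```python
class UserExists(Exception):
    """Raised when an imperative add is repeated for the same user."""


def add_user(users, name, shell="/bin/sh"):
    """Imperative step: works once, fails if the user is already there."""
    if name in users:
        raise UserExists(name)
    users[name] = {"shell": shell}


def ensure_user(users, name, shell="/bin/sh"):
    """Idempotent step: make the user look like this; report whether anything changed."""
    desired = {"shell": shell}
    if users.get(name) == desired:
        return False  # Already in the desired state, nothing to do.
    users[name] = desired  # Covers both "create" and "change the shell".
    return True
```

With `ensure_user`, running the same step twice is harmless, and changing the shell later is the same call with a different argument rather than a separate "change" command.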
Starting point is 00:33:39 You're like, ah, now I have to run the change command. Right. I can't just change the add. And Unix systems are so, so complicated, I can't actually imagine how you could write a general-purpose, like, "make my system look this way" thing. Except at least one listener, somebody, is currently shouting "Nix!" into the void as they're walking along. And I know that Nix solves this in a very cool way, and I'm very excited by it.
Starting point is 00:34:10 But I don't have any personal experience with it other than someone demoing it to me and me going, wow, that is super cool. Yeah, I've heard those same things about Nix. Yeah, Nix seems to be, like, a kind of a mind virus that people get. Not in a bad way, necessarily. That does sound pejorative. But, like, because once you get it, I think you're like, oh my gosh, this is how everything should always be done. Yeah. And that's great, and you, like, proselytize it to everybody, and then most people's eyes glaze over and they're like, that seems great. And then you just log back onto the machine and just go sudo apt install, bam, and you're like,
Starting point is 00:34:46 there we have it, we're done. Anyway, back to, oops, I've just banged my, sorry, editing Matt, I've just whacked the microphone stand. So where were we? So I was sort of saying that there's a sort of difference between the sort of prescriptive "run these things in order", and maybe they're idempotent, or maybe they can adapt and say, like, well, if there's a user already there, don't re-add it, that kind of feeling, versus the Terraform thing where you just say, I should like this to be the end state. Here is a list of users the machine has to have, with the properties that users have. Right. And then Terraform goes behind the scenes and goes, well, look at what users I've got. Oh, now I'll make a
Starting point is 00:35:19 plan. The plan is: add three users, delete one user. And it presents it to you: this is what I'm going to do. Yeah. Have you ever actually used Terraform to do that type of system administration before? Not on a system, no. I've only ever done it with infrastructural components. Yeah. So yes, that is true. I've never used it for that. I don't know if it can do that, actually. I don't know that it does. You're right. Yeah, now I say it. But where I was going with that was less Terraform specifically, but, like, the phrasing is either outcome or steps. And, you know, it's nice to supply the outcome. But yeah, I don't know if something like that does exist.
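Editor's sketch of the "outcome rather than steps" idea: compare the current users with the desired users and produce a plan before changing anything. This is not how Terraform is implemented, and, as the hosts say, it is unclear whether Terraform manages users on a machine at all. The names `make_plan` and `apply_plan` and the user records are hypothetical.

```python
def make_plan(current, desired):
    """Return the actions needed to turn `current` into `desired`."""
    plan = []
    for name in sorted(desired.keys() - current.keys()):
        plan.append(("add", name))
    for name in sorted(current.keys() - desired.keys()):
        plan.append(("delete", name))
    for name in sorted(current.keys() & desired.keys()):
        if current[name] != desired[name]:
            plan.append(("change", name))
    return plan


def apply_plan(current, desired, plan):
    """Carry out a plan on a copy of `current` and return the new state."""
    result = dict(current)
    for action, name in plan:
        if action == "delete":
            del result[name]
        else:  # "add" and "change" both copy the desired record.
            result[name] = dict(desired[name])
    return result
```

The plan can be shown to a person for approval before `apply_plan` runs. Planning again after applying yields an empty list, which is what "the system is in the end state" means in this model.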
Starting point is 00:35:57 And my only interaction with things like that is with Packer, where I always start from scratch, an empty image, and then run the sequence of steps to make an image that looks the way I want it to. So I never go back to it and kind of go, hey, I want that image, but slightly different. So, yeah. Yeah. Yeah, maybe that's the, I feel like this podcast is like the rough draft of a conference talk, because it's like: imagine that you want to run a program, what do you do? And you just sort of work up from the bottom. And then, I think that's, like, when was the last time you gave a conference talk? Come on, it's your turn. Oh, it's been a long time. I'm probably overdue. Very much part of last week's conversation, the reason I was looking into that
Starting point is 00:36:41 was because I was avoiding writing several conference talks that I have to give in about a month's time. And a week has passed since we last spoke. Now I'm giving away all of our secrets. Although much longer will have passed in real time, and I'll probably have given the conference talk by the time I've released this. So, listener, you can be the judge of whether it was any good or not. But yeah, I have done no work on it at all, so, oops. But yeah, this is a rough draft of a conference talk. It is "So You Want to Deploy a Service". So, you know. Exactly. Yeah, you want to run some software, right? How are you going to do it? And I feel like the punchline of this is, like, okay, and now we're migrating this all to the cloud and we're going to use Terraform. We're going to
Starting point is 00:37:24 use GCP. Or maybe you have, like, you know, a lot of companies, I feel like, these days have essentially an internal cloud. Like, they're still using Terraform, but they're using tools like Nomad, and they have their own, you know, physical servers, and they have an infrastructure team that's managing it all. And this maybe leads us back. This is how you get this. Okay, this is the whole website. This is how you get into the state where you're just like, yeah, I just, like, changed one function with some unit tests and pushed a PR, and I have no idea where it goes. Yeah, that's exactly right. Yeah.
Starting point is 00:37:55 Uh-huh. Yeah. Well. And now the circle is complete. And now the circle is complete. Yeah. I think we've, we've probably reached a good spot then. Yeah.
Starting point is 00:38:07 It's good to know these. I think, like, all of these, everything we talk about, really, certainly everything that I hold dear that we talk about on this podcast, is all about finding the right level of abstraction, knowing that there's a level beneath you. Which in this case, you know, maybe your level of abstraction is those cloud tools that we've just been talking about and the services they're on. But knowing enough about the level beneath you to say, like, okay, I do know that there are processes that run, and that something is taking care of the input and output for those processes and making sure the right signals get to them at the right time, and not the wrong things,
Starting point is 00:38:42 like me logging out. But I know that it exists and maybe I could sketch something, but I don't necessarily know it off the top of my head. And then you should know, beneath that, that something exists, right? Beneath that layer, we know that there is a systemd, and I don't know how that works. But it's always good to have a decent understanding of the level beneath where you're working, and then be aware of the layer below that. Right. Know vaguely what to Google. Right. Or ask ChatGPT. Or ask your favorite. Yeah. Ask your favorite. Yeah. And so I think this plugs into that kind of mindset completely. Yeah. It's kind of like,
Starting point is 00:39:16 know how the cloud works and then know where to look when it doesn't work. Mm-hmm. Mm-hmm. Yeah. Like, honestly, the only downside to this is that in those environments, I feel like, where you have those, like, you know, a million layers of abstraction between you and the physical server, if you're, like, an old fuddy-duddy like us, and you're like, can't I just SSH in?
Starting point is 00:39:41 It's like, no, you can't have root. It's like, why? I know exactly what to do. I know exactly how to fix this problem, and now I'm going to have to, okay, fine, sure. Yeah. Well, and of course the irony is they can probably give you root, but it's not even on the real computer,
Starting point is 00:39:54 because it's several layers of virtualization away from the machine that's actually running. You talk about, it's like, it's running in a container service. There's no root to give you. Like, you can't get there from here. Right. Yeah. Cool. All right, friend. Well, this has been great. Yeah, we jammed it. We did it. Not bad for winging it. Yeah. Listener, you can let us know. Post a comment
Starting point is 00:40:15 somewhere. I mean, some people watch this on YouTube and that's where I see most of the comments. And then otherwise tweet at us, or Hachyderm, via the Mastodon-y thing, or just email us. But we'd love to hear what you think and what we're doing right and wrong. We've never really asked that.
Starting point is 00:40:33 We just do this for us. This is just our excuse to catch up, isn't it? Yeah, that's true. Cool. All right, friend. Well, have yourself a great weekend. And speak to you soon. Until then. You've been listening to Two's Complement, a programming podcast by Ben Rady and Matt Godbolt.
Starting point is 00:40:54 Find the show transcript and notes at www.twoscomplement.org. Contact us on Mastodon. We are @twoscomplement@hachyderm.io. Our theme music is by Inverse Phase. Find out more at inversephase.com. I don't know.
