Two's Complement - Unix Commands for Wizards

Starting point is 00:00:00 I'm Matt Godbolt. And I'm Ben Rady. And this is Two's Complement, a programming podcast. Hey Ben. Hi Matt. How are things? Good, good. I've been thinking a bit about the tooling that we use.

Starting point is 00:00:25 And you and I both work in Unix environments predominantly, Linux specifically in my case, and Mac OS X type things. And there's an awful lot of tool craft that we've both picked up over the years. And certain people we've worked with at various companies have been even better at this. I remember pairing with somebody in particular and learning a whole ton of stuff from them. I figured we should talk a bit about the kinds of things that we can do, and have done, with the tooling in the Unix shell.

Starting point is 00:00:57 Because I'm always surprised when I meet somebody who goes, how did you just do that thing? And I'm like, oh, it's just shell. Right, right. Let's start there. I think that's a great idea. I guess, what is your top X, where X is as many as we have minutes for? Oh, man. Unix command line tools.

Starting point is 00:01:15 The one that immediately comes to mind for me is using sort and uniq. I was just going to say the same thing. Get out. Yep. It's like the flathead screwdriver of Unix tools, right? It's the thing that you use for everything when you don't know how to use anything else. Right. I mean, I've got a big list of things and I just want to get a sense of the data, right?

Starting point is 00:01:37 I've got log lines and I'm like, well, how often is this thing happening? How often are things of this type happening? So you might chuck in a grep -oE, you know, to say only output the bit that matches my little regular expression, and then you pipe it through sort, and then you pipe it through uniq -c, usually. And I guess we should explain what all of those things are and why, but you end up with a little list of: hey, there are exactly 50 instances of this string, 48 of this string, 10 of this, and three of those. And that's exactly what I needed to know about this log. So I'm glad that we picked the same one there. That's a good sign. Yeah, that's a good tell. Yeah, you sort of use that to make a little histogram.
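The grep -oE | sort | uniq -c pipeline described here is, in effect, a count-by-key. As a rough sketch of the same idea in Python, collections.Counter plays the part of sort | uniq -c, and most_common() stands in for a final reverse numeric sort plus head. The function name top_matches, the sample log lines and the pattern are all made up for illustration, and Python's re syntax is close to, but not identical to, grep -E's.

```python
import re
from collections import Counter


def top_matches(lines, pattern, n=10):
    """Rough equivalent of: grep -oE PATTERN | sort | uniq -c | sort -rn | head -n N"""
    regex = re.compile(pattern)
    counts = Counter()
    for line in lines:
        # grep -o emits every match on a line separately, so count each one
        for match in regex.finditer(line):
            counts[match.group(0)] += 1
    # most_common(n) returns (value, count) pairs, biggest count first
    return counts.most_common(n)


# Hypothetical log lines, purely for illustration
log = [
    "ERROR login rejected user=alice",
    "INFO login ok user=bob",
    "ERROR login rejected user=carol",
    "WARN disk nearly full",
]
print(top_matches(log, r"^(ERROR|WARN|INFO)"))
```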

Starting point is 00:02:19 I guess it's also kind of like a GROUP BY. Isn't that what that is, in a way? I suppose, yeah, it is a GROUP BY. It's very much like a GROUP BY, isn't it? Because you're saying aggregate on this key and then say how many of them there are. It's like, you know, ORDER BY blah, sorry, GROUP BY blah, COUNT(*). Yeah. So in the pipeline I just described, the grep takes an input and finds just the snippet that you're interested in. Now, if you just want counts of the different lines of a file, you don't need the grep at all. But then you sort them so that exactly the same lines are one after each other.
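Why the sort matters: uniq only ever compares a line with the one immediately before it. Python's itertools.groupby has exactly the same adjacent-only behaviour, so it makes a handy stand-in for uniq -c in this sketch (uniq_c is a hypothetical helper name, not a library function).

```python
from itertools import groupby


def uniq_c(lines):
    """Like `uniq -c`: collapse runs of *adjacent* identical lines into (count, line)."""
    return [(len(list(run)), line) for line, run in groupby(lines)]


lines = ["A", "B", "C", "A"]  # the A, B, C, A example, one per line

# Unsorted: the two A's are not next to each other, so they are counted separately
print(uniq_c(lines))          # [(1, 'A'), (1, 'B'), (1, 'C'), (1, 'A')]

# Sorted first, as in `sort | uniq -c`
print(uniq_c(sorted(lines)))  # [(2, 'A'), (1, 'B'), (1, 'C')]
```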

Starting point is 00:02:58 So if your input is A, B, C, A, one per line, then you're going to end up with A, A, B, C. Right, and you're really only doing that to get the uniq -c to work. That's right, yeah, that's a very good way of putting it. Yes. It's not like you want it sorted particularly, right? In fact, you may re-sort it at the end, which is what one commonly does. uniq is designed to drop, what's the word I'm looking for, not subsequent, adjacent repeated lines of a file. That's what it's designed for. But it can also say how many duplicates it encountered as it went through. So when you're piping A, A, B, C through it, it sees the first A and goes, that's

Starting point is 00:03:42 great, I've only ever seen A. Then it sees the same line again and goes, oh, well, I'm not going to output that line, but I'm going to count it. And then it sees B and goes, oh, okay, that's the end of the A's; I will never see another A. Well, I shouldn't say it's presuming anything; that's just what it does. So then it says, hey, I saw two A's, then it prints one B and then one C, and you're done. And that's wonderful, right? You get a count of each of the unique lines. Now, of course, the result of that is a big list of: I saw A twice, I saw B once, and I saw C once. And that's great. In the example I just made up on the spot, they happen to be in a useful order, right? Because it's like, hey, I want the most common. But often

Starting point is 00:04:30 I want to say, no, just give me the top 10 of those. Because there might be thousands of categories, but you want to know what the most frequent one is. Right, exactly. Just like in SQL: if you don't give it a sort key afterwards, then the sort inside your pipeline is not sorting it usefully. It's just an implementation detail of the way your GROUP BY is working. So then you want sort -n, which means sort numerically: it's going to interpret the first part of each line as a number (add -r to put the biggest counts first). And then you can pipe it through, say, less.
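To see why the -n matters: plain sort compares lines as text, character by character, so 10 lands before 9. Here is a rough Python sketch of the difference. numeric_key is a made-up helper that approximates sort -n by reading the number at the start of each line (real sort -n also handles things like thousands separators), and the sample lines are invented and deliberately unpadded.

```python
import re

# Optional leading blanks, optional minus sign, digits, optional decimal part
_LEADING_NUMBER = re.compile(r"\s*(-?\d+(?:\.\d+)?)")


def numeric_key(line):
    """Approximate `sort -n`: order by the number at the start of the line.
    A line with no leading number counts as 0, which is how sort -n treats it."""
    match = _LEADING_NUMBER.match(line)
    return float(match.group(1)) if match else 0.0


# Hypothetical "count value" lines
counts = ["10 ERROR", "9 WARN", "100 INFO"]

print(sorted(counts))                     # ['10 ERROR', '100 INFO', '9 WARN']  text order
print(sorted(counts, key=numeric_key))    # ['9 WARN', '10 ERROR', '100 INFO']  like sort -n

# Biggest first, then keep two: like sort -rn | head -n 2
print(sorted(counts, key=numeric_key, reverse=True)[:2])  # ['100 INFO', '10 ERROR']
```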

Starting point is 00:05:04 Or head -10, to say just show me the first 10 of those. So in what we've got, grep, sort, uniq, another sort, less, that's five Unix commands one after another, and we've written a SQL query, for all intents and purposes, on a line-based text file. So when you're like, hey, I noticed we got some user login rejections, how often does that happen? It's like, oh, we got a hundred thousand of those today. Oh wow, is that a lot, or does that happen every day? Then you go and use these tools to figure out if that's true. Of course, it's no substitute for actually having metrics in your application that do it, but we've all been under the gun

Starting point is 00:05:43 trying to figure out what the heck's going on with our system. And we are again joined by guest puppy in the background. I apologize for the noise. Oh, he's always welcome, he's adorable. Yeah. Right, but yes, you were making a point about something technical. Yeah, well, like you said, you know, these are the things that you do. We talked about structured logging and observability in another episode, and, like, if you know

Starting point is 00:06:13 if you have some inkling ahead of time of the things that you want... Although, you know, I can make some arguments for structured logging that would say anything that you write to an unstructured log file, you could write to a structured log and it would be strictly better. But not every system has this kind of logging in place. Sometimes you start out with something and you don't necessarily know what you want your structure to be, or whatever it might be. So you have these things, and being able to do this is just super useful. Plus, the logs that

Starting point is 00:06:37 you're reading aren't necessarily yours, right? Right. Sometimes they're operating system logs or other software's logs that you're doing this to. It's Apache's log file that just happened to be on the box. Right. Uh-huh. Right. So I feel like you need to have these skills no matter what.

Starting point is 00:06:55 And, of course, the next one that you want in that pipeline you're talking about is awk, right? Because it's like, oh, the thing that I want is this particular column. Right. And awk is like a whole programming language. But mostly it's used for print. But mostly it's for print. Yes. Yes.
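For the column-picking awk is mostly used for, plus cut and the sum-and-average trick that come up next, here is a hedged Python sketch of the field-splitting rules involved. awk_field, cut_field and sum_and_mean are hypothetical helper names; the awk or cut invocation each one mimics is in its docstring. The main point is that awk's default splitting collapses runs of blanks, while cut -d splits on every single delimiter.

```python
def awk_field(line, n):
    """Like awk's $n with the default field separator: fields are split on
    runs of blanks (leading/trailing blanks ignored), numbered from 1, and
    $0 is the whole line. A missing field is an empty string, as in awk."""
    if n == 0:
        return line
    fields = line.split()
    return fields[n - 1] if n <= len(fields) else ""


def cut_field(line, n, delim):
    """Like `cut -d DELIM -f N`: every single delimiter starts a new field, so
    '::' yields an empty field. A line with no delimiter at all is printed
    whole, which is what cut does unless you pass -s."""
    if delim not in line:
        return line
    fields = line.split(delim)
    return fields[n - 1] if n <= len(fields) else ""


def sum_and_mean(lines, n):
    """Like: awk '{ total += $N } END { print total, total / NR }'
    Differences: awk quietly treats a non-numeric field as 0 (or its numeric
    prefix) whereas float() raises, and this returns 0.0 for empty input
    instead of dividing by NR == 0."""
    total, count = 0.0, 0
    for line in lines:
        field = awk_field(line, n)
        total += float(field) if field else 0.0
        count += 1
    return total, (total / count if count else 0.0)


# Hypothetical access-log style lines: method, path, status, bytes
sample = ["GET /index 200 512", "GET /about 200 1024", "POST /login 401 64"]
print(awk_field(sample[0], 2))        # /index
print(cut_field("a:b:c:d", 3, ":"))   # c
print(sum_and_mean(sample, 4))
```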

Starting point is 00:07:14 Although I will make an argument for another tool. You talk about awk first. Well, I was going to say there are other tools. I think the one that I use for this is also cut. Cut. Yeah, yeah. That's where I was going. Cut is a little bit easier than specifying $3 if you have a very clear, non-space-based delimiter. So, for example, if you know that in your JVM dump or whatever, the bit you want is between the second colon and the third colon in some class path or whatever, then you can do cut -d: which means use colons as a delimiter, and then you can pick, you know, -f3

Starting point is 00:07:49 as the third field, and that's another super way of filtering out just the bit of the data that you want. But awk is a full programming language. Yeah, yeah. And I honestly feel bad for not having used awk for more than just print $whatever. There are a few things I've used it for, but I always have to Google those things. I use it to sum up a numeric column, and also to average it; that's quite easy to write, although, again, there's a little bit of Stack Overflow used here. But if there's something columnar and I just need to go over it with a relatively straightforward computation, like a running total or stuff like that, I can just use awk out of the gate. But the only bits I can remember are, like,

Starting point is 00:08:32 BEGIN, something magical, END, something magical, and then the line itself. Of course, if you do have the benefit of a structured log, you may have written your structured log using JSON, because a lot of people do that, in which case you're probably going to want to use another command line tool called jq, which is an amazing tool, not just for dealing with structured logs, but for dealing with any sort of JSON data. If you're interacting with a web service, one of my favorite tricks is this: I've got some web service, I'm trying to explore the API, I've got the documentation there, maybe, and I want to see if the documentation is right, or I want to see what the actual underlying data is. So what I'll do is put together a little bash script that

Starting point is 00:09:14 curls that API and then pipes it into jq. And then I'll run that whole thing in another command called watch, which we've talked about before, which will just run it every couple of seconds, right? And now I have a constantly refreshing view of what that JSON object looks like, and I can start modifying the jq expression to select particular elements and explore the whole tree of the resulting JSON object in a very interactive and fast way, right? So you can walk over the whole tree and go, ooh, this value is interesting, and this whole array of things is cool, and, ooh, we're going to need that value

Starting point is 00:09:57 and this doesn't match up with the documentation. And, you know, in just a matter of a few minutes, you can see everything that there is to see about an API in a very interactive way. So those tools together, I am a huge fan of. But jq in general is pretty fantastic. jq is amazing. I think jq is the first of the commands that we've mentioned so far that isn't sort of a BSD staple, right? Everything else is part of the Unix-y environment.

Starting point is 00:10:22 Yeah, uniq, awk, I think we've said cut. You've almost certainly got them on your machine already. Just type it in, whatever. You don't need to sudo apt install anything. jq is a separate install. I think you can probably get it for most distributions. I think I installed it on

Starting point is 00:10:40 my Debian thing here. But it's a single static binary; you can just grab it and chuck it in your bin directory. That's another fantastic thing about it, right? Like you said. I think I would be a little surprised if most newer Linux distributions didn't ship jq out of the box, but I could be wrong about that. I'm pretty sure it needs

Starting point is 00:10:56 to be installed. It's one of those things that's in one of my Docker container setups for Compiler Explorer, like, hey, I expect these tools to be on my machine, so it has to be there. Yeah, but yeah, I use that all the time. You can do math in it, you can do transforms in it, you can do all kinds of crazy stuff in it too. So if you get into heavy scripting, and not so much just exploring,

Starting point is 00:11:20 you can damn near write programs in it if you try. Geeking out a little bit about it, it's just a really nicely written piece of software as well. It's based on something called libjq; well, I say based on it, jq is really extracted from it, it is libjq. So if you want JSON processing with the kind of descriptive power that the jq query language has in your own code, that's there. It compiles to an intermediate bytecode. It's just sweet; it's nicely done. It's cool. And it's worth saying as well that the vast majority of invocations of jq for me are jq -C, which says enable the color, even though I'm about to do something that would normally make it turn the color off,

Starting point is 00:12:02 space dot, which means select everything: hey, I just want everything. And then I pipe it into less -R, which tells less to pass the ANSI color codes that are coming its way through to the terminal, instead of mangling them, which it would normally do. And that means I get a pageable, colored, syntax-highlighted and pretty-printed version of whatever I'm piping into jq. So you don't even have to use it for any kind of interrogation at all. It's just a really nice pretty printer that supports color. But then, as you say, you can do dot stuff. It's got its own piping internally. You can do all sorts of clever trickery. It's a great product. And yeah. So what else have we got?
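For anyone without jq to hand, here is a tiny Python sketch of what jq . and a simple path filter such as .items[0].name do. jq_path is a hypothetical helper and the JSON is made up; it supports only plain object keys and integer indexes, a sliver of jq's real filter language, and it raises KeyError where jq would print null for a missing key.

```python
import json
import re

# One step of a simple jq path: either .name or [index]
_STEP = re.compile(r"\.(\w+)|\[(-?\d+)\]")


def jq_path(document, path):
    """Follow a simple jq-style path such as .items[0].name through parsed JSON.
    Anything other than .key and [index] steps is silently ignored, so keep
    paths simple. A path of "." returns the whole document, as `jq .` does."""
    value = document
    for key, index in _STEP.findall(path):
        value = value[key] if key else value[int(index)]
    return value


# Hypothetical API response, standing in for the output of curl
raw = '{"items": [{"name": "alpha", "size": 3}, {"name": "beta", "size": 5}]}'
doc = json.loads(raw)

print(json.dumps(jq_path(doc, "."), indent=2))  # roughly what `jq .` pretty-prints
print(jq_path(doc, ".items[0].name"))           # alpha
print(jq_path(doc, ".items[-1].size"))          # 5
```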

Starting point is 00:12:43 On the topic of watch, actually, another tool that's similar, and this is another one that you're probably going to have to install, is entr. entr basically uses the file system notification mechanism that is definitely already on your system to let you run a command in response to a file system change. Right. So if you want to watch a directory, and whenever a file in that directory changes, run make or run jq or run, you know, your compiler of choice or your linter or your tests or whatever it may be, you can use entr to do that. And it's a really, really easy way to create a very interactive workflow with pretty much any programming language using this very simple tool.
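entr reads the list of files to watch on standard input (for example, ls *.c | entr make) and reacts to kernel change notifications. The sketch below gets a similar effect with a different, cruder technique: polling modification times. It is slower and less precise than entr, but shows the shape of the idea. snapshot, changed and watch are hypothetical names.

```python
import os
import subprocess
import time


def snapshot(directory):
    """Map every regular file under `directory` to its (mtime_ns, size)."""
    state = {}
    for root, _dirs, files in os.walk(directory):
        for name in files:
            path = os.path.join(root, name)
            try:
                info = os.stat(path)
            except FileNotFoundError:
                continue  # deleted between listing and stat
            state[path] = (info.st_mtime_ns, info.st_size)
    return state


def changed(before, after):
    """Sorted paths that were added, removed or modified between two snapshots."""
    return sorted(p for p in before.keys() | after.keys() if before.get(p) != after.get(p))


def watch(directory, command, interval=0.5):
    """Run `command` (an argv list) each time something under `directory`
    changes. Polls forever; stop it with Ctrl-C."""
    last = snapshot(directory)
    while True:
        time.sleep(interval)
        current = snapshot(directory)
        if changed(last, current):
            subprocess.run(command)
            current = snapshot(directory)  # don't re-trigger on files the command itself wrote
        last = current
```

Usage would look something like watch("src", ["make"]), where src is just an example directory.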

Starting point is 00:13:34 You can also use it for things you shouldn't do. You can use it as a way to, you know, go, oh, I'm going to make this API call or hit this web endpoint whenever this file changes and post the file, because you don't want to actually build the event-based system that you should be building. Oh, right. You know, I've seen it used for that, which, maybe in a MacGyver, duct tape and baling wire situation, is the right thing to do. But, you know, it's intense. And I think its

Starting point is 00:14:05 really sweet spot is developer tools and little mini... I mean, I'm used to things like npm run watch that the various packages have, but those are provided by Node.js. If you want to do it more generally, you can use entr. I've used inotify stuff that I hacked together myself for some of my Raspberry Pi development. I had a thing that watched files, but it was a makefile target. If I'd known there was entr to do most of the horrible heavy lifting of the strange protocol that you have to use to talk to inotify, that would have been great.

Starting point is 00:14:36 So entr, that's awesome. The last time I really used it in anger, I was writing a Wireshark plugin. I think I was writing it in Lua, and whenever I was hacking away on it and changed the Wireshark plugin, I would have entr automatically run

Starting point is 00:14:56 a capture that I had through TShark and spit out the processed result, to get a sense of whether or not I was writing my Wireshark plugin correctly. That's awesome. That's so cunning, because that gives you

Starting point is 00:15:10 CI/CD style, or local CI, not CD, but for Wireshark, which is the last thing I would ever expect to have that process. That's cool. It was a miracle when they added the thing that allowed you to reload a Lua plugin in Wireshark without having to quit it and start it again,

Starting point is 00:15:27 which had been my go-to way of doing this kind of development, and then, you know, making a pop-up happen saying, oh, I got to this point, printf style. Yeah. That was a few years ago, but that worked out pretty well. But actually, that leads us to another one, which of course is TShark, or tcpdump, depending on your flavor. Right. tcpdump is usually my go-to for capture, but when I'm analyzing something I tend to reach more for Wireshark than TShark. But I've definitely seen people use TShark to great effect for both capture and analysis. Right, yeah. And I think this is another thing where once you've seen somebody

Starting point is 00:16:06 who is good at this kind of thing in action, debugging a problem that you would have scratched your head over for days, and finding it in a few minutes with a packet capture... We should probably tell people.

Starting point is 00:16:18 I was going to say, we should tell people what Wireshark is, because I think we can talk about it. Yeah, let's do that. What's Wireshark, Ben? What's a Wireshark? Yeah, so when you communicate over the network, or actually, I mean, you can use Wireshark on USB devices and stuff like that too. Bluetooth and stuff. Yeah, Bluetooth. Yeah. So if you're

Starting point is 00:16:37 doing any sort of communication protocol in general, it's probably worth asking the question: can I see this in Wireshark? Why would you want to see it in Wireshark? Well, because Wireshark will show you all of the bytes, everything that you're sending back and forth between computer A and computer B, or device A and device B, and allow you to apply filters to shrink them down to the stuff you care about, and transform

Starting point is 00:16:58 the raw bytes into something that's a little bit more meaningful and readable. I was going to say, "the raw bytes" is a bit of an undersell for Wireshark. Wireshark understands almost every protocol known to mankind. It's kind of like the C-3PO of things. And it will show you what it means almost always, even if, yeah.

Starting point is 00:17:14 Yeah, no, that's a great analogy. Yeah, but yeah. So, you know, the thing I was writing earlier was a Wireshark plugin, for being able to see, at a,

Starting point is 00:17:26 you know, at the various OSI levels or whatever it might be for your particular thing, what are the messages that are going back and forth here? It's an incredibly powerful tool, and it adds a level of observability anytime, anywhere that you're basically connecting two devices or two computers together,

Starting point is 00:17:43 or multiple computers together. So it's something that I've used, and I know you've used, a whole ton to troubleshoot all kinds of problems. And if you're not familiar with it, I highly suggest you give it a try. Absolutely. Do you have a cool Wireshark story? Like, we never would have found this but for Wireshark, kind of a thing?

Starting point is 00:18:03 I do. I don't know that I can talk about it publicly, unfortunately; it's one of the more interesting ones. But yeah, we have found some very unusual behaviors in esoteric networking devices before now that have been traced back to hardware issues or similar. But on that subject, actually, and in a similar vein, and I realize we're doing all of two minutes on each of these tools when we could easily do a whole episode... This is like a lightning talk. Yeah, I know, that's true. Right. But in a similar vein to Wireshark, tcpdump and things like that, there's SystemTap and strace. strace is probably the one people are most familiar with.
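strace reports file descriptors only by number. On Linux, as the discussion below gets into, each process's /proc/<pid>/fd directory holds one symlink per open descriptor, so a few lines of Python can do that lookup. open_fds is a hypothetical helper; this is Linux-only, and it usually only works on your own processes unless you are root.

```python
import os


def open_fds(pid="self"):
    """Map each open file descriptor of `pid` to the target of its symlink in
    /proc/<pid>/fd (Linux only). Regular files resolve to a path; sockets and
    pipes show up as names like 'socket:[12345]' or 'pipe:[67890]'."""
    fd_dir = f"/proc/{pid}/fd"
    targets = {}
    for name in os.listdir(fd_dir):
        try:
            targets[int(name)] = os.readlink(os.path.join(fd_dir, name))
        except FileNotFoundError:
            pass  # closed while we were looking, e.g. the fd listdir itself used
    return targets


# If strace showed some process blocked reading fd 37, something like
# open_fds(1234).get(37) would say what that is (1234 is a placeholder PID).
for fd, target in sorted(open_fds().items()):
    print(fd, "->", target)
```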

Starting point is 00:18:45 This is snooping on a process, just like you were snooping on a network connection between processes in Wireshark. strace says, hey, I can run another process, or I can attach to another process, and show you all the operating system calls it's making: the parameters that go in, and what the operating system gives back. And that can give you a really deep read on what a process is doing. That's super, super useful when you have, for example, a process that's in a weird state and you don't understand why. You can attach strace to it and go, oh, it's waiting on an event. What is the event? Oh, it's file descriptor 37. What is file descriptor 37? And then you can go and look. And this is something I noted for another part of the

Starting point is 00:19:29 talk, but in Linux you can go and look at that process's information in /proc. So I'd say, why are you blocked? Wait, you're trying to read from 37; what is file descriptor 37? So I'll go to /proc/ followed by the PID of the process. And that's just mounted in the file system? It's just a magical file system, exactly. Within that file system is a bunch of useful information, and there is one directory per process. You can go into that directory, and there's a bunch of files that aren't really files; they're magic, they talk to the kernel. And then you can look at, for example, all of the open file handles in its fd directory. The open file handles appear as symlinks: a numbered file, like 37 in the case I've just been talking about, will be a symlink either to the actual file on disk, or it will

Starting point is 00:20:15 be a symlink to a special, magic-looking thing that will tell you, I'm a socket, or I'm a Unix socket, that kind of stuff. But, you know, maybe you won't know what that is at that point. Maybe you'll have to give up at that point. But it tells you, hey, I'm blocked on the network at some level. And then you might crack out Wireshark and go, well, what are you blocked waiting for? Can I see anything before this point? It works on your own software, where it can give you a hint as to,

Starting point is 00:20:40 well, I think the only places where it could be blocked reading from a file are here and here. Okay, that's where we're wedged. But more importantly, it works on other people's software. So if you're stuck with, why on earth is this esoteric binary that a vendor has given me doing this, what on earth is it doing here? strace is a fine way to find out. So what's SystemTap, then? SystemTap. Yeah. So SystemTap is like strace++. SystemTap allows you to write small programs that get injected into the kernel and run, effectively, in a sanitized, safe environment within the kernel, on behalf of various other parts of the operating system.

Starting point is 00:21:20 So you can trap and filter operating system calls, and various kernel events that happen a layer below even what strace can see. So, you know, hey, I had to allocate a new 4K page of RAM. That's an event that happens in the kernel, and it's like, oh, well, that's interesting to me. I want you to run this bit of code and do something when that happens.

Starting point is 00:21:39 It comes with a bunch of useful scripts that you can crib from. So, the story whose punchline is "SystemTap found the issue" was a latency spike in a trading system I was working on. We traced the latency spike back to exactly what I just described. There was a counter that went up, which was like, hey, the number of file system, sorry, the number of page faults has gone up. So if you access an area of memory you haven't

Starting point is 00:22:18 accessed before, it's a page fault. The operating system has to decide what to do. Very often it says, oh, that's part of your heap that I just hadn't given you the actual physical memory for, so I'm going to find a spare 4K page that was free before, map it in there, and then you can go on your merry way and continue with your life. And that's great, right? That's what allows you to say, allocate me 10 gig of RAM, without actually getting 10 gig of RAM instantly. The operating system just says, here's a space that's 10 gig wide; every time you touch a bit inside of there, I'm going to pop in some 4K pages for you. You can't tell the difference, but it takes a little bit longer to access the first time. That's a minor page fault. There's also a major page fault, which is like

Starting point is 00:23:01 what people think of when they think about virtual memory, which is swapping to disk. So this is like, hey, I ran out of memory, and this was a page that I was using before, I've been reading and writing from it, maybe it contains my executable itself, and the operating system says, hey, I'm a bit stuck for memory right now, I'm going to write this out to disk, or I'm going to throw it away knowing that I can load it back again from disk. And then when you hit that page, it goes, oh, no, right, now I need to find this for you. And obviously it takes much longer to actually go and get it from disk than to just find a bit of physical memory and say, oh, that's yours now. Right. So, to set the background: we were having issues where we were losing packets. We were dropping packets under very high load. And it, long story, turned out to be something that we had presumed was pre-faulting.

Starting point is 00:23:52 That is, we'd specifically asked the vendor code to touch every 4K page in the block of RAM we'd given it, specifically so that the minor page fault had already happened for every single page. Now, it turns out there are better ways of doing it than that, but that's how the vendor implemented it. We'd asked for this flag to be set on a two-gig buffer of RAM that we knew was really important to us, and that nobody else should touch. But unbeknownst to us, that wasn't happening. And so every time the process went to access a new area of this 2 GB of memory for the first time,

Starting point is 00:24:26 it had to take a minor page fault, which, again, is really, really fast these days. But it requires taking out a lock, a process-level lock, because you're about to monkey with the page table and map memory around. And so it was blocking on that lock, we discovered. SystemTap was like, no, every time we get here, this is what the call stack looks like. And we were able to look up the call stack and go, oh my gosh, this is the actual kernel area,

Starting point is 00:24:53 the kernel code that's being called, and it's trying to take out this lock and it's sat there. That's where it is when we're spending all of this time. Interesting. I realize we've just taken 10 minutes to tell a war story about this, but SystemTap gave us the facility. No, no, but that's what these are for. Right, right. As it happened, the vendor that we were working with had open-sourced the code, which was amazing; it was really, really valuable to us. And I was able to find the part where you set the flag and said,

Starting point is 00:25:20 hey, please can you fault this stuff in. And they'd written code which essentially said: for i in number of pages, and then literally the C code int temp = *(char *)address; so, read a byte from that memory and put it into a temporary variable. And then it got optimized out. Of course it got optimized out: the compiler's like, you're not doing anything with that, go away. This was a long, long time ago, but Compiler Explorer was up and around, and it was the first time I remember rather sheepishly sending a patch to a vendor along with a Compiler Explorer link that showed that their code on

Starting point is 00:26:01 a modern compiler got optimized away. It had obviously been written for something like GCC 4, which didn't do that. So anyway, the happy ending was that we were able to fix it and move on with our lives. But SystemTap was what allowed us to find it. SystemTap can also tell you things like how often you're being descheduled; that's another really good sweet spot for it. If you're like, hey, I'm running my process and I think I'm using the CPU all the time, but every now and then something happens and it takes longer, you can say, well, okay, what's happening on this?

Starting point is 00:26:30 Oh, it's a sibling CPU sending you a TLB shootdown, which sounds like a really complicated sequence of words. And it is, but it's one of these weird things that can happen between CPUs in a system, which is totally unobservable otherwise, right? There are some counters you can look at in /proc/interrupts or whatever, but if you want to know, no, you actually got descheduled because this really important thing

Starting point is 00:26:50 inside the kernel had to run, and you're like, but my stuff's more important than you! Anyway, SystemTap: brilliant. Very difficult to get on with, unfortunately. It's not as cool as the uniq and sort world of things, but it is a useful thing to have in your arsenal.
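The minor page faults in this war story are easy to observe from user space. The sketch below (Linux or another Unix with getrusage; prefault and minor_faults are hypothetical helpers, and the 16 MiB size is arbitrary) maps an anonymous buffer, touches one byte per page, and counts minor faults before and after. The first pass should typically fault about once per page; a second pass over already-mapped pages should fault hardly at all, which is the whole point of pre-faulting.

```python
import mmap
import resource

PAGE = mmap.PAGESIZE


def minor_faults():
    """Minor page faults this process has taken so far, via getrusage(2)."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_minflt


def prefault(buf):
    """Touch one byte in every page so the kernel maps each page in now,
    instead of on first use later. In C the equivalent loop must make the
    access observable (e.g. through a volatile pointer), or the compiler is
    free to delete it, which is what happened to the vendor's loop."""
    for offset in range(0, len(buf), PAGE):
        buf[offset] = 0


size = 16 * 1024 * 1024      # 16 MiB; an arbitrary size for the demo
buf = mmap.mmap(-1, size)    # anonymous mapping: address space now, pages later

start = minor_faults()
prefault(buf)                # first touch of every page
first_pass = minor_faults() - start

start = minor_faults()
prefault(buf)                # pages are already mapped now
second_pass = minor_faults() - start

print(f"{size // PAGE} pages: first pass {first_pass} faults, second pass {second_pass}")
buf.close()
```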

Starting point is 00:27:07 I know that there is DTrace on other systems, and I think there's a port of DTrace to Linux, and there's some stuff using BPF. The Berkeley Packet Filter API is another compiled-into-the-kernel, sort of safe system, which you can use to specify packet capture filters, like, hey, I want to see packets that look like this, and it gets compiled and run in the kernel so that you aren't

Starting point is 00:27:31 spending time going in and out of kernel space. This is when you're doing your, you know, tcpdump, or? Yeah. Is that separate from libpcap, or is that what libpcap is? It's separate from... so I don't know if libpcap uses it under the hood; I think so. Actually, I'm not 100% certain on that, but I know that the Berkeley Packet Filter

Starting point is 00:27:50 syntax is a thing, and it's more restrictive than, for example, what you can type into Wireshark, if you've ever seen the

Starting point is 00:27:55 difference between the two. So yeah, maybe it is the same as what pcap does. Anyway, there's eBPF, which is the extended Berkeley

Starting point is 00:28:03 Packet Filter, which can do more than packet filtering. It can do essentially what SystemTap can do, as far as I understand. But I haven't looked at it for a while, so I'm sure our listener at home is grinding their teeth, going,

Starting point is 00:28:17 no, that's not how it works at all. In which case, I invite you to email us or tweet at us and tell us where we're going wrong. But yeah, okay, I'm going to get off my little soapbox of the exciting systems-level tools that I play with. SystemTap, I mean, it's... So there's a whole other category of tools, actually, and this is definitely higher level than SystemTap, but maybe equally useful, certainly more commonly useful,

Starting point is 00:28:41 which is all the process management stuff. So like ps, pstree, kill, top, all that stuff where it's like, okay, I have this machine, it's running all these things, why is my computer so damn slow? That's basically the, you know... and what can I do about it, right? Like you launch some tool or you bring up some webpage or whatever, and it's like, ah, everything's super slow now, what's going on? Well, first things first, you probably run top and see, okay, what is using all the CPU? What is using all the memory?

Starting point is 00:29:12 If you're not coming from a Linux environment, this is like, you know, Task Manager, or what's the Mac one? I forget. Is it also Task Manager? I have literally no idea. Something like that. You know my position on Macintosh. Yes, I do. I do.

Starting point is 00:29:27 But yeah, you know, so if you want to see everything that's running and how much memory it's using, how much CPU it's using, how much virtual memory it's using, what its command line arguments are even, and the controls to send signals to kill it or stop it or whatever you might need, top can do that. There's another variant of this, which I sometimes use, called htop. I don't know if you're an htop fan. A bit. Oh, yeah, it's a bit newfangled for an old fart like me. It's got colors and bars and CPU things, I don't understand it. Uh-huh. It wants to use the F keys. What are the... The F keys are my domain, leave them alone! No, but no, htop is a fine tool too. Yeah, yeah. But that's your sort of what-the-hell-is-going-on-with-this-computer tool. Why is it so slow? Why,

Starting point is 00:30:07 why is this stuff not working? And then once you have all that, you might want to dive a little bit deeper into it and you can do it one of two ways. You can just run top. And I honestly, half the time I feel like I just run top. I see what's wrong.
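A non-interactive snapshot in the spirit of "just run top and see what's wrong" can be handy in scripts or over a slow SSH link. The sketch below assumes Linux with the procps ps; the --sort option and the nlwp (thread count) column are procps features that BSD/macOS ps lacks. Note that ps reports %CPU averaged over each process's lifetime, unlike top's recent-interval figure; top -b -n 1 prints a single batch-mode top screen if that is what you want. Since ps and top read their numbers from /proc, the last line looks there directly.

```bash
#!/usr/bin/env bash
# Sketch: a non-interactive "what is eating this machine?" snapshot.
# Assumes Linux with procps ps: --sort and the nlwp (thread count) column
# are not available in BSD/macOS ps. %CPU here is averaged over each
# process's lifetime, not the recent interval that top shows.
# No 'set -o pipefail': head exits early, and a SIGPIPE to ps is harmless.

echo "== top 5 by CPU =="
ps -eo pid,ppid,pcpu,pmem,nlwp,comm --sort=-pcpu | head -n 6

echo "== top 5 by resident memory (RSS in KiB) =="
ps -eo pid,ppid,rss,pmem,nlwp,comm --sort=-rss | head -n 6

# ps and top read these numbers from /proc; here is this shell's own entry ($$).
grep -E '^(Name|State|PPid|Threads|VmRSS):' "/proc/$$/status"
```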

Starting point is 00:30:20 And then I jump out of top and I go use something like pstree, maybe a ps with a grep or something like that, to dive deeper. You can do a lot of that stuff in top and also htop, right? Like, all those things are there to... I want to see the tree of processes and their threads so I can see, you know, which process has now spawned a bajillionty threads, and what the hell are those threads doing? And why are you screwing up my machine with all these threads? And, you know, you can do it all like that. I honestly find myself oftentimes in situations where I'm deploying something not using Docker, which is my preferred way to do this stuff most of the time. Are you trying to enrage me here? All right, carry on. Yes. Yeah, that's a whole... we're gonna do a whole episode

Starting point is 00:31:09 on that. You're teasing me. A whole episode on the appropriate use of Docker. Yeah. Yes, the appropriate use of Docker, very well put. But if I'm deploying something not using Docker, there's a trade-off there, and one of the trade-offs is that I generally have to do the process management myself. So if I want to turn the damn thing off and be very sure that it's off, I need to make sure that all those processes have actually stopped, and that means the whole tree of processes,

Starting point is 00:31:36 not just the top-level bash script that kicked it all off, right? And in those scenarios, I generally find myself using pstree to look at the tree of processes and look at all their PIDs and what process groups they're in. And combining that with a kill that kills a whole group of processes or kills a single process. And it's like, I'm going to kill this top-level process. Does it correctly kill the ones below it?

Starting point is 00:31:58 Well, I'm going to find out by killing it, right? Yeah. Or I'm just going to send another signal to it. That's another great thing about kill: you don't have to use it just for killing, you can use it for any signal. You can use it for HUP, you can use it for the user signals. Yes, let's just... HUP is hang-up. It's like the equivalent of disconnecting; it's the signal you used to be given when the modem was disconnected, you know, from the serial connection, but we now use it to mean, hey, gracefully shut down, please, probably. Or I'd like you to, or whatever you like.

Starting point is 00:32:29 Yeah, I mean, it's like the definition of it has gotten... it's almost like another user signal. It's like user three at this point, right? The one place where I've seen it used that sort of made philosophical sense to me is checking whether your connections are stale, right? Like, you HUP a process when your computer has been woken back up from sleep and who knows if the TCP connections that it previously had are still connected. You can HUP a process and be like, hey... and if it's written to handle that

Starting point is 00:33:03 signal, it might go interrogate all of its sockets and make sure that they really are connected. You know, send a heartbeat or do some other thing to make sure that everything is still good. But yeah, if you're doing things with signal handling and you want USR2 to mean something, you can use kill to send USR2 to your process. Just make sure you get the command arguments right and don't accidentally send TERM to both your PID and the mythical PID.
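Giving HUP and the user signals your own meaning is mostly a matter of installing handlers. A minimal Bash sketch: the handler actions (re-checking connections, dumping status) are hypothetical placeholders that just print a line, and the worker runs in the background so the same script can drive it with kill.

```bash
#!/usr/bin/env bash
# Sketch: give HUP and USR2 custom meanings in a long-running bash process,
# then drive it with kill. The handler actions are hypothetical placeholders.
set -euo pipefail
log=$(mktemp)

worker() {
  trap 'echo "HUP: re-checking connections"' HUP
  trap 'echo "USR2: dumping status"' USR2
  trap 'echo "TERM: shutting down"; exit 0' TERM
  while true; do
    # Sleep in the background and 'wait' for it: bash runs a trap as soon as
    # 'wait' is interrupted, instead of after a foreground 'sleep' finishes.
    sleep 1 &
    wait $! || true
  done
}

worker > "$log" &
pid=$!
sleep 0.5                  # let the worker install its traps first
kill -HUP "$pid";  sleep 0.2
kill -USR2 "$pid"; sleep 0.2
kill -TERM "$pid"          # a plain 'kill PID' would also send TERM
wait "$pid"
cat "$log"
```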

Starting point is 00:33:31 SEGV is a fun one. Right, SEGV. I like confusing people by sending... you know, hey, your thing crashed. And you're like, oh, yeah, well, how do you... Especially if you've got automated reporting of crashes, which I've done before now.

Starting point is 00:33:42 You know, just kill -SEGV. It'll take you a while to work that one out: how did we die here? All right, enough, I shouldn't be giving people ideas. You're giving people all your practical jokes and other funny things. Yeah, it's too early for the April edition. Oh, maybe it isn't. It's too late, probably, for the April Fools' one. Yeah. Oh, man, next year. Um, yeah, so there's a ton of stuff you can do with ps and top, and, I mean, both of those use operating system API calls, and they actually look inside /proc most of the time, as far as I can understand, to work out what's going on. So we should talk about something that's bad. We've been talking

Starting point is 00:34:23 about all this good stuff. I want to talk about something that drives me crazy. All right, well, let's get to that in a second. I want to just finish off on a couple of things, because I'm looking at the hastily scrawled notes that I did while I was in a meeting before this recording, and you mentioned awk, which we already said is a full programming language. There are a couple of other awk-like... which is a strange thing. So, sed is a great thing for just doing relatively straightforward stream editing, that is, text replacement, or with some minimal state. I mean, I typically use it just to literally change one thing in a line of files. One line of a file.

Starting point is 00:35:06 And then Perl, of course, which is a full programming language, as everyone's like... But, like, Perl to me is spelled perl -pi -e, quote, and then a small replacement string, and that is Perl in place: run this on a bunch of files in place, and replace them with the result of having run them through the Perl script that I am about to type in. And so that is what I use to do big refactors where my automated tools give up on me. I will do find -name... Oh, we didn't talk about find. Oh, no.

Starting point is 00:35:34 Oh, man. We know we're going to miss some. We're going to publish this and someone's going to be like, you didn't talk about this? And we're like, oh, my God, how? Tech. You know, instead of cat and gzip and gunzip. Yeah, but you find a bunch of files and you're

Starting point is 00:35:45 like, here are all the .cpp files, here are the header files. I'm now going to replace this string, and I'm going to use perl -pi -e to replace them in place, and then I'm going to rely on Git (ha ha, that's a whole other great tool) to tell me whether I did this right or not, by doing a git diff and seeing what I changed. Okay, that looks great, commit that. Run my tests, obviously. So those are the other things I wanted to talk about before we move on to the... We've done the good, now it's time to do the bad. The bad, and we have to think of what constitutes the ugly to finish. You've gotta have the Sergio... Sergio Leone, isn't that? I don't know the spaghetti westerns well. Yeah. So the precursor to this, though, I think

Starting point is 00:36:25 is talking a little bit about, like, Bash and ShellCheck and, you know, some of the Bash flags, because it is in doing that that you run into the problem that we're about to talk about, and it drives... Can I talk about a good thing in Bash first of all? Yeah. Bash, and in fact anything that uses GNU Readline. So GNU Readline is the text input library: almost anything you type into interactively on a Unix command line is using Readline, and so that's why you can press up and down and go through your history, and use Control to move backwards and forwards between words, and that kind of stuff. And this is the thing that I tell people and they're like, what? I didn't know you could do that. And then it's life-altering in some cases.

Starting point is 00:37:09 And that is Alt-period. Press Alt-. to cycle back through the last arguments of the previous command lines you ran. So this, I know it sounds like, what would you need that for? But imagine you're doing ls or cat on a file, right? You're catting the file, or lessing the file, just to see whether or not this is the file that I want to delete, right?

Starting point is 00:37:28 I'm checking the contents of it. Yeah, I can definitely delete this. And then you're going to do rm space and then the same file. And you better be sure you type the same thing in again because you're going to delete the wrong file otherwise. Yeah, because the file is so gnarly gooey with some other identifier stuck on the end of it. Yeah, exactly. Exactly. Now, one way to do this, of course, is to go up arrow

Starting point is 00:37:46 and then replace the word less with rm. And you can do all these tricks with caret, caret, like ^less^rm, or whatever. But I like to look at the thing first, stare it in the eye and say, this is definitely what I meant to do. So Alt-. will bring that second, like, the command line argument

Starting point is 00:38:04 to the previous command, in under the cursor. And then you can keep pressing Alt-. and it goes through all the other ones. And so if it's not the last one, but the one before, or the one before that, it's like up-arrow, but for just that argument. That's neat. And a quick plug for alternate shells, because if we're going to talk about Bash, I'm just going to quickly say other shells are available. My shell of choice is fish, and fish will do partial matching on whatever you've partly typed in before you hit Alt-period. If you hit Alt-period in an empty space, it'll do exactly what Bash does, but if you're like, no, I just want to less the, oh, I don't know, it's some log file, I know it's got log written in the middle of it, you just type log, Alt-period,

Starting point is 00:38:44 and then you can keep hitting Alt-period. It's like searching through any argument you've passed to any command in your history that has the word log in it, which is just beautiful. Yeah. Okay. Enough ranting about both Alt-period and fish. Yes. You were talking about ShellCheck, which is another useful utility. Yeah, yes.

Starting point is 00:38:59 It's useful because you need it. Right. Not because it does anything valuable necessarily. So when you learn all these wonderful commands and you discover the power that is being able to do all of this stuff, it's almost like being a wizard. It's exactly like being a wizard. We were talking about this the other day.

Starting point is 00:39:15 Being a programmer is like being a wizard. Yeah, carry on. Sorry. It's the closest thing you're going to get. I tell my kids that all the time. Does it work? You know, kind of. Your eldest is more technically minded; are they buying it? No, I... it's gonna be, it's

Starting point is 00:39:30 gonna be interesting to see how that shakes out, actually. Okay, right. Anyway, wizards. Yeah, so once you discover all these magical spells that you can cast in the terminal to give you all these powers, the next thing that you're going to want to do is automate them, right? Because you don't want to have to actually be around to do all this stuff and type it all out by hand. You want to start automating it. And when you do that, you're inevitably going to start writing Bash scripts. And then you're going to write bad Bash scripts, because that's what happens when you start writing Bash scripts. And the thing to help you with that is ShellCheck, because ShellCheck will tell you everything that your Bash scripts are doing wrong,

Starting point is 00:40:03 and you will gladly thank it for that. And this combines extremely well with all the other tools that we were just talking about earlier, like watch and entr, where you're like, oh, I'm writing this Bash script, I'm going to use entr to run ShellCheck on it every time I change it so that I can never make a mistake, at least not one that ShellCheck would catch, right? It's great. You will probably find conventions, at least I do, when it comes to Bash, where you want to toggle things. The set command in Bash can be used to turn things on and off. And there are things like, hey, if I try to use an undeclared variable, please fail. Don't just keep going. Also, if you encounter any error, please fail. Don't just keep going. And the sort of magical combination, or I guess the other one in this little shibboleth that I always put at the top of my Bash scripts, is: if you have a pipe, so you have a series of commands connected by pipes, and the first thing

Starting point is 00:40:55 We have the same shibboleth. Yes: set -euo pipefail. Is that pipefail? That's the one. Yeah, yep. So if the first command in a pipe fails, please don't just carry on to the next one; you can just stop right there. Or indeed, if anything in the pipeline dies, then consider the whole command to be dead. Exactly. And occasionally you need to turn that off for very small areas of things because you're trying to do stuff. Yep. And it doesn't... Yeah. But yes, until this moment I hadn't really thought about that. I thought C++ had the monopoly on terrible defaults for things. But it turns out Bash is right there with it. If you think of Bash as a programming language, it is way up in the top 10 of terrible.
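The effect of each part of set -euo pipefail is easiest to see in isolation. In this sketch each case runs in a child bash, so a failure there cannot stop the demo itself. One nuance: pipefail does not stop the other commands in a pipeline early, since they all run concurrently; it makes the pipeline's exit status that of the last command that failed, which -e can then act on. To rerun ShellCheck on every save, something like ls *.sh | entr shellcheck myscript.sh works, where myscript.sh is a hypothetical script name.

```bash
#!/usr/bin/env bash
# Sketch: what each part of "set -euo pipefail" changes. Each case runs in a
# child bash so that a failure there does not stop this demo script.

# Without pipefail, a pipeline's status is the status of its LAST command.
bash -c 'false | cat; echo "no pipefail: status $?"'

# With pipefail, it is the status of the last command that failed (here, false).
bash -c 'set -o pipefail; false | cat; echo "pipefail:    status $?"'

# -e: stop at the first failing command instead of carrying on.
bash -c 'set -e; false; echo "never printed"' || echo "set -e: script stopped with status $?"

# -u: expanding an unset variable is an error rather than an empty string.
bash -c 'set -u; echo "value: $not_set"' 2> /dev/null || echo "set -u: unset variable rejected"
```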

Starting point is 00:41:36 Why would you do this? Why would you do this? But it's a shell. A whole bunch of magical rules. Oh, yeah. If you want to do... $@ is like all of the arguments. But if you put $@ in quotes, it quotes each one individually, because that's magically useful, obviously, because when you want to pass... But not if you just do $1, you know, or $1 space $2. That's different again. Which is subtly different from $@, right?
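The "$@" versus "$*" distinction is easiest to see by counting arguments. A minimal sketch with a hypothetical helper, count_args, that prints how many arguments it received; the call passes two arguments, one of which contains a space. "$*" joins the arguments using the first character of IFS, which the last line shows by setting IFS to a comma.

```bash
#!/usr/bin/env bash
# Sketch: how "$@", "$*", unquoted $@ and "$1" differ when passing arguments on.
set -euo pipefail

count_args() { echo "$#"; }   # hypothetical helper: prints its argument count

show() {
  echo "\"\$@\"      -> $(count_args "$@") args (each argument kept intact)"
  echo "\"\$*\"      -> $(count_args "$*") arg (all joined into one string)"
  # Unquoted on purpose, to show word splitting (ShellCheck would flag this).
  # shellcheck disable=SC2068
  echo "\$@ unquoted -> $(count_args $@) args (re-split on whitespace)"
  echo "\"\$1\"      -> $1"
  local IFS=,
  echo "\"\$*\" with IFS=, -> $*"
}

show "one arg" "two"
```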

Starting point is 00:42:02 What? Yeah, exactly. That's one of those terrible interview questions that you could do: what's the difference between $@ and $*? I don't want this job anymore. You guys, really, this is what you're gonna throw at me? I'm out, I'm out, I'm out. Oh my. Yeah, yeah. But no, so you learn the magical spells of Bash, you learn all these wonderful commands, you start automating things, and then you will come to hate what we hate, which is the activate pattern. Because the activate pattern (my God, I didn't know where you were going with this) breaks all of it. We've talked about

Starting point is 00:42:36 this so much and i was ranting about it like earlier this week and you said well we should do an episode on that and now you've sprung it on me and I haven't had a chance to build up the head of steam and bile. No, we should talk about when it's useful and when it's not useful. And I should try and be productive and not just knee-jerk angry. So what is the activate pattern, Matt? Well, to me, the activate pattern is the management of an environment that you're going to be running commands in

Starting point is 00:43:03 by manipulating global variables like the PATH, magical variables that mean things to various applications or programs you might be running, like the PYTHONPATH or other things like that. And essentially, you're saying, I would like to have the convenience of pretending that my computer looks a particular way, and the way I'm going to do that is I'm going to run a shell script which is either going to mutate my environment, or it's going to fire up a new environment with those things preset, and now I can program away to my heart's content, and my compiler will be GCC 7 and my Python will be Python 3.9 that comes from over here, and everything's wonderful and

Starting point is 00:43:45 beautiful. And that's great, and it's a powerful way of having an environment, without using something like Docker, that looks different from the default one you might get on the computer. So I can see the allure of it, and I understand why it came into being, but what it does is it gives you a hybrid promise, because parts of the system come from the real system and parts of the system come from the magical environment variables you're sharing. And it's very hard to get a decent inter... between... What's the thing? Intersubjective?

Starting point is 00:44:17 Intersubjective is the magic word there. Experience, because you're like, oh, yeah, I'm running this thing. And if you are unbeknownst, so often we're trying to help out folks who are saying, hey, I'm typing this stuff in, and it's not making any sense. If you don't realize that there's some magic that's rerouting all of the normal things that you might do in a shell to other magical things,

Starting point is 00:44:36 then you're really, really stuck. You type which gcc. You have to think, like, well, hang on a second, what do you mean? Oh, yeah, it says /opt/bin, some magical thing, some giant thing in your dotfile directory. And you're like, whoa, whoa, whoa, where's that coming from? Now I don't know where I am anymore.

Starting point is 00:44:54 I don't understand your computer. Your operating system has been subverted in a very deep and scary way. And so that's one of the reasons, though. That's the sort of pragmatic reason why I'm against it. It just makes it difficult to debug because people will forget to say, oh, yeah, I'm using some magical Ruby switcher that magically switches Ruby every time I change directory. So that's one thing. It's also a global variable. Who likes global variables, right?
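What an activate script usually boils down to is prepending directories to PATH (and friends) in the current shell, which is exactly why which gcc suddenly answers differently. A minimal sketch: the tool directory and its fake gcc are hypothetical stand-ins for something like an /opt toolchain. type -P does a PATH lookup much like which, and type -a lists every match in lookup order; running the activation inside a subshell shows one way to keep the change from leaking.

```bash
#!/usr/bin/env bash
# Sketch: what an "activate" script boils down to, and how to see its effect.
# The tool directory and fake gcc are hypothetical stand-ins for an /opt toolchain.
set -euo pipefail

tooldir=$(mktemp -d)
printf '#!/bin/sh\necho "fake gcc from %s"\n' "$tooldir" > "$tooldir/gcc"
chmod +x "$tooldir/gcc"

# Roughly what "source activate.sh" does: mutate PATH, a global, in the current shell.
activate() { export PATH="$tooldir:$PATH"; }

# Scoped alternative: confine the change to a subshell.
( activate; echo "inside subshell: $(type -P gcc)" )

echo "outside: $(type -P gcc || echo 'gcc not on PATH')"

# After activating in the current shell, every later lookup is rerouted.
activate
echo "after activate: $(type -P gcc)"
type -a gcc   # lists every gcc on PATH, in lookup order
```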

Starting point is 00:45:21 You're setting something which is... yeah. Have you got some things to say about it before I kind of explain? I mean, I think we could both do a whole other podcast on why this is a bad idea. A whole podcast? Oh, episode. No, I meant a whole other... like a 12-part series, one hour each, breaking down why the activate pattern is terrible. I'll be interested, because obviously I just ranted for a good 10 minutes or so about this. What are your feelings?

Starting point is 00:45:52 Have I missed out anything, or have I been duplicated? What do you think? I think I can describe what I don't like about it in a slightly different way, but my feelings are basically the same. When I first got out of school, the shop that I worked for was a combination of Windows and Solaris. All the servers were Solaris and all of the desktop machines were Windows. And that was my first real introduction to the Unix environment. My school had a Solaris lab and I did some very basic things there, but it's not until you're getting paid to do a job that you're like, I really should learn how this

Starting point is 00:46:23 stuff works. And that's when I first started learning these tools, which, I have to say, by the way: investment in the Unix toolchain is one of the best, if not the best, technological investments I've made in my career. It's held up throughout literally my entire career. And it is so useful. And it sort of moves with me from job to job to job. And it's just one of the very best investments I've made in anything, in any technology, ever. So, strong endorse. But when I was first out of school working on Solaris, I started writing these kinds of automated scripts, right? And my boss at the time was like, never rely on the path. And I'm like, why? It's

Starting point is 00:47:05 there. Like, just why would you not do that? He's like, it's a global variable. Never rely on the path. If you're calling a command, make sure that you have the full path to that command. And if for some reason you can't get the full path to that command, normalize it to a full path and then print it out so that you know what command you were actually running when the stupid thing ran, right? To me, the activate pattern is going in the opposite direction of that philosophy. It's not only we're going to rely on the path, we're going to rely on these 10 other magic global variables that you have no idea what they are, and they could change at any time. And it's just, it's going in the opposite direction. And I very quickly learned in those

Starting point is 00:47:45 early days why you don't rely on the path, because you can make some bad mistakes that way, you know, running the wrong version of things, especially when you start getting into, like, working... And this was later, obviously, but I've gotten burned by, like, oh, I wrote this Bash script for macOS, but with the coreutils package installed, so it has the GNU utilities instead of the BSD ones. But the BSD ones in this environment were further up the path than the other ones. So you got a different version of sed which did a totally different thing

Starting point is 00:48:14 and your thing broke, right? And so, for me, that activate pattern is just doubling down on a bad idea, which is, I'm going to mutate my global environment and then I'm going to rely on that. And especially when you're automating these kinds of things, a lot of the time there's not a human being there to see that the thing went terribly wrong, right? So you really want to just be, I don't know, hyper-paranoid that you know what commands

Starting point is 00:48:38 you're calling. Wow, that has gone in an even more militant direction away from it than I'm prepared to go. I mean, as you know, our company has open-sourced a little magical, sort of installer-like thing that does do some of this stuff, right? It does say, well, okay, if we put this in the bin, you can have these particular... yeah... things. So in a way I feel that I've slightly argued myself against that program. But the idea of that... I love that program. It's a project called Ozy, which we can put in the show notes, because people probably can't Google it; it's a terrible, terrible name to Google for.

Starting point is 00:49:17 O-Z-Y. Yeah. But it's based off of dotfiles, isn't it? So it doesn't rely on path tricks so much. It has one path trick right at the beginning. It just says, okay, this is where I'm going to find the Ozy binary and this is how we're going to do things.

Starting point is 00:49:31 But other than that, it lets you install things like jq and other bits and pieces based off of, well, here's a dotfile in this directory; this is the version of jq you get here. Now, again, that's sort of spooky action at a distance, but it's not a global variable. It's a local variable. Maybe it's unobvious because it's scoped to the current

Starting point is 00:49:48 directory. For example, although we haven't used it in Compiler Explorer, we've talked about it, because it's super convenient to say, look, we all know that we need the same version of Terraform in order for us and the other admins to be able to administrate the site and not forever be playing upgrade tennis with each other. And so having a .ozy.yaml which says, no, Terraform is this version, done, means that when I'm in that directory, I'm just going to get

Starting point is 00:50:15 the right version of Terraform. That's kind of a nice thing. But it shares some characteristics. It's got some, you know, the siren song of, like, well, it's so convenient, I just type terraform and I get the right version of Terraform. Right, well, as opposed to doing... You know, there's a trade-off, though, right? Yeah. And I will say, when I use Ozy and I automate things that use Ozy, I'm always sure to not rely on the Ozy bin directory being in the path.

Starting point is 00:50:40 I always explicitly call, you know, ~/.ozy/bin/jq when I want jq. Or whatever it is. Yeah, yeah. Whenever I want to call the command, right? Because I'm saying, I want the Ozy version, not the one that's installed in the operating system, because I want the version that's in my YAML file.
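The rule of calling tools by explicit path, and printing which binary will actually run, can be captured in a small helper. A minimal sketch: resolve is a hypothetical function name, and type -P is Bash's PATH-only lookup (it ignores aliases, functions and builtins). The same idea applies to calling an Ozy-managed tool as ~/.ozy/bin/jq rather than relying on that directory being on PATH.

```bash
#!/usr/bin/env bash
# Sketch of "never rely on PATH": resolve each tool to an absolute path once,
# fail loudly if it is missing, log exactly which binary will run, then call it by that path.
set -euo pipefail

resolve() {   # hypothetical helper
  local path
  # type -P searches PATH only (ignores aliases, functions and builtins).
  path=$(type -P "$1") || { echo "error: '$1' not found on PATH" >&2; return 1; }
  echo "$path"
}

SED=$(resolve sed)     # with set -e, a missing tool stops the script right here
SORT=$(resolve sort)
echo "using sed:  $SED" >&2
echo "using sort: $SORT" >&2

printf 'b\na\n' | "$SORT" | "$SED" 's/^/item: /'
```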

Starting point is 00:50:57 For exactly this reason, Ozy supports a command, which I use in some Makefile somewhere, which says, hey, here are the names of all of the things that I'm going to use. First of all, make sure that they're downloaded and installed, and then print out all their paths. And then you can just use that as the, or rather,

Starting point is 00:51:09 it prints out the path that everything is installed in so that you can ensure and put that in front of all of the subsequent commands. So it kind of, it's like your one touch point with the, just tell me where things are going to be. Tell me where JQ is going to be. Tell me where Terraform or whatever else. I feel like we've gone slightly off track

Starting point is 00:51:28 from that by bringing up Ozy. Well, in a way, it's related, because a lot of what we use Ozy for is managing these command line tools that we're all talking about. It's another way to manage them and install them. And especially when you're building Bash scripts and other automation on top

Starting point is 00:51:44 of them, you want to make sure that you have the versions that you think you have running in different environments and all that. So it's kind of... right, we want to make sure that everybody has the same experience, but that same experience isn't predicated on magical activation. I think that's the other thing. It's like this whole idea of source activate.sh: it's a user-specific step you have to do that's now polluted that terminal until you deactivate it in some way. And obviously your prompt usually changes.

Starting point is 00:52:11 And it's kind of like, hey, look, you're in the magic world now. I'm like, I don't like that. How about it just works, always? Like you go into the directory and say, okay, I'm just going to run Terraform. Oh, it's Terraform 0.17. Of course it is, because I'm in the Compiler Explorer repository. That's what it needs.

Starting point is 00:52:26 Yeah. So, yeah, I suppose then it's not so much against the idea of managing, carefully managing the environment that you want to run things in. It's almost like the way that it's achieved. It's achieved through magical state action at a distance. Right. Environment variables in particular, like mutating environment variables as a part of an activate process

Starting point is 00:52:50 just is like... I kind of want to depend on as few environment variables as possible. Like USER, okay. HOME, sure. Locale. That's about it. That's another one, actually. That was the other one. Still, even locale, right?

Starting point is 00:53:04 A friend at a previous company would have his locale set to, I can't remember what it was now, but it was something which was different from anyone else's, so it meant that things with capital letters appeared in a different place from where I'm used to. He said, no, I've just always wanted it this way. I'm like, I can't use your computer. But also, again, it's another thing that shows up stuff like scripts that were doing ls, pipe, you know, head -1 to find the first thing; that would not necessarily work if the locale was set differently, because ls sorts things based on the locale. You know, global variables are everywhere, man.
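Scripts that depend on sort order can pin the collation locale instead of inheriting it from whoever runs them. A minimal sketch: LC_ALL overrides LANG and the other LC_* variables, and in the C locale strings compare byte by byte, so every uppercase ASCII letter sorts before every lowercase one. The unpinned sort is shown only for comparison; its output depends on the current locale.

```bash
#!/usr/bin/env bash
# Sketch: pin the collation locale so sort order does not depend on who runs the script.
set -euo pipefail

words=(banana Apple apple Banana)

# Whatever the current locale says (output varies between machines).
echo "current locale (${LC_ALL:-${LC_COLLATE:-${LANG:-unset}}}):"
printf '%s\n' "${words[@]}" | sort

# C locale: byte order, so all uppercase ASCII letters come before lowercase ones.
echo "LC_ALL=C:"
printf '%s\n' "${words[@]}" | LC_ALL=C sort

# ls sorts by locale too, so "ls | head -1" inherits the same problem; pin it the same way.
dir=$(mktemp -d)
touch "$dir"/{banana,Apple,apple,Banana}
LC_ALL=C ls "$dir"
```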

Starting point is 00:53:35 It's not a good thing. Well, that's ugly. That's it, you found the ugly, man. I found the ugly. We had the good, we had the bad, we had the ugly. So let's do a quick conclusion, then. So there's a ton of tools that will be already installed on almost any Unix system. And obviously we've been talking about Unix the whole time. I'm aware that Windows has something called PowerShell that conceptually sounds cooler

Starting point is 00:54:00 because it's object-based rather than line-based. Having looked at a chunk of PowerShell code, I don't get it yet. I'm sure it's much more powerful than I understand, and in fact I know it is, but I don't have any experience, so I can't really help you out there. But anyway: small Unix command line utilities, probably already there. We've talked about ps, we talked about top, we talked about the /proc file system, we talked about uniq and sort, and particularly how you can use them to build a group-by pattern. That's true. column is another one, actually, column, which we didn't talk about. column takes inputs, and then it will columnize the inputs, assuming that they are

Starting point is 00:54:40 separated by spaces, or if you do column -t, it uses tabs. And it'll turn outputs that were like, well, the $1, $2, $3 style spaces into $1, and then there's enough white space to make it so that all the $2s line up, and then all the $3s line up. So it makes a nice little table in your browser... in your shell. So column's good. sed, awk, Perl, cut,

Starting point is 00:55:07 all sort of do the kind of manipulation that can be useful. You talked about tcpdump, SystemTap, strace, Wireshark; watch we talked about; grep, obviously. What else?
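The uniq-and-sort "group by" pattern from the recap, with column to tidy the output, can be sketched on a few made-up log lines; the "<level> <service> <latency_ms>" format is hypothetical. sort makes equal values adjacent so that uniq -c can count each run, and awk covers the case where you want a sum per group rather than a count. column -t builds a table from whitespace-separated fields by default (-s picks a different separator); it ships with util-linux or BSD, so the sketch falls back to plain output if it is missing.

```bash
#!/usr/bin/env bash
# Sketch: "GROUP BY" on the command line. The log format
# "<level> <service> <latency_ms>" and its contents are made up for illustration.
set -euo pipefail

log='INFO api 12
WARN db 250
INFO api 15
ERROR db 900
INFO web 7
WARN api 310'

# GROUP BY level, COUNT(*): cut out the field, sort so equal values are adjacent,
# let uniq -c count each run, then sort numerically, biggest first.
cut -d' ' -f1 <<< "$log" | sort | uniq -c | sort -rn

# GROUP BY service with COUNT and SUM via awk, then line the columns up.
# column comes from util-linux or BSD; fall back to raw output if it is missing.
tabulate() { if command -v column > /dev/null 2>&1; then column -t; else cat; fi; }
awk '{ n[$2]++; total[$2] += $3 }
     END { print "service requests total_ms"; for (s in n) print s, n[s], total[s] }' <<< "$log" \
  | tabulate
```

awk's for-in loop visits keys in no particular order, so append a sort if the row order matters.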

Starting point is 00:55:22 Are there any other tools? I just want to, before we give up on this thing... There's dmesg. dmesg! Oh my golly, all right, yeah,

Starting point is 00:55:29 that's a great one to finish with. dmesg is the fantastic way to out-sysadmin the sysadmin, nine times out of ten,

Starting point is 00:55:37 right? So in my experience, when you've reached the point where something really, really odd is happening on a computer,

Starting point is 00:55:49 you've probably already pinged the sysadmin team that helps you administrate your system and said, look, something funny is going on, it's taking a lot longer to read files. Nine times out of 10, typing dmesg, which dumps the kernel's ring buffer of the most recent noteworthy things that have happened, you'll find the clue in the last half dozen lines, probably, if it's something that's really weird. So the things that I normally find that trip this up are things like segfaults: any segfault anywhere on the system will be reported there. So you can say, hey, that's weird.

Starting point is 00:56:17 Some kernel process segfaulted just now. I bet you it's related. Or you'll see, oh, CPU overheating, throttling it back. Oh, I wonder if there's a problem with the cooling. Or you'll see an imminent SMART drive failure alert. You know, all these kinds of things that you hope will be automated away. But you type dmesg, and then you appear like you know what's going on in the system, but in fact, you don't.

Starting point is 00:56:38 You just read the last three lines of dmesg. So that's my dmesg pitch. I don't know if you want to add to that. You're a wizard, Harry. You've been listening to Two's Complement, a programming podcast by Ben Rady and Matt Godbolt. Find the show transcript and notes at twoscomplement.org. Contact us on Twitter at twoscp

Starting point is 00:57:06 That's @T-W-O-S-C-P. Theme music by Inverse Phase.
