Two's Complement - Unix Commands for Wizards
Episode Date: June 22, 2021
Matt and Ben discuss their favorite *nix command line tools, and make various movie references while doing so. Included in this episode are references to both Sergio Leone and gunzip, although the two... are surprisingly unrelated. Matt recalls using SystemTap to discover latency in a trading system. Ben explains a method for writing Wireshark plugins that sparks joy.
Transcript
I'm Matt Godbolt.
And I'm Ben Rady.
And this is Two's Complement, a programming podcast.
Hey Ben.
Hi Matt.
How are things?
Good, good.
I've been thinking a bit about the tooling that we use.
And you and I both work in Unix environments predominantly,
or Linux specifically in my case, and Mac OS X type things.
And there's an awful lot of tool craft that we've picked up,
both you and I, over the years.
And certain people we've worked with at various companies
have been even better at this.
I remember pairing with somebody particularly and learning a whole ton of stuff about it.
I figured we should talk a bit about the kinds of things that we can do and have done and use the tooling for in the Unix shell.
Because I'm always surprised when I meet somebody who goes, how did you just do that thing?
And I'm like, oh, it's just shell.
Right, right.
Let's start there.
I think that's a great idea.
I guess what is your top X, where X is as many minutes as we can do?
Oh, man.
Unix command line tools.
The one that immediately comes to mind for me is using sort and uniq.
I was just going to say the same thing.
Get out.
Yep.
It's like the flathead screwdriver of Unix tools, right?
It's the thing that you use for everything when you don't know how to use anything else.
Right.
I mean, I've got a big list of things and I just want to get a sense of the data, right?
I've got log lines and I'm like, well, how often is this thing happening?
How often are things of this type happening? So you might chuck in a grep -o, you know, to say only output the bit that I'm going to tell you
in my little regular expression, and then you pipe it through sort, and then you pipe it through uniq -c, usually.
And I guess we should explain what all of those things are and why. But you end up
with a little list of, hey, there are exactly 50 instances of this string, 48 of this string,
10 of this, and then three of those. And that's exactly what I needed to know about this log.
So I'm glad that we picked the same one there. That's a good sign.
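The counting pipeline just described can be sketched as follows. This is only an illustration: the log contents and the `user=` pattern are invented, and `grep -o` is assumed to be available (it is in GNU, BSD and BusyBox grep).

```shell
# Hypothetical sample log; the line format is made up for illustration.
log=$(mktemp)
cat > "$log" <<'EOF'
2021-06-22 10:00:01 login rejected user=alice
2021-06-22 10:00:02 login ok user=bob
2021-06-22 10:00:03 login rejected user=alice
2021-06-22 10:00:04 login rejected user=carol
2021-06-22 10:00:05 login ok user=bob
2021-06-22 10:00:06 login rejected user=alice
EOF

# grep -o prints only the matched text, sort puts identical lines next to
# each other, and uniq -c collapses each run into one line with a count.
counts=$(grep -oE 'user=[a-z]+' "$log" | sort | uniq -c)
printf '%s\n' "$counts"
rm -f "$log"
```

Each output line is a count followed by the matched text. The amount of padding in front of the count differs between uniq implementations.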
Yeah, that's a good tell. Yeah, you sort of use that to make a little histogram.
I guess it's also kind of like a group by. Isn't that what that is in a way?
I suppose, yeah, it is a group by. It's very much like a group by, isn't it? Because you're saying aggregate on this key and then say how many of them there are. It's like, you know, GROUP BY blah, COUNT(*).
Yeah. So in the pipeline I just described, the grep takes an input and finds just the snippet that you're interested in. Now, if you just want counts of the different lines of a file, you don't need the grep. But then you sort them so that exactly the same lines are one after another. So if your input is A, B, C, A, one per line, then you're going to end up with A, A, B, C.
Right, and you're really only doing that to get the uniq -c to work.
That's right. Yeah, that's a very good way of putting it. Because it's not like you want it sorted particularly, right? In fact, you may re-sort it at the end, which is what one might commonly do. So uniq is designed to drop repeated adjacent lines of a file. That's what it's designed for. But it can also say how many duplicates it encountered as it went through. So when you're piping A, A, B, C through it, it sees the first A and goes, that's great, I've only ever seen one A. Then it sees the same line again and goes, oh, well, I'm not going to output that line, but I'm going to count it. And then it sees B and goes, oh, okay, that's the end of the A's, I will never see another A. I shouldn't say it's presuming anything about its input; it's just what it does. So then it prints, hey, I saw two A's, then one B, and then one C, and you're done. And that's wonderful, right? So you get a count of each of the unique lines. Now, of course, the result of that is a big list of, hey, I saw A twice, I saw B once, and I saw C once. And that's great. And in the example I just made up on the spot, they are in a useful order, potentially, because it's like, hey, I want the most common. But often I want to say, no, just give me the top 10 of those.
Right. You might have thousands of categories, but you want to know what the most frequent one is.
Exactly. Like in SQL, if you don't give it a sort key afterwards, then the sort inside your pipeline is not sorting it usefully.
It's just an implementation detail of the way your group by is working.
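A small sketch of why the sort has to come first, using the A, B, C, A input from the discussion. Nothing here goes beyond the standard `sort` and `uniq` behavior; the variable names are only for illustration.

```shell
# uniq only compares each line with the one directly before it,
# so the two a's are counted separately unless they are adjacent.
unsorted=$(printf '%s\n' a b c a | uniq -c)
sorted=$(printf '%s\n' a b c a | sort | uniq -c)

echo "without sort:"
printf '%s\n' "$unsorted"
echo "with sort:"
printf '%s\n' "$sorted"
```

Without the sort there are four output lines, each with a count of 1. With it there are three, and `a` has a count of 2.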
But then you want to pipe it through sort -n, which means sort numerically, which means it's going to interpret the first part as a number.
And then you can pipe it through, say, less, or head -10 to say just show me the first 10 of those. And so what have we got? grep, sort, uniq, another sort, less: five Unix commands one after another, and we've written a SQL query, for all intents and purposes, on a line-based text file.
Right. So when you're like, hey, I noticed we got some user login rejections, how often does that happen? It's like, oh, we got a hundred thousand of those today. Oh wow, is that a lot, or does that happen every day? Then you go use these tools to sort of figure out if that's true.
Right. Of course, it's no substitute for actually having metrics in your application that do it. But we have all been up against the gun trying to work out what the heck's going on with my system. And we are again joined by guest puppy in the background. I apologize for the noise.
Oh, he's always welcome. He's adorable.
Yeah. Right, but yes, you were making a point about something technical.
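The "SQL query on a text file" idea can be sketched end to end like this. The event names are invented. One deliberate difference from what was said aloud: `-r` is added to the numeric sort so the largest counts come first and `head` returns the most frequent entries; with plain `sort -n` the same rows would be at the end, and `tail` would be the tool to use.

```shell
# Hypothetical input: one event type per line.
events=$(mktemp)
printf '%s\n' timeout reject ok reject ok ok timeout ok reject ok > "$events"

# sort | uniq -c  plays the role of GROUP BY ... COUNT(*)
# sort -rn        plays the role of ORDER BY count DESC
# head -n 2       plays the role of LIMIT 2
top=$(sort "$events" | uniq -c | sort -rn | head -n 2)
printf '%s\n' "$top"
rm -f "$events"
```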
Yeah, well, like you said, these are the things that you do. We talked about structured logging and observability in another episode, and if you have some inkling ahead of time of the things that you want... although I can make some arguments for structured logging that would say anything that you write to an unstructured log file you can write to a structured log, and it will be strictly better. But not every system has this kind of logging in place. Sometimes you start out with something and you don't necessarily know what you want your structure to be, or whatever it might be. So you have these things, and being able to do this is just super useful. Plus, the logs that you're reading aren't necessarily yours, right?
Right.
Sometimes they're operating system logs or other software's logs that you're doing this to.
It's Apache's log file that just happened to be on the box.
Right.
Uh-huh.
Right.
So you need to have these.
I feel like you need to have these skills no matter what.
And, of course, the next one that you want with that chain of that pipeline that you're talking about is awk, right?
Because it's like, oh, the thing that I want is this particular column, right?
Right.
And awk is like a whole programming language.
But mostly it's used for print.
But mostly it's for print.
Yes.
Yes.
Although I will make an argument for another tool.
You talk about awk first.
Well, I was going to say there are other tools.
I think I was going to say the one that I use for this is also cut.
Cut. Yeah, yeah.
That's where I was going.
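A minimal sketch of column extraction with awk and cut, plus the kind of small aggregation awk can do beyond `print`. The data, the field positions and the colon-separated path are all made up for illustration.

```shell
# Hypothetical whitespace-separated data: name, unit, value.
data=$(mktemp)
cat > "$data" <<'EOF'
disk MB 120
cache MB 30
heap MB 60
EOF

# awk splits on runs of whitespace by default; $3 is the third field.
third=$(awk '{print $3}' "$data")

# cut takes a single-character delimiter, which suits colon-separated text;
# -f3 picks the third field.
field=$(printf '%s\n' 'lib/a.jar:lib/b.jar:lib/c.jar' | cut -d: -f3)

# awk beyond print: accumulate a column, then report sum and mean in END.
# NR is the number of lines read.
stats=$(awk '{ sum += $3 } END { print sum, sum / NR }' "$data")

printf '%s\n' "$third" "$field" "$stats"
rm -f "$data"
```

Roughly, awk is the better fit when fields are separated by variable whitespace, and cut when there is one fixed delimiter character.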
Cut is a little bit easier than specifying $3 if you have a very clear non-space-based delimiter.
So, for example, if you know that your JVM dump or whatever has, oh, it's between the second colon and the third colon in some list of class path or whatever, then you can, like, yeah, cut -d:, which means use colons as a delimiter, and then you can pick, you know, three as the third thing.
And that's another super way of just filtering the bit of the data that you want. But awk is a full programming language.
Yeah, yeah. And I honestly feel bad for not having used awk for more than just print dollar-whatever. There's a few things I think I've used it for, but I always have to Google those things.
I use it to sum up a numeric column. That's another thing that I can do, and also average. It's quite easy to write, although again, yeah, there's a little bit of Stack Overflow used here. But if there's something where it's columnar and I just need to go over something with a relatively straightforward thing, I can just use awk out of the gate to do that kind of thing. Or a running total or stuff like that, again, is relatively straightforward. But the only bits I can remember are, like, BEGIN something magical, END something magical, and then the line itself.
Of course, if you do have the benefit of a structured log, you may have written your structured log using JSON, because a lot of
people do that, in which case you're probably going to want to use another command line tool called JQ,
which is an amazing tool, not just for dealing with structured logs, but dealing with any sort
of JSON data. If you're interacting with a web service, one of my favorite tricks to do is to
take, you know, I've got some web service, I'm trying to explore the API, and I've got the
documentation there, maybe, and I want to see if the documentation is right, or I want to see what
the actual underlying data is. And so what I'll do is I'll put together a little bash script that
curls that API, and then pipes it into JQ. And then I'll run that whole thing in another command
called watch, which is something that we've talked about, where it'll just run it every couple of seconds, right?
And now I have, like, a constantly refreshing view of what that JSON object looks like.
And I can start modifying the jq expression to select into particular elements and explore the whole tree of the resulting JSON object
in a very interactive and fast way, right?
So you sort of can walk over the whole tree and like, ooh, this value is interesting and
this whole array of things is cool and, you know, ooh, we're going to need that value
and this doesn't match up with the documentation.
And, you know, in just a matter of a few minutes, you can sort of see everything that there
is to see about an API in a very interactive way.
So those tools together, I am a huge fan of.
But JQ in general is pretty fantastic.
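The exploration workflow can be sketched like this. A canned string stands in for the `curl` call so the example needs no network, and the JSON shape and field names are invented. Because jq is not part of the base system, the sketch checks for it first.

```shell
# Stand-in for an API response; in practice this would come from curl.
response='{"user":{"name":"ada","roles":["admin","dev"]},"count":2}'

if command -v jq >/dev/null 2>&1; then
  # '.' selects the whole document and pretty-prints it.
  printf '%s\n' "$response" | jq '.'

  # Narrow the filter step by step to walk into the tree.
  # -r prints strings raw, without the surrounding JSON quotes.
  first_role=$(printf '%s\n' "$response" | jq -r '.user.roles[0]')
  # jq has its own internal pipe: feed the array into length.
  role_count=$(printf '%s\n' "$response" | jq '.user.roles | length')
  printf '%s %s\n' "$first_role" "$role_count"
else
  echo "jq is not installed" >&2
fi
```

Putting the curl-and-jq pipeline in a script and running it under something like `watch -n 2 ./explore.sh` (the script name is hypothetical) gives the refreshing view described above.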
JQ is amazing.
I think JQ is the first of the commands that we've mentioned so far that isn't sort of like a BSD staple, right?
Everything else is part of the Unix-y environment.
Yeah, uniq, awk, I think we've said cut.
They're all like, you've almost
certainly got them on your machine already.
Just type it in, whatever. You don't need to sudo
apt install anything. JQ is
a separate
process. I think you can probably get it for most
applications. I think I installed it on
my Debian thing here. But it's a single
static binary. You just grab it and chuck it in your bin
directory. That's another fantastic thing about it, right?
Like you said. I think I
would be a little surprised if on
most newer Linux distributions
you didn't get JQ out of the box, but I
could be wrong about that. I'm pretty sure it needs
to be installed. It's like one of those things that
is in one of my Docker container
things for Compiler Explorer to like, just,
hey, I expect these tools
to be in my machine
It's like, it has to be there.
Yeah. But yeah, I use that all the time. You can do math in it, you can do transforms in it, you can do all kinds of crazy stuff in it too. So if you get into sort of heavy scripting, and not so much just exploring, you can damn near write programs in it if you try.
Geeking out a little bit about it, it's just a really nicely written piece of software as well. It's based on something called libjq, which... I say based on it; extracted from it is libjq. So if you want to do JSON parsing with a language that gives you the kind of descriptive power that the jq query language has, it compiles to an intermediate bytecode. It's just sweet. It's nicely done. It's cool.
And it's worth saying as well that the vast majority of invocations of jq for me are jq -C, which says enable the color even though I'm about to do something that would make you want to turn it off, then space dot, which means select everything. Hey, I just want everything. And then I'm going to pipe it into less -R, which means, hey, less, don't try and strip out the ANSI color codes that are coming your way, because normally you're going to freak out about that. And that means I get a pageable, colored, syntax-highlighted and pretty-printed version of whatever I'm piping into JQ. So you don't have to use it to do any kind of interrogation at all.
You can just say, hey, it's a really nice pretty printer that supports color.
But then, as you say, you can do dot stuff. It's got its own piping internally. You can do all
sorts of clever trickery. It's a great product. And yeah. So what else have we got?
On the topic of watch, actually, another tool that's similar to that, and this is another one that you're probably going to have to install, is called ENTR.
ENTR basically uses the file system notification library that is definitely already in your distro to let you run a command in response to a file system change.
Right.
So if you want to scan a directory for changes, and then whenever a file in that directory
changes, run make or run JQ or run, you know, whatever your compiler of choice is, or your
linter or your tests or whatever it may be, you can use ENTR to do that.
And it's like a really, really easy way to create a very interactive workflow with pretty
much any programming language using this very simple tool.
You can also do it for more things you shouldn't do.
You can use it as a way to, you know, like, oh, I'm going to hit this, I'm going to make this API call
or I'm going to hit this web endpoint whenever this file changes and post the file because
you don't want to actually build the event-based system that you should be building.
Oh, right.
You know, I've seen it used for that, which, you know, maybe in a MacGyver duct tape and baling
wire situation is the right thing to do.
But, you know, its intent, and I think its real sweet spot, is, you know, sort of developer tools and little mini...
I mean, I'm used to, like, npm run watch and things like that, which the various packages will have, but they're provided by Node.js services. But if you want to do it more generally, you can use ENTR. I've seen the inotify stuff underneath it, which I've kind of hacked together myself for some of my Raspberry Pi development stuff.
I had a thing that I watched, but it was a makefile target.
If I'd have known there was ENTR to just do most of the horrible heavy lifting
of the strange protocol that you have to use to talk inotify,
then this would have been great.
So ENTR, that's awesome.
The last time I really used it in anger, I was writing a Wireshark plugin.
I think I was writing it in
Lua and what I would
do is whenever I was
hacking away on my Wireshark plugin and whenever
I changed the Wireshark plugin
I would have ENTR automatically run
a capture that I had through tshark
and spit out the resulting
processed
result to get a sense of whether
or not I was writing
my Wireshark plugin correctly.
That's awesome. That's so cunning because
that gives you
CI/CD style
or local CI/CD, not CD,
but for Wireshark, which is like the
last thing I would ever expect to
actually have that process.
That's cool. It was a miracle when they
added the thing that allowed you to reload
a Lua plugin in Wireshark without having to quit it and start it again,
which had been my go-to way of doing this kind of development, and then, you know, making a pop-up
happen with, you know, oh, I got to this point, printf style.
Yeah, that was a few years ago, but that worked out pretty well. But yeah, actually, so that leads us to another one, which of course is tshark or tcpdump, depending on your flavor there, right?
Like, tcpdump is usually my go-to for capture, but normally when
I'm analyzing something I tend to reach more for Wireshark than tshark. But I've definitely
seen people use tshark to great effect for both capture and analysis.
Right, yeah. And I mean,
I think this is another thing
where once you've seen somebody
who is good at doing this kind of thing
in action,
debugging a problem
that you would have scratched your head
for days on
and finding it in a few minutes
with a packet capture.
We should probably tell people.
I was going to say,
we should tell people what Wireshark is
because I think we can talk about it.
Yeah, let's do that.
What's Wireshark, Ben?
What's a Wireshark?
Yeah, so when you communicate over the network, or actually, I mean, you can use Wireshark on, like, USB devices and stuff like that too.
Bluetooth and stuff.
Yeah, Bluetooth, yeah. So if you're doing any sort of communication protocol in general, it's probably worth asking the question:
can I see this in Wireshark? Why would you want to see it in Wireshark? Well, because Wireshark will show you all of the
bytes, everything that you're sending
back and forth between computer A
and computer B, or device A and device
B, and allow you to
apply filters to them to sort of shrink
them down to the stuff you care about, transform
the raw bytes into something that's a little bit more
meaningful and readable.
I was going to say, the raw bytes
is a bit of underselling of Wireshark.
Wireshark understands almost every protocol known to mankind.
It's kind of like the C-3PO of things.
And it will show you what it means almost always,
even if, yeah.
Yeah, no, that's a great analogy.
Yeah, but yeah.
So, you know, the thing I was saying I was writing earlier was a Wireshark plugin.
Being able to see, at the various OSI levels or whatever it might be for your particular thing, what are the messages that are going back and forth here?
It's an incredibly powerful tool, and it adds a level of observability anywhere that you're basically connecting two devices or two computers together, or multiple computers together.
So it's something that I've used,
and I know you've used a whole ton to troubleshoot all kinds of problems.
And if you're not familiar with it, I highly suggest you give it a try.
Absolutely.
Do you have a cool Wireshark story?
Like, we never would have found this, but for Wireshark kind of a thing?
I do.
I don't know that I can talk about it publicly, unfortunately. It's one of those more interesting ones. But yeah,
we have found some very unusual behaviors in esoteric networking devices before now that
have been traced back to either hardware issues or similar. But on that subject, actually,
and in a similar vein... and I realize we're doing, like, all of two minutes on each of these tools
that we could easily do a whole...
This is like a lightning talk.
Yeah, I know, that's true. Yeah, right.
But in a similar vein to Wireshark, tcpdump and things like that: SystemTap and strace. strace
is probably the one people are most familiar with.
This is the snoop on a process, just like you were snooping on a network connection between
processes in Wireshark. strace will say, hey, I can run another process, or I can attach to
another process, and say, what operating system calls are you doing? And I want to look
at the parameters that come in and the parameters the operating system gives you back. And that can give you an awfully deep read, not awful in that sense, but a really deep read on
what a process is doing. And that's super, super useful when you have, for example, a process and
you don't understand why it's in a weird state. You can attach strace to it and go, oh,
it's waiting on an event. What is the event? Oh, it's file descriptor 37. What is file
descriptor 37? And then you can go and look, and this is something I noted for another part of the
talk, but in Linux you can go and look at that process's information in /proc. So I would
say, why are you blocked? Wait, you're trying to read from 37. What is file descriptor 37? So I will go to
/proc/ and then the PID of the process.
And that's just mounted in the file system. It's just a magical file system.
Exactly. And within that file system is a bunch of
useful information, and there is one directory per process. You can go into that directory and
there's a bunch of files that aren't really files, they're magic that talks to the kernel. And then you
can look at, for example, all of the open file handles. All of the open file handles appear as symlinks from a numbered file, like 37 in the case of the thing
I've just been talking about, and it will be a symlink to either the actual file on disk, or it will
be a symlink to a special magic-looking thing that will tell you I'm a socket or I'm a
unicycle. Yeah, that kind of stuff. But, you know, maybe you won't know what that is at that point.
Maybe you'll have to give up at that point.
But it gives you, hey, I'm blocked on the network at some level.
And then you might crack out Wireshark and go,
well, what are you blocked waiting for?
Can I see anything before this point?
It works on your own software, in that it can give you a hint as to,
well, I think the only places where it could be blocked on reading from a file
is here and here.
Okay, that's where we're wedged. But more importantly, it works on other
people's software. So if you're stuck with, why on earth does this esoteric binary that a
vendor has given me... what on earth is it doing here? strace is a fine way to find out.
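The file descriptor lookup can be tried on Linux without any special tools. This sketch opens a temporary file on descriptor 9 (an arbitrary number chosen for the example) and reads the symlink back through `/proc/self`, the kernel's alias for whichever process is doing the lookup. For another process you would substitute its PID, and `strace -p <pid>` is the usual way to attach and see which descriptor it is blocked on, permissions allowing.

```shell
# Linux only: /proc/<pid>/fd holds one symlink per open file descriptor.
tmp=$(mktemp)

# Open the temp file as descriptor 9; child processes inherit it.
exec 9< "$tmp"

# readlink runs as a child, so /proc/self/fd/9 is its inherited copy of fd 9.
target=$(readlink /proc/self/fd/9)
echo "fd 9 -> $target"

# Close the descriptor and clean up.
exec 9<&-
rm -f "$tmp"
```

For a socket or pipe the link target is not a path but a tag such as `socket:[...]` with an inode number.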
So what's SystemTap then?
SystemTap. Yeah. So SystemTap is like strace++.
SystemTap allows you to write small programs that get injected into the kernel and run effectively in a sanitized, safe environment within the kernel on behalf of various other parts of the operating system.
So you can kind of trap and filter operating system calls, various kernel events that can happen
that are a layer below
even what, like, strace can see.
So, you know, hey, I had to allocate
a new 4K page of RAM.
And so that's an event
that happens in the kernel.
It's like, oh, well, that's interesting to me.
I want you to run this bit of code
and do something when that happens.
It has a bunch of useful scripts that you can crib from. But so the story that ends with the punchline
"and SystemTap found the issue" was a latency spike in a trading system
I was working on. And the latency spike we traced back to exactly what I just
described. There was a counter that went up, which was like, hey, the number of
page faults has gone up. So if you access an area of memory you haven't
accessed before, it's a page fault. The operating system has to decide what to do. Very often it says, oh, that
is part of your heap that I just didn't give you the actual physical memory for yet, so I'm just
going to find a spare 4K page that was free before, swap it in there, and then you can go on your merry
way and continue with your life. And that's great, right? And that allows you to say,
allocate me 10 gig of RAM, and you don't actually get 10 gig of RAM instantly. The operating system
just says, here's a space that's 10 gig wide; every time you look at a bit inside of there, I'm going
to kind of pop in some 4K pages for you. And you can't tell the difference, but it takes
a little bit longer to access the first time. There's also a major page fault, which is
what people think of when they think about virtual memory, which is like swapping to disk. So this is like, hey, I ran out of memory, and this was a page that I was using
before. I've been reading and writing from it; maybe it contains my executable itself. And the
operating system says, hey, I'm a bit stuck for memory right now. I'm going to write this out to
disk, or I'm going to throw it away knowing that I can load it back again from disk. And then when you hit
that page it goes, oh no, right, now I need to find this for you. And it takes much longer, obviously, to actually
go and get it from disk than it does to just find a bit of physical memory and say, oh, that's yours now.
Right, background set. We were having issues where we were losing packets. We were dropping packets under a very high load. And it, long story, turned out to be something
that we had presumed was pre-faulting.
That is, we'd specifically asked the vendor code
to touch every 4K page in the block of RAM we'd given it.
Specifically so that that faulting,
that minor page faulting had happened for every single block.
Now it turns out there are better ways of doing it than that, but that's how the vendor implemented it.
We'd asked for this flag to be set on like a two gig buffer of RAM that we knew was really important to us.
Nobody else should touch.
But unbeknownst to us, that wasn't happening. And so every time the process went to access a new area of this 2 gig of memory for the first time,
it had to do a minor page fault, which again is really, really, really fast these days.
But it requires taking out a lock, a process level lock,
because you're about to monkey with the page table and move things around and map memory around.
And so it was blocking on that lock we discovered.
SystemTap was like, no, every time we get here,
this is what the call stack looks like.
And we were able to look up the call stack and go,
oh my gosh, this is the actual kernel area,
the kernel code that's being called.
And it's trying to take out this lock and it's sat there.
That's where it is when we're spending all of this time.
Interesting.
I realize we've just taken 10 minutes to tell a war story about this,
but SystemTap gave us the facility.
No, no, but that's what these are for, right?
Right, yeah. As it happened, the vendor
that we were working with had open sourced the source, which was amazing. It was really,
really valuable to us. And I was able to find the part where you sort of set the flag and said,
hey, please can you fault this stuff in. And they'd written the code, which
essentially said: for i in number of pages, and then literally the C code of, parens, char star,
the memory address, you know, int temp equals that. Right, so that is: read a byte from that memory and
put it into a temporary register, a temporary variable, I should say. And then it got optimized
out. Of course it got optimized out. The compiler's like, you're not doing anything with that, go away.
And so it got optimized out. And this was a long, long time ago,
but Compiler Explorer was up and around. It was the first time that I remember sending, rather
sheepishly, a patch to them along with a Compiler Explorer link that showed that their code,
which obviously was written for, like, GCC 4,
which didn't do it, got optimized away on a modern compiler. So anyway, the happy ending was we were able to fix
that and move on. But SystemTap was what allowed us to find it. SystemTap can also
do stuff like, how often are you being descheduled? That's another really good sweet spot for it.
If you're like, hey, I'm running my process and I think I'm using the CPU all the time,
but every now and then something happens
and it takes longer,
you can say, well, okay, what's happening on this?
Oh, it's a sibling CPU is sending you a TLB shoot down,
which sounds like a really complicated sequence of words.
And it is, but it's like one of these weird things
that can happen between nodes in a system,
which is totally unobservable otherwise, right?
There are some counters you can look at in proc interrupts
or whatever, but if you want to know, no, you
actually got descheduled because this really important thing
inside the kernel had to run.
You're like, but my stuff's more
important than you. Anyway, SystemTap:
brilliant. Very difficult
to kind of get on with, unfortunately. It's not
as cool as the
uniq and sort world of
things, but it is a useful thing to have in your arsenal.
I know that there is DTrace on other systems,
and I think there's a port of DTrace to Linux,
and there's some stuff using the... I'm bringing it back full circle.
So the BSD packet filtering API
is another kernel-compiled sort of safe system,
which you can use to specify packet capture filters, like, hey,
I want to see packets that are like this, and it gets compiled and run in kernel so that you aren't
spending time going in and out of kernel space. This is when you're doing your, you
know, tcpdump or your...
Yeah. Is that separate from libpcap, or is that what libpcap is?
It's separate from... so I don't know if libpcap uses it under the hood. I think...
actually, I'm not 100% certain on that. But I know that the Berkeley packet capture filter syntax is a thing, and it's more restrictive than, for example, what you can type into Wireshark, if you've ever seen the difference between the two.
So yeah, maybe it is the same as what PCAP does.
Anyway, there's eBPF, which is the extended Berkeley Packet Filter, that
can do more than packet filtering.
It can do essentially what SystemTap can do,
as far as I understand.
But I haven't looked at it for a while,
so I'm sure listeners are at home.
Our listener is grinding their teeth,
kind of going,
no, that's not how it works at all.
In which case, I invite you to email us
or tweet at us and tell us
where we're going wrong with that.
But yeah, okay.
I'm going to get off my little soapbox of the exciting systems-level tools that I play with.
I mean, there's a whole other category, actually, of tools, and this is definitely
higher level than SystemTap, but maybe equally useful, certainly more commonly useful,
which is all the process management stuff. So like ps, pstree, kill, top, all that stuff, where it's like, okay, I have this machine,
it's running all these... why is my computer so damn slow?
That's basically the, you know, and what can I do about it, right?
Like you bring up some, you know, you launch some tool or you bring up some webpage or
whatever and it's like, ah, everything's super slow now, what's going on?
Well, it's like, you know,
first things first, you probably run top and see, okay, what is using all
the CPU? What is using all the memory?
If you're not coming from a Linux
environment, this is like, you know,
task manager or
what's the Mac one? I forget.
Is it also task manager?
I have literally no idea. Something like that.
You know my position on Macintosh. Yes, I do.
I do.
But yeah, you know, so if you want to see everything that's running and how much memory it's using and how much CPU it's using and how much virtual memory it's using, what its command line arguments are even, and the controls to send signals to kill it or stop it or whatever you might need, top can do that.
There's another variant of this, which I sometimes use, called htop.
I don't know if you're an htop fan.
A bit. Oh yeah, it's a bit newfangled for an old fart like me. It's got colors and bars and CPU things. I don't understand it.
Uh-huh.
It wants to use the F keys. What are the F keys? The F keys are my domain, leave it alone. No, but no, htop is a fine tool too.
Yeah, yeah. But that's, you know, sort of: what the hell is going on with this computer?
Why is it so slow?
Why,
why is this stuff not working?
And then once you have all that,
you might want to dive a little bit deeper into it and you can do it one of
two ways.
You can just run top.
And I honestly,
half the time I feel like I just run top.
I see what's wrong.
And then I,
and then I jump out of top and I go use something like pstree,
maybe a ps with a grep or something like that to dive deeper.
You can do a lot of that stuff in top and also htop, right?
Like all those things are there to, like, I want to see the tree of processes and their threads so I can see, you know, which process has now spawned like a bajillionty threads and what the hell are those threads doing? And why are you screwing up my machine with all these threads?
And, you know, you can do it all like that.
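For anyone following along at a terminal, the "ps with a grep" move described above might look roughly like this. It is a minimal sketch: the `sleep 30` process is a stand-in invented for the demo, and the column choices are one reasonable option rather than the hosts' exact commands.

```shell
# Start a throwaway background process so there is something to look for.
sleep 30 &
child=$!

# "ps with a grep": list every process with its PID, parent PID and command line,
# then filter by name. Writing the pattern as [s]leep stops grep matching itself.
matches=$(ps -e -o pid=,ppid=,args= | grep '[s]leep 30')
printf '%s\n' "$matches"

# Once the PID is known, ask ps about that one process directly.
info=$(ps -o pid=,ppid=,comm= -p "$child")
printf '%s\n' "$info"

# Clean up the stand-in process.
kill "$child"
```

The trailing `=` after each column name suppresses the header line, which keeps the output easy to feed into other tools.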
I honestly, I find myself oftentimes in situations where if I'm deploying something not using Docker, which is my preferred way to do this stuff most of the time.
Are you trying to enrage me here? All right, carry on.
Yes. Yeah, that's a whole, we're going to do a whole episode on that.
You're teasing me. A whole episode on the appropriate use of Docker.
Yes, the appropriate use of Docker. Very well put. But if I'm deploying something not using Docker, there's a trade-off there, and one of the trade-offs is that I have to generally do the process management myself.
So if I want to turn the damn thing off
and be very sure
that it's off, I need to make
sure that all those processes have actually
stopped, and that means the whole tree of processes,
not just the top-level bash script
that kicked it all off, right?
And in those scenarios, I generally find
myself using pstree to look at the
tree of processes and look at all their PIDs and look at what process groups they're in.
And combining that with a kill that, like, kills a whole group of processes or kills a single process.
And it's like, I'm going to kill this top-level process.
Does it correctly kill the ones below it?
Well, I'm going to find out by killing it, right?
Yeah.
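Killing a whole tree rather than just the top-level script can be sketched like this. It assumes bash, and the `bash -c '...'` line is a hypothetical stand-in for "the top-level bash script that kicked it all off"; a real deployment would have its own launcher.

```shell
# Assumes bash. With job control on (set -m), each background job gets its own process group.
set -m

# Hypothetical top-level script that spawns two workers and waits for them.
bash -c 'sleep 300 & sleep 300 & wait' &
leader=$!          # under job control, this PID is also the process group ID

set +m
sleep 0.5          # give the little tree a moment to form

# A negative PID means "the whole process group"; the -- stops kill reading it as an option.
kill -TERM -- "-$leader"

# Reap the top-level process. A status of 143 is 128 + 15, i.e. killed by SIGTERM.
status=0
wait "$leader" || status=$?
echo "top-level process exited with status $status"
```

Sending the signal to `"$leader"` without the minus sign would hit only the top-level process and could leave the two `sleep` workers running, which is exactly the situation being described.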
Or I'm just going to send another signal to it.
That's another great thing about kill: you don't have to just use it for kill. You can use it for any signal. You can use it for HUP, you can use it for the user signals.
Yes. Let's just, HUP is hang up. It's like the equivalent of disconnecting. It's the signal you used to be given when, like, the modem was disconnected, you know, from the serial connection. But we now use it to mean, hey, gracefully shut down, please, probably.
Or I'd like you to, or whatever you like.
Yeah, I mean, you can kind of, it's like the definition of it has gotten,
it's almost like another user signal.
It's like user three at this point, right?
I've seen it used for, the one place where I've seen it used
that sort of kind of made philosophical sense to me is
check to see if your connections are stale, right? Like, you HUP a process when, you know, your computer has
been woken back up from sleep and who knows if the TCP connections that it previously had are
still connected. You can HUP a process and be like, hey, and if it's written to handle that
signal, it might go interrogate
all of its sockets and make sure that they really are connected.
You know, send a heartbeat or do some other thing to make sure that everything is still
good.
But yeah, if you're doing things with signal handling and you want USR2 to mean something,
you can use kill to send USR2 to your process.
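As a rough illustration of giving a user signal a meaning of your own, here is a small sketch. The worker loop and the "got USR2" message are invented for the example; the only real pieces are `trap` and `kill`.

```shell
log=$(mktemp)

# A stand-in worker: a background subshell that decides what USR2 and TERM mean to it.
(
  trap 'echo "got USR2" >> "$log"' USR2   # hypothetical meaning: write a status line
  trap 'exit 0' TERM                      # shut down cleanly when asked
  while :; do sleep 0.1; done
) &
worker=$!

sleep 0.5                  # give the worker time to install its traps
kill -USR2 "$worker"       # kill is just "send a signal"; here it sends USR2
sleep 0.5                  # give the handler time to run
kill -TERM "$worker"
wait "$worker"

result=$(cat "$log")
echo "worker logged: $result"
rm -f "$log"
```

If the signal arrived before the trap was installed, the default action for USR2 would terminate the worker, which is why the sketch pauses first.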
Just make sure you get the command arguments right and don't accidentally try to send Terminate to both your PID
and the mythical PID.
SEGV is a fun one.
Right.
SEGV.
I like confusing people by sending, you know,
hey, your thing crashed.
And you're like, oh, yeah, well, how do you...
Especially if you've got automated reporting of crashes,
which I've done before now.
You know, just kill minus SEGV.
It'll take you a while to work that one out. How did we die here? All right, enough. I shouldn't be giving people ideas.
You're giving people all your practical ideas and other funny things.
Yeah, this is, it's too early for the April edition. Oh, maybe it isn't. It's too late, probably, for the April Fools one. Yeah.
Oh man. Next year.
Yeah, so there's a ton of stuff you can do with, yeah, ps and top. And we already talked about, I mean, both of those use both operating system API calls and they actually look inside /proc most of the time, as far as I can understand, to understand what's going on.
So we should talk about something that's bad. We've been talking about all this good stuff. I want to talk about something that drives me crazy.
All right, well, let's get to that in a second. I want to just finish off on a couple of things, because I'm looking at the hastily scrawled notes that I did while I was in a meeting before this recording. And you mentioned awk, which we already said is a full programming language. There are a couple of other awk-like, which is a strange thing.
So, sed is a great thing for just doing relatively straightforward stream editing,
that is, text replacement, or with some minimal state.
I mean, I typically use it just to literally change one thing in a line of files.
One line of a file.
And then Perl, of course, which is a full programming language, as everyone's like. But Perl to me is spelt perl dash p i space dash e space quote and then a small replacement string. And that is the Perl in-place: run this on a bunch of files, in place, replace them with the result of having run them through the Perl script that I am about to type in.
And so that is what I use to do big refactors
where my automated tools give up on me.
I will do find dash name.
Oh, we didn't talk about find.
Oh, no.
Oh, man.
We know we're going to miss some.
We're going to publish this and someone's going to be like,
you didn't talk about this?
And we're like, oh, my God, how?
Tac.
You know, instead of cat. And gzip and gunzip.
Yeah, but you find a bunch of files and you're like, here are all the cpp files, here are the header files, I'm now going to replace this string, and I'm going to use perl dash pi dash e to replace them in place. And then I'm going to rely on git, haha, that's a whole other great tool, to tell me, did I do this right or not, by, like, me doing a git diff and seeing what did I change. Okay, that looks great, commit that. Run my tests, obviously.
So those are the other things I wanted to talk about in this thing before we move on to the, we've done the good, now it's time to do the bad.
The bad. And we have to think of what constitutes the ugly to finish the...
Boring. You gotta have the Sergio, uh, Sergio Leone, isn't that? I don't know, the spaghetti westerns.
Well, yeah, so the precursor to this, though, I think, is talking a little bit about, like, bash and ShellCheck and the sort of, you know, the, yes, some of the bash flags, because it is in doing that that you run into the problem that we're about to talk about, and it drives...
Can I talk about a good thing in bash first of all?
Yeah.
Bash, and in fact anything that uses GNU Readline. So GNU Readline is like the text input. Almost anything you type into in a Unix command line interactively is using Readline. And so that's when, you know, you can press up and down and go through your history and, you know, control backwards and forwards to move between words and that kind of stuff. And this is the thing that I tell people, and they're like, what? I didn't know you could do that.
And then it's life-altering in some cases.
And that is Alt period:
press Alt period to toggle through the last command line arguments that you specified.
So this, I know it sounds like,
what would you need that for?
But like you do,
imagine you're doing ls or cat a file, right?
You're catting the file to just,
or lessing the file just to see whether or not this is the file that I want to delete, right?
I'm checking the contents of it.
Yeah, I can definitely delete this.
And then you're going to do rm space and then the same file.
And you better be sure you type the same thing in again because you're going to delete the wrong file otherwise.
Yeah, because the file is some gnarly GUID with some other identifier stuck on the end of it.
Yeah, exactly.
Exactly. Now, one way to do this, of course,
is to go up arrow
and then replace the word less with rm.
And you can do all these tricks
with caret, less, caret, rm, or whatever.
But I like to look at the thing first,
stare it in the eye and say,
this is definitely what I meant to do.
So Alt period will bring that second,
like the command line argument
to the previous function into under the caret.
And then you can keep pressing Alt period and it goes through all the other ones.
And so if it's not the last one, but the one before or one before, it's like up arrow, but for just that particular command.
That's neat.
And a quick plug for alternate shells, because if we're going to talk about Bash, I'm just going to quickly say other shells are available.
My shell of choice is fish, and fish will do partial matching on whatever you've partly typed in before you hit Alt period. If you hit Alt period in an empty space, it'll do exactly like bash. But if you're like, no, I just want to less the, oh, I don't know, it's some log file, I know it's got log written in the middle of it, you just type log, Alt period,
and then you can keep hitting Alt period.
It's like searching through any argument you've passed to any command in its history that has the word log in it, which is just beautiful.
Yeah.
Okay.
Enough ranting about both alt period and fish.
Yes.
You were talking about ShellCheck, which is another useful utility.
Yeah, yes.
It's useful because you need it.
Right.
Not because it does anything valuable necessarily. So when you learn all these wonderful commands
and you discover the power
that is being able to do all of this stuff,
it's almost like being a wizard.
It's exactly like being a wizard.
We were talking about this the other day.
Being a programmer is like being a wizard.
Yeah, carry on.
Sorry.
It's the closest thing you're going to get.
I tell my kids that all the time.
Does it work?
You know, kind of. They're...
Your eldest is more technical minded. They're buying it?
No, it's gonna be interesting to see how that shakes out, actually.
Okay. Right. Anyway, wizards. Yeah.
so once you discover all these magical spells that you can cast in the terminal to give you
all these powers the next thing that you're going to want to do is automate them right because you
don't want to have to actually be around to do all this stuff and type it all
out by hand. You want to start automating it. And when you do that, you're inevitably going to start
writing bash scripts. And then you're going to write bad bash scripts, because that's what
happens when you start writing bash scripts. And the thing to help you with that is ShellCheck,
because ShellCheck will tell you everything that your bash scripts are doing wrong,
and you will gladly thank it for that. And this combines extremely well with all the other tools that we were just talking about earlier, like watch and entr, where you're like, oh, I'm writing this bash script. I'm going to use entr to run ShellCheck on it every time I change it so that I can never make a mistake, at least not one that ShellCheck would check, right? It's great. You probably will find conventions that you do, at least I do, when it comes to Bash where you want to toggle things.
The set command in Bash can be used to turn things on and off.
And there are things like, hey, if you see an undeclared variable and I try to use it, please fail.
Don't just keep going.
Also, if you encounter any error, please fail.
Don't just keep going. And the sort of the magical combination, or like, I guess the other one in this little shibboleth that I always put at the top of my bash scripts is, if you have a pipe, so you have a series of commands connected with a pipe, and the first thing...
We have the same shibboleth, yes. set -euo pipefail. Is that pipefail?
That's the one, yeah. Yep. So if the first command in a pipe fails, please don't just carry on to the next one. You can just stop right there.
Or indeed, if anything in the pipeline dies, then consider the whole command to be dead.
Exactly.
And occasionally you need to turn that off for very small areas of things because you're trying to do stuff.
Yep.
And it doesn't, yeah. But yes, until this moment I hadn't really thought about that. I thought C++ had the monopoly on terrible defaults for things.
But it turns out Bash is there with the same feel.
If you think of Bash as a programming language, it is way in the top 10 of terrible.
Why would you do this?
Why would you do this?
But it's a shell.
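The three settings in that shibboleth can be seen one at a time in a sketch like the following. Each case runs in a child `bash -c` so that the failures being demonstrated cannot take down the shell you are typing in; the variable names are just labels for the demo.

```shell
# Default behaviour: a pipeline's status is the status of its LAST command,
# so the failure of `false` is hidden behind the success of `cat`.
plain=0
bash -c 'false | cat' || plain=$?

# pipefail: the pipeline fails if any command in it fails.
strict=0
bash -c 'set -o pipefail; false | cat' || strict=$?

# -u: expanding an unset variable is an error instead of an empty string.
unset_status=0
bash -c 'set -u; echo "value: $this_is_never_set"' 2>/dev/null || unset_status=$?

# -e: the script stops at the first failing command, so the echo never runs.
after_failure=$(bash -c 'set -e; false; echo "still running"' || true)

echo "plain=$plain strict=$strict unset=$unset_status after_failure='$after_failure'"
```

Note that pipefail does not stop the later commands in the pipeline from running; it changes the status the pipeline reports, and it is `-e` that then stops the script.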
A whole bunch of magical things, rules.
Oh, yeah. If you want to do, dollar at is like all of the arguments.
But if you put dollar at in quotes, it quotes each one individually, because that's magically useful, obviously, because when you want to pass... but not if you just do dollar one, you know, or dollar one space dollar two.
That's different again.
Which is subtly different than dollar at, right?
What?
Yeah, exactly.
That's one of those terrible interview questions that you could do: what's the difference between dollar at and dollar star?
I don't want this job anymore. You guys, really, this is what you're gonna throw at me? I'm out. I'm out. I'm out.
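For anyone who does get that interview question, the difference can be checked by counting how many arguments each spelling produces. This is a small sketch assuming bash (or another POSIX-style shell) with the default IFS; `count_args` and `demo` are helper names made up for the example.

```shell
# Print how many arguments were received.
count_args() { echo "$#"; }

demo() {
  quoted_at=$(count_args "$@")     # each original argument stays one argument
  quoted_star=$(count_args "$*")   # everything is joined into a single string
  unquoted_at=$(count_args $@)     # unquoted: re-split on whitespace
}

# Two arguments go in, and the first one contains a space.
demo "one two" three

echo "\"\$@\" -> $quoted_at, \"\$*\" -> $quoted_star, unquoted \$@ -> $unquoted_at"
```

The unquoted form is the one ShellCheck tends to complain about, because an argument with a space in it silently becomes two.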
Oh my. Yeah, yeah. But no, so you learn the magical spells of bash, you learn all these wonderful commands, you start automating things, and then you will come to hate what we hate, which is the activate pattern. Because the activate pattern...
My God, I didn't know where you were going with this.
It breaks all of it.
We've talked about this so much, and I was ranting about it, like, earlier this week, and you said, well, we should do an episode on that. And now you've sprung it on me and I haven't had a chance to build up the head of steam and bile.
No, we should talk about when it's useful and when it's not useful.
And I should try and be productive and not just knee-jerk angry.
So what is the activate pattern, Matt?
Well, to me, the activate pattern is
the management of an environment
that you're going to be running commands in
by manipulating global
variables like the path, like magical variables that mean things to various applications you
might be running or programs you might be running like the Python path or other things like that.
And essentially, you're saying, I would like to have the convenience of pretending that my computer looks this particular way, and the way I'm going to do that is I'm going to run a shell script which is either going to mutate my environment or it's going to fire up a new environment with those things preset. And now I can program away to my heart's content and my compiler will be GCC 7 and my Python will be Python 3.9 that comes from over here, and everything's wonderful and beautiful, and that's great. And it's a powerful way of having an environment, you know, without using something like Docker, that looks different from the default one you might get on the computer. So I can see the allure of it, and I understand why it came into being. But what it does is it gives you a hybrid promise, because parts of the system come from the real system.
Parts of the system come from the magical environment variables you're sharing.
And it's very hard to get decent inter...
Between... What's the thing?
Intersubjective?
Intersubjective is the magic word there.
Experience, because you're like, oh, yeah, I'm running this thing.
And if you are unbeknownst,
so often we're trying to help out folks who are saying,
hey, I'm typing this stuff in, and it's not making any sense.
If you don't realize that there's some magic
that's rerouting all of the normal things
that you might do in a shell to other magical things,
then you're really, really stuck.
You type which gcc.
You have to think like, well, hang on a second.
What do you mean?
Oh, yeah, it says opt bin, some magical thing,
some giant thing in your dot file directory.
And you're like, whoa, whoa, whoa, where's that coming from?
Now I don't know where I am anymore.
I don't understand your computer.
Your operating system has been subverted in a very deep and scary way.
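What an activate script does to the answer of "which gcc" can be imitated in a few lines. This is only a sketch of the general idea: the fake `gcc` and the temporary directory are invented for the demo, and real activate scripts usually set several other variables as well.

```shell
# A pretend "environment" containing its own gcc.
fake_env=$(mktemp -d)
printf '#!/bin/sh\necho "fake gcc from the activated environment"\n' > "$fake_env/gcc"
chmod +x "$fake_env/gcc"

# Where does gcc come from before "activation"? (It may not be installed at all.)
before=$(command -v gcc || echo "not found")

# "Activate" inside a subshell by putting the environment first on PATH.
# The same command name now resolves somewhere completely different.
after=$(PATH="$fake_env:$PATH"; command -v gcc)

echo "before: $before"
echo "after:  $after"
```

Because the change is made inside `$( ... )`, it disappears afterwards; a sourced activate script makes the same kind of change to the terminal you keep working in.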
And so that's one of the reasons, though.
That's the sort of pragmatic reason why I'm against it.
It just makes it difficult to debug because people will forget to say, oh, yeah, I'm using some magical Ruby switcher that magically switches Ruby every time I change directory.
So that's one thing.
It's also a global variable.
Who likes global variables, right?
You're setting something which is... yeah.
Have you got some things to say about it before I kind of explain?
I mean, I think we could both do a whole other podcast on why this is a bad idea.
A whole podcast?
Oh, episode.
No, I meant a whole other, like, a 12-series, one hour each, breaking down why the activate pattern is terrible.
I'll be interested because obviously I just ranted for a good 10 minutes or so about this.
What's your feelings?
Have I missed out anything or have I been duplicated?
What do you think?
I think I can describe what I don't like about it in a slightly different way, but my feelings are basically the same.
When I first got out of school, the shop that I worked for
was combination Windows and Solaris. All the servers were Solaris and all of the desktop
machines were Windows. And that was my first real introduction to the Unix environment.
My school had a Solaris lab and I did some like very basic things there, but it's like,
it's not until you're getting paid to do a job that you're like, I really should learn how this
stuff works. And that's when I first started learning these tools, which I have to say, by the way, investment in the Unix tool chain is one of the best, if not the best, technological investment I've made in my career.
It's held up throughout literally my entire career.
And it is so useful.
And it sort of moves with me from job
to job to job. And it's just one of the very best investments I've made in anything, in any
technology ever. So strong endorse. But when I was first out of school working on Solaris,
and I started writing these kinds of automated scripts, right? And my boss at the time was like,
never rely on the path. And I'm like, why? It's
there. Like, just why would you not do that? He's like, it's a global variable. Never rely on the
path. If you're calling a command, make sure that you have the full path to that command. And if for
some reason you can't get the full path to that command, normalize it to a full path and then
print it out so that you know what command you were actually running when the stupid thing ran, right? To me, the activate pattern is going in the
opposite direction of that philosophy. It's not only we're going to rely on the path, we're going
to rely on these 10 other magic global variables that you have no idea what they are, and they
could change at any time. And it's just, it's going in the opposite direction. And I very
quickly learned in those
early days why you don't rely on the path, because you can make some bad mistakes that way, you know, running the wrong version of things. Especially when you start getting into, like, working, and this was later obviously, but, like, I've gotten burned by, like, oh, I wrote this bash script for macOS but with the coreutils package installed, so it has the GNU utilities instead of the BSD ones. But the BSD ones in this environment were further up the path than the other ones. So you got a different version of sed, which did a totally different thing, and your thing broke, right?
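The "normalize it to a full path and then print it out" advice could be sketched as below. `resolve_cmd` is a hypothetical helper written for this example, not something from the episode, and it simply wraps the standard `command -v` lookup.

```shell
# Hypothetical helper: turn a command name into an absolute path, or fail loudly.
resolve_cmd() {
  # command -v prints an absolute path for programs on disk, and a bare name
  # or an alias definition for builtins, functions and aliases.
  resolved=$(command -v "$1") || { echo "cannot find $1" >&2; return 1; }
  case $resolved in
    /*) printf '%s\n' "$resolved" ;;
    *)  echo "$1 is not a program on disk: $resolved" >&2; return 1 ;;
  esac
}

# Resolve once, log which binary was chosen, then only ever call it by full path.
SED=$(resolve_cmd sed)
echo "using sed at: $SED" >&2

greeting=$(printf 'hello world\n' | "$SED" 's/world/there/')
echo "$greeting"
```

The log line is the useful part when automation misbehaves: it records which sed actually ran, GNU or BSD, instead of leaving that to whatever the path happened to be.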
And so, for
me, that activate pattern is just
doubling down on a bad idea
which is I'm going to mutate my global environment
and then I'm going to rely on that. And especially when you're automating these kinds of things, like, there's not a human being there to see that the thing went terribly wrong a lot of the time, right? So you really want to just be, I don't know, hyper-paranoid that you know what commands you're calling.
Wow, that has gone in an even more militant direction away from it than I'm prepared to do. I mean, as you know, our company has open sourced a little magical sort of installer-lite thing that does do some of this stuff, right? It does say, well, okay, if we put this in the bin, you can have these particular, um, things. But so in a way I feel that I've slightly
argued myself against that program.
But the idea of that.
I love that program.
It's a project called Ozy, which we can put in the show notes, because people probably can't Google it. It's a terrible, terrible name to Google for.
O-Z-Y.
Yeah.
But it's based off of dot files, isn't it?
So it doesn't rely on path tricks so much.
It has one path trick right at the beginning.
It just says, okay,
this is where I'm going to find the Ozy binary
and this is how we're going to do things.
But other than that,
it lets you install things like jq
and other bits and pieces based off of,
well, here's a dot file in this directory.
This is the version of jq you get here.
Now, again, that's sort of spooky action at a distance,
but it's not a global variable.
It's a local variable. Maybe it's unobvious because it's scoped to the current directory. And say, like, for example, although we haven't used it in Compiler Explorer, we've talked about it, because it's super convenient to say, look, we all know that we need the same version of Terraform in order for us and the other admins to be able to, like, administrate the site and not forever be playing upgrade tennis
with each other.
And so having a .ozy.yaml
which says, no, Terraform is version this
done, means
that even if I'm in that directory, I'm just going to get
the right version of Terraform. That's kind of a nice thing.
But it shares
some characteristics. It's got some, you know,
the siren song of like,
well, it's so convenient, I just type terraform and I get the right version of Terraform, right? Well, as opposed to doing, you know...
There's a trade-off, though, right?
Yeah.
Like, and I will say, when I use Ozy and I automate things that use Ozy, I'm always sure to not rely on the Ozy bin, the bin directory for Ozy, being in the path. I always...
Right.
...explicitly call, you know, home dot ozy slash bin slash jq when I want jq.
Or whenever it is.
Yeah, yeah.
Whenever I want to call the command, right?
Because I'm saying, I want the Ozy version,
not the one that's got installed in the operating system,
because I want the version that's in my YAML file.
For exactly this reason,
Ozy supports a command,
which I use in some makefile somewhere,
which says, hey, here are the name
of all of the things that I'm going to use.
First of all, make sure that they're downloaded and installed
and then print out all their paths.
And then you can just use that as the, or rather,
it prints out the path that everything is installed in
so that you can ensure and put that in front of
all of the subsequent commands.
So it kind of, it's like your one touch point with the,
just tell me where things are going to be.
Tell me where jq is going to be.
Tell me where Terraform or whatever else.
I feel like we've gone slightly off track
from that by bringing up Ozy.
Well,
in a way, it's related because a lot of what
we use Ozy for is managing these command line
tools that we're all talking about.
It's another way to manage
them and install them. And especially when you're building
bash scripts and other automation on top
of them, you want to make sure that you have the versions that you think you have
running in different environments and all that. So it's kind of...
Right. We want to make sure that everybody has the same experience, but that same experience isn't predicated on magical activation. I think that's the other thing. It's like this whole idea of, like, source activate.sh is a user-specific step you have to do
that's now polluted that terminal
until you unactivate it in some way.
And obviously your prompt changes usually.
And it's kind of like, hey, look, you're in the magic world now.
I'm like, I don't like that.
How about it just works always?
Like you go into the directory and say,
okay, I'm just going to run Terraform.
Oh, it's Terraform 0.17.
Of course it is because I'm in the Compiler Explorer repository. That's what it needs.
Yeah.
So, yeah, I suppose then it's not so much against the idea of managing,
carefully managing the environment that you want to run things in.
It's almost like the way that it's achieved.
It's achieved through magical state action at a distance.
Right.
Environment variables in particular,
like mutating environment variables as a part of an activate process
just is like, I kind of want to depend on as few environment variables.
Like user, okay.
Home, sure.
Locale.
That's about it.
That's another one actually.
That was the other one.
Still even locale, right?
A friend at the previous company would have his locale set to, I can't remember what it was now, but it was, like, something which was different from anyone else. So it means that things with capital letters appeared, like, in a different place from where I'm used to. He said, no, I've just always wanted it this way. I'm like, I can't use your computer. But also, again, it's another thing that shows up stuff like scripts that were doing ls pipe, you know, head minus one, to find the first thing, which would not necessarily work if the locale was set differently, because ls sorts things based on the locale. You know, global variables are everywhere, man. It's not a good thing.
Well, that's ugly. That's it. You found the ugly, man.
I found the ugly. We had the good, we had the bad, we had the ugly.
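The ls pipe head minus one problem is easy to reproduce. Here is a minimal sketch with three made-up file names; only the C-locale result is predictable, and the second result depends on whatever locale the machine running it happens to have.

```shell
dir=$(mktemp -d)
touch "$dir/apple" "$dir/Banana" "$dir/cherry"

# In the C locale, names are compared byte by byte, so uppercase sorts before lowercase.
first_c=$(LC_ALL=C ls "$dir" | head -n 1)
echo "first in the C locale: $first_c"

# Whatever locale the current user has; this may well print a different name.
first_user=$(ls "$dir" | head -n 1)
echo "first in the current locale: $first_user"
```

Pinning `LC_ALL=C` on the one command that needs a stable order is a common way to stop a script from depending on that particular global variable.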
So let's do a quick conclusion then.
So there's a ton of tools that will be already installed
on almost any Unix system.
And obviously we've been talking about Unix the whole time.
I'm aware that Windows has something called PowerShell
that is conceptually sounds cooler
because it's object-based rather than line-based.
Having looked at a chunk of PowerShell code, I don't get it yet. And I'm sure it's much more powerful than I understand, and in fact I know it is, but I don't have any experience, so I can't really help you out there. But, like, anyway, Unix: small command line utilities are probably already there. We've talked about ps, we talked about top, we talked about the /proc file system, we talked about uniq and sort, and particularly how you can use them to build a group-by pattern.
That's true.
Column is another one, actually, column, which we didn't talk about. Column takes inputs and then it will columnize the inputs, assuming that they are separated by spaces, or if you do column dash t, it uses tabs.
And it'll turn outputs that were like,
well, the $1, $2, $3 style spaces into $1,
and then there's enough white space to make it so that all the $2s line up and then all the $3s line up.
So it makes a nice little table in your browser,
in your shell.
So column's good.
sed, awk, Perl, cut
all sort of do the kind of manipulation
that can be useful.
You talked about tcpdump,
SystemTap,
strace, Wireshark,
watch we talked about,
grep obviously.
What else?
Are there any other tools?
I just want to,
before we give up on this thing
there's
dmesg.
dmesg.
Oh my golly.
All right, yeah,
that's a great one
to finish with.
This, dmesg,
is the fantastic
how to
out sysadmin
the sysadmin
nine times out of ten
right
so in my experience
when you've reached
the point where
something really
really odd
is happening on a computer
you've probably already pinged the s, really odd is happening on a computer,
you've probably already pinged the sysadmin team that help you administrate your system and say,
look, something funny is going on. It's taking a lot longer to read files.
Nine times out of 10, typing dmes, which dumps the current kernel's ring buffer of the most recent things that have happened, noteworthy things that have happened, you'll find in the last half dozen lines
probably the clue if it's something that's really weird.
So the things that I normally find that trip this up
are things like any segfaults in anything on the system
will be reported there.
So you can say, hey, that's weird.
Some kernel process segfaulted just now.
I bet you it's related.
Or you'll see, oh, CPU overheating, throttling it back.
Oh, I wonder if there's a problem with the cooling.
Or you'll see an imminent SMART drive failure alert.
You know, all these kinds of things that you hope will be automated away.
But dmesg appears, and then you appear like you know what's going on in the system,
but in fact, you don't.
You just read the last three lines of dmesg.
So that's my dmesg pitch.
I don't know if you want to add to that.
You're a wizard, Harry.
You've been listening to Two's Complement,
a programming podcast by Ben Rady and Matt Godbolt.
Find the show transcript and notes at twoscomplement.org.
Contact us on Twitter at twoscp.
That's at T-W-O-S-C-P.
Theme music by Inverse Phase.