Screaming in the Cloud - Corey Screws Up Logstash For Everyone with Jordan Sissel
Episode Date: September 29, 2021
About Jordan
Jordan is a self-proclaimed “hacker.”
Links:
Twitter: https://twitter.com/jordansissel ...
Transcript
Hello, and welcome to Screaming in the Cloud, with your host, Chief Cloud Economist at the
Duckbill Group, Corey Quinn.
This weekly show features conversations with people doing interesting work in the world
of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles
for which Corey refuses to apologize.
This is Screaming in the Cloud.
This episode is sponsored in part by Yugabyte.
Distributed technologies like Kubernetes are great, citation very much needed,
because they make it easier to have resilient, scalable systems.
SQL databases haven't kept pace, though.
Certainly not like NoSQL databases have, like Route 53, the world's greatest database.
We're still, other than that, using legacy, monolithic databases that require ever-growing
instances of compute.
Sometimes we'll try and bolt them together to make them more resilient and scalable,
but let's be honest,
it never works out well. Consider YugabyteDB. It's a distributed SQL database that solves basically all of this. It is 100% open source, and there's no asterisk next to the open on that one,
and it's designed to be resilient and scalable out of the box so you don't have to shard yourself
to death. It's compatible with PostgreSQL, or Postgres-squeal as I insist on pronouncing it,
so you can use it right away without having to learn a whole new language and refactor everything.
And you can distribute it wherever your applications take you,
from across availability zones to other regions or even other cloud providers
should one of those happen to exist.
Go to yugabyte.com, that's Y-U-G-A-B-Y-T-E
dot com, and try their free beta of Yugabyte Cloud, where they host and manage it for you.
Or see what the open source project looks like. It's effortless distributed SQL for global apps.
My thanks to Yugabyte for sponsoring this episode.
This episode is sponsored in part by our friends at
VMware. Let's be honest, the past year has been far from easy due to, well, everything. It caused us
to rush cloud migrations and digital transformation, which of course means long hours refactoring your
apps, surprises on your cloud bill, misconfigurations, and headaches for everyone trying to manage
disparate and fractured cloud environments. VMware has an answer for this. With VMware's
multi-cloud solutions, organizations have the choice, speed, and control to migrate and optimize
applications seamlessly without recoding, take the fastest path to modern infrastructure,
and operate consistently across the data center, the edge, and any cloud.
I urge you to take a look at vmware.com slash go slash multicloud.
You know my opinions on multicloud by now, but there's a lot of stuff in here that works on any cloud.
But don't take it from me. That's vmware.com slash go slash multicloud, all one word. And my thanks to them
again for their sponsorship of my ridiculous nonsense.
Welcome to Screaming in the Cloud.
I'm Corey Quinn. I've been to a lot of conference talks in my life. I've seen good ones, I've seen
terrible ones, and then I've seen the ones that are way worse than that. But we don't tend to think
in terms of impact very often about how conference
talks can move the audience. In fact, that's the only purpose of giving a talk ever, to my mind,
is you're trying to spark some form of alchemy or shift in the audience and convince them to do
something. Maybe in the banal sense, it's to sign up for something that you're selling, or to go
look at your website, or to contribute to a project,
or maybe it's to change the way they view things. One of the more transformative talks I've ever
seen that shifted my outlook on a lot of things was at SCaLE in 2012. The person who gave that
talk is my guest today, Jordan Sissel, who, among many other things in his career,
was the original creator behind Logstash, which is the L in ELK Stack.
Jordan, thank you for joining me.
Thanks for having me, Corey.
I don't know how well you remember those days in 2012. It was the dark times. We thought,
oh, the world is going to end. That wouldn't happen until 2020. But it was an interesting
conference full of a bunch of open source folks. It was my local conference because I lived in Los
Angeles. And it was the thing I looked forward to every year because I would
always go and learn something new. I was in the trenches in those days. And I had a bunch of
problems that looked an awful lot like other people's problems. And having a hallway track
where, hey, how are you solving this problem was a big deal. I miss those days in some ways.
Yeah, SCaLE was a particularly good conference. I think I made it twice.
Traveling down to LA was infrequent for me,
but I always enjoyed how it was a very communal setting.
They had dedicated hallway tracks.
They had kids tracks, which I thought was great
because folks couldn't usually come to conferences
if they couldn't bring their kids
or they had to take care of that stuff.
But having a kids track was great.
They had kids presenting.
It felt more organic than a lot of other conferences did.
And that's kind of what drew me to it initially.
Yeah, it was my local network.
It turns out that the Southern California tech community is relatively small and we all lead different lives.
And it's L.A., let's face it.
I lived there for over a decade.
Flaking is a way of life.
So yeah, oh, we'll go out and catch dinner.
Ooh, have to flake at the last minute.
If you're one of the good people,
you tell people you're flaking
instead of just no-showing, but it happens.
But this was the thing that we would gather
and catch up every year.
And oh, what have you been doing?
Wow, you work at that company now.
Congratulations slash what's wrong with you?
It was fun, just sort of a central sync point.
It started off as hanging out with friends. And in those days, I was approaching the idea of, you know what?
I should learn to give a conference talk someday. But let's be clear. People don't give conference
talks. Legends give conference talks. One day, I'll be good enough to get on stage and give a
talk to my peers at a conference. Now, the easy cynical interpretation
would be, well, then I saw your talk and I figured, hey, any jackhole can get up there. If he can do
it, anyone can. But that's not at all how it wound up impacting me. You were talking about Logstash,
which, let's start there, because that's a good entry point. Logstash was transformative for me.
Before that, I'd spent a lot of time playing around with syslog. Usually rsyslog, but there are other stories here.
When a system does something and it spits out logs, ideally, how do you make sure you capture
those logs in a reliable way? So if you restart a computer, you don't wind up with a gap in your
logs. If it's the right computer, it can be a gap in everything's logs while the thing's coming back
up. And that's avoiding single points of failure and the rest. And I had done all kinds of
horrible monstrosities. And someone asked me at one point, yeah, someone said, well, there are a
couple of options. Why don't you use Splunk? And the answer was that I don't have a spare
princess lying around that I can ransom back to her kingdom. So I can't afford it. Okay, what
about Logstash? And my answer was,
what's a Logstash? And thus that sound was Pandora's box creaking open. So I started playing
with it and realized, okay, this is interesting. And I lost track of it because we have demands
on our time. Then I was dragged into a session that you gave and you explained what Logstash was.
I'm not going to do nearly as good of a job as you can on this. What the hell
was Logstash, for folks who were not screaming at syslog when they first heard of it?
All right, so you mentioned rsyslog, and old is often a pejorative for more established projects,
though I don't think these projects are bad. But rsyslog, syslog-ng, things like that were
common to see for me as a sysadmin. But to talk about
Logstash, we need to go back a little further than 2012. So the Logstash project started...
I disagree because I wasn't aware of it until 2012. Until I become aware of something,
it doesn't really exist. That's right. I have the object permanence of an infant.
That's fair. And I've always felt like perception is reality. So if someone,
this gets into something I like to say, but if someone is having a bad time or someone doesn't
know about something, then it might as well not exist. So Logstash as a project kind of started
in 2008, 2009. I don't remember when the first commits landed, but it was, it was, gosh, it's
more than 10 years ago now. But even before that, in college, I was fortunate to, through a network of friends,
get a job as a sysadmin.
And as a sysadmin, you stare at logs a lot
to figure out what's going on.
And I wanted a more interesting way to process the logs.
I had taught myself regular expressions
and I wasn't finding joy in it at all.
Like pretty much most people probably, either they look at regular expressions and I wasn't finding joy in it at all. Like pretty much most people probably
either they look at regular expressions and just evacuate with disgust, which is absolutely
an appropriate response, or they dive into it and they have to use it for their job,
but it's not, it wasn't enjoyable. And I found myself repeating stuff a lot,
matching IP addresses, matching strings, URLs, just trying to pull out useful
information about what is going on. Oh, and the timestamp problem too. One of the things that I
think people don't understand who have not played in this space is that all systems do have logs,
unless you've really pooched something somewhere. And it shows that at this point in time,
this thing happened. As we start talking about multiple computers and distributed systems,
but even on the same computer, great. So at this time, there was something that showed up in the system log
because there was a disk event or something. And at the same time, you have application logs that
are talking about what the application running is talking about. And that is ideally using a
somewhat similar system to do this, but often not. And the way that timestamps are expressed in these are radically
different. And the way that the log files themselves are structured, one might be timestamp
followed by host name, followed by error code. The other one might be host name followed by
timestamp in a different format, followed by a copyright notice because a big company got to it,
followed by the actual event notice. And trying to disambiguate all of these into a standardized form was, first, obnoxious, and secondly, very important because
you want to see the exact chain of events. This also leads to a separate sidebar on making sure
that all the clocks are synchronized, but that's a separate story for another time.
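As an aside, the normalization problem described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions: both log lines and both layouts are made up for the example, and the syslog-style line is assumed to be from 2021 and in UTC because that format carries neither a year nor a zone.

```python
from datetime import datetime, timezone

# Hypothetical log lines in two different layouts (not taken from the episode).
SYSLOG_LINE = "Sep 29 14:03:07 web01 kernel: disk error on sda"
APP_LINE = "web01 2021-09-29T14:03:09+00:00 ERROR payment failed"


def parse_syslog(line, year):
    # Layout: timestamp, host, message. The classic syslog timestamp has
    # no year and no zone, so the caller supplies the year and UTC is assumed.
    stamp, rest = line[:15], line[16:]
    host, message = rest.split(" ", 1)
    when = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
    return {"time": when.replace(tzinfo=timezone.utc), "host": host, "message": message}


def parse_app(line):
    # Layout: host, ISO 8601 timestamp, message.
    host, stamp, message = line.split(" ", 2)
    return {"time": datetime.fromisoformat(stamp), "host": host, "message": message}


# Once both sources share one shape, the chain of events can be ordered.
events = sorted(
    [parse_app(APP_LINE), parse_syslog(SYSLOG_LINE, 2021)],
    key=lambda event: event["time"],
)
```

With both lines reduced to the same three fields, sorting by time puts the disk event ahead of the application error, which is the kind of ordering the conversation is after.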
And that's where you enter the story in many respects.
Right. So my sort of thought around what led to Logstash is you could take a sysadmin or a software
IT developer, whatever, expert, and you can sit them in front of a bunch of logs and they
can read them and say, that's the time it happened.
That's the user who caused this action.
This is the action.
But if you try and abstract and step away, and so you ask, how many times did this action
happen?
When did this user appear? What time did this happen?
You start losing the ability to ask those questions without being an expert yourself
or sitting next to an expert and having them be your keyboard.
Kind of a phenomenon I call the human keyboard problem,
where you're speaking to a computer, but someone has to translate for you.
And so in around 2004, I was super into Perl.
No shocker that I enjoyed, ish, I sort of enjoyed regular expressions, but I was super into Perl.
And there was a Perl module called Regexp::Common, which was a library of regular
expressions to match known things, IP addresses, certain kinds of timestamps,
quoted strings, and whatnot. And this stuff is always challenging because it sounds like,
oh, an IP address. One of the interview questions I hated the most, someone asked me was,
write a regular expression to detect an IP address. It turns out that to do this correctly,
even if you bound it to IPv4 only, the answer takes up multiple lines on a screen. Oh, for sure. It's enormous.
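For readers who want to see why the interview question is harder than it sounds, here is a minimal Python sketch. It assumes a deliberately strict definition (four dotted decimal octets, each 0 to 255, no leading zeros); other definitions of a valid address exist, and handling them is where such patterns grow long. The names `OCTET`, `IPV4`, `NAIVE`, and `is_ipv4` are made up for the example.

```python
import re

# One octet: 0-255 with no leading zeros (a strictness choice assumed here).
OCTET = r"(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])"

# Four octets separated by dots, built from the named piece above.
IPV4 = re.compile(rf"{OCTET}(?:\.{OCTET}){{3}}")

# The tempting short answer, which happily accepts 999.999.999.999.
NAIVE = re.compile(r"\d+\.\d+\.\d+\.\d+")


def is_ipv4(text):
    # fullmatch requires the whole string to be an address, not just a part.
    return IPV4.fullmatch(text) is not None
```

Building the pattern from a named octet keeps it readable, which is the same idea the named pattern libraries discussed in this conversation rely on.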
It's like a full page of code you can't read. And that's one of the things that it was sort of like
standing on the shoulders of the person who came before. It was kind of an epiphany to me. Yeah. So
I can copy and paste that into my code, but whoever has to maintain that thing after I get fired
is going to be, what the hell is
this? And what does it do? It's like, it's the blessed artifact that the ancients built it and
left it there. Like it's a Stargate sitting in your code. And it's, we don't know how it works.
We're scared to break it. So we don't even look at that thing directly. We just know that we put
nonsense in and an IP address comes out and let's not touch it ever again. Exactly. And even to your
example, even
before you get fired and someone replaces you and looks at your regular expression, the problem I
was having was I would have this library of copy and pasteable things, and then I would find a bug,
an edge case, and I would fix that edge case. But the other 15 scripts that were using the same
regular expression, I can't even read them anymore because I don't carry that
kind of context in my head for all of that syntax. So you either have to go back and copy and paste
and fix all those old regular expressions, or you just say, you know what, we're not going to fix
the old code. We have a new version of it that works here, but everywhere else this edge case
fails. So one of the things that kind of drew me to the Regexp::Common library in Perl was that it was reusable and things had names. It was, I want to match an IP address.
You didn't have to memorize that long piece of text to precisely and accurately
accept only regular expressions and reject things that are not. You just said, give me the regular
expression that matches an IP. And from that library gave me the idea to
write grok. Well, if we could name things, then maybe we could turn that into some kind of data
structure. The sort of the combination of I have a piece of log data. And I as an expert, I know
that's an IP address, that's the username, and that's the timestamp. Well, now I can apply this
library of regular expressions that I didn't have to write and hopefully has a unit test suite and say, now, instead of that plain piece of
text that is hard to read as a non-expert, I can have a data structure we can format
however we want that non-experts can see. And even experts can just relax and not have to be
full experts all the time using that part of your brain. So now you can start getting towards answering search-oriented questions.
How many login attempts happened yesterday from this IP address?
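The idea of naming patterns and getting a data structure back can be sketched as a toy in Python. This is not how grok or Logstash is implemented; the `%{NAME:field}` placeholder mirrors grok's syntax, but the pattern library, the log lines, and the helper names (`compile_grok`, `parse`) are all assumptions made for illustration.

```python
import re
from collections import Counter

# Toy pattern library; names and regexes are illustrative, not Logstash's own.
PATTERNS = {
    "IP": r"(?:\d{1,3}\.){3}\d{1,3}",
    "USER": r"[A-Za-z0-9._-]+",
    "TIMESTAMP": r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}",
}


def compile_grok(expression):
    # Replace each %{NAME:field} with a named capture group from the library.
    def expand(match):
        name, field = match.group(1), match.group(2)
        return f"(?P<{field}>{PATTERNS[name]})"

    return re.compile(re.sub(r"%\{(\w+):(\w+)\}", expand, expression))


LOGIN = compile_grok(r"%{TIMESTAMP:time} login failed for %{USER:user} from %{IP:client}")


def parse(line):
    # Returns a dict of named fields, or None when the line does not match.
    found = LOGIN.search(line)
    return found.groupdict() if found else None


# Hypothetical log lines for the example.
lines = [
    "2021-09-28T08:15:02 login failed for corey from 203.0.113.7",
    "2021-09-28T08:15:09 login failed for jordan from 203.0.113.7",
    "2021-09-28T09:40:31 login failed for corey from 198.51.100.4",
    "2021-09-28T09:41:00 cron job finished",
]
events = [event for event in map(parse, lines) if event]
attempts_by_ip = Counter(event["client"] for event in events)
```

Once each line is a dictionary, the question about login attempts per address becomes a one-line count rather than a new regular expression.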
Right. And back then, the way that people would do these things was Elasticsearch. So that's
the thing you shove all your data into in a bunch of different ways, and you can run full-text
queries on it, and that's great. But now we want to have that stuff actually structured. And that is sort of the magic of Logstash, which was used in conjunction with
Elasticsearch a lot. And it turns out that typing random SQL queries into command line is not
generally how most business users like to interact with this stuff. So it needs to be something
dashboard-y like, and the project that folks used for that was Kibana. And the ELK Stack became a thing
because Elasticsearch in isolation
can do a lot, but it doesn't get you all the way there for what people were using to look at logs.
You're right.
And Kibana is also one of the projects that Elastic owned. And at some point,
someone looks around like, oh, Logstash, people are using that with us an awful lot.
How big is the company that built that? Oh, it's an open source project run by some guy.
Can we hire that guy? And the answer
is apparently because you wound up working as an Elastic employee for a while. Yeah, it was kind of
an interesting journey. So in the beginning of Logstash in sort of 2009, I kind of had this
picture of how I wanted to solve log processing search challenges. And I broke it down into a
couple of parts: visualization (to be clear, I broke it down in my head, not into code), kind of exploration.
There's the processing and transmission, and then there's storage and search. And I only felt confident really attempting a solution for one of those parts. And I picked log processing,
partly because I already had a jumpstart from a couple of years prior working on Grok and feeling really comfortable with regular expressions. I
don't want to say good because that's... You heard it here first. We found the person that knows
regular expressions. And Logstash was being worked on to solve this problem of taking your data,
processing it, and getting it somewhere. That's why Logstash has so many outputs,
has so many inputs, and lots of filters.
And about, I think, a year into building Logstash,
I had experimented with storage and search backends,
and I never found something that really clicked with me.
And I was experimenting with Lucene,
and knowing that I could not complete this journey
because the problem space is so large, it would be foolish of me to try to do distributed log
storage or anything like that, plus visualization. I just didn't have the skills or the time in the
day. I ended up writing a front end for Logstash called Logstash Web. Naming things is hard. And I wasn't particularly skilled
or attentive to that project. And it was more of a very lightweight front end to solve the
visualization, the exploration aspect. And about a year into Logstash being alive, I found Elasticsearch.
And what clicked with me from being a sysadmin and having worked at large data center companies in the past is I know the logs on a single system are going to quickly outgrow it.
So whatever storage system will accept these logs, it's got to be easy to add new storage.
And Elasticsearch's first day promise was it's distributed.
You can add more nodes and go about your day. And it fulfilled that promise. And I think it still fulfills that
promise that if you're going to be processing terabytes of data, yeah, just keep dumping it
in there. That's one of the reasons I didn't try and even use MySQL or Postgres or other data
systems, because it didn't seem obvious how to have multiple storage servers collecting this
data with those solutions for me at the time.
It turns out that solving problems like this that are global and universal leads to
massive adoption very quickly. I want to take this back a bit to before you wound up joining Elastic,
because you get up on stage and you talk through what this is. And I mentioned at the start of this
recording that it was one of those transformative talks, but let's be clear here. I don't remember 95% of how Logstash works. Like,
the technology you talked about 10 years ago is largely outmoded slash replaced slash outdated
today. I assure you, I did not take anything of note whatsoever from your talk regarding
regular expressions, I promise. Good.
But that's not the stuff that was transformative to me. What was, was the way that you talked about these things.
And it was the first time I'd ever heard the phrase that if a new user has a bad time,
it's a bug.
This was 2012.
The idea of empathy hadn't really penetrated into the ops and engineering spaces in any
meaningful way yet.
It was about gatekeeping.
It was about read the manual, fool, if people had questions.
And it was actively user
hostile. And it was something that I found transformative of, forget the technology piece
for a second. This is a story about how it could be different. Because Logstash was the vehicle to
deliver a message that transcended far beyond the boundaries of how to structure your logs, or
maybe beyond the boundaries of regular expressions. I'm never quite sure where those things start and stop. But it was something that was actively
transformative where you're on stage as someone who is a recognized authority in the space,
and you're getting up there and you're sending a message, both explicitly and by example,
of be nice to people, demonstrate empathy. And that left a hell of an impact.
Thank you.
I wound up doing a spot check just now. And I wound up looking at this, and sure enough,
early in 2013, I wound up committing, it's still in the history of the change log for Logstash
because it's open source. I committed two pull requests, 10 minutes apart, two submissions. I
don't know if pull requests were even a thing back then, but it wound up in the log.
Because another project you were renowned for was FPM, F'n Package Manager.
Is that what the acronym stands for, or am I misremembering?
We'll go with that.
I'm sure vulgar viewers will know what the F stands for, but you don't have to say it.
It's F'n Package Management.
Yeah.
But yeah, I think I really do believe that if a user,
especially if a new user has a bad time, it's a bug. And that came from many years of
participating at various levels in open source, where if you came at it with like a tinkerer's or
a hacker's mindset, and you think this project is great, I would like it to do one additional thing.
And I would like to talk to someone
about how to make it do that one additional thing.
And you go find the owners
or the maintainers of that project.
And you come in with gusto and energy
and you describe what you want to do.
And first they say,
what you want to do is not possible.
They don't even say they don't want to do it.
They frame the whole universe against you. It's not possible. Why would you want to do that? If you want to make that, do
it yourself. You know, none of these things are an extended hand, a lowered ladder, an open door,
none of those. It's always, you're bothering me, go away. Please read the documentation and see
where we clearly (which they don't) document that
this is not a thing we're interested in. And I kind of came to the conclusion that any future
open source or collaborative work that I worked on, it's got to be from a place where
you're welcome and whatever contributions or participation levels you choose are okay.
And if you have an idea, let's talk about it. If you're having a bad time,
let's figure out how to solve it. Maybe the solution is we point you in the right direction
to the documentation if documentation exists. Maybe we find a bug that we need to fix.
The idea that the way to build communities is
through kindness and collaboration, not through walls or gatekeeping or just being rude. And I
really do think that's one of the reasons Logstash became so successful. I mean, any particular
technology could have succeeded in the space that Logstash did, but I believe that it did so
because of that one piece of framework
where if a new user has a bad time, it's a bug. Because to me, that opens the door to say,
yeah, you know what? Some of the code I write is not going to be good. Or the thing you want to do
is undocumented. Or the documentation is out of date. It told you a lie and you followed the
documentation and it misled you because it's
incorrect, we can fix that. Maybe we don't have time to fix it right now. Maybe there's no one
around to fix it. But we can at least say, you know what, that information is incorrect, and I'm
sorry you were misled. Come on into the community and we'll figure it out. And one of the patterns
I know is on the IRC channel, which is where the Logstash real-time community chat,
I don't know how to describe that.
No, it was on Freenode.
That's part of the reason I felt okay talking to you.
At that point, I was volunteer network staff.
This was before Freenode turned into basically
a haven for Nazis this past year.
Yeah, it was still called Lilo.
LiloNet?
No, the Open Freedom Network.
That predates me.
This was, yeah, Lilo had died about six years prior.
Oh, all right.
But Freenode's been around a long time.
What made this thing work was that I was network staff,
and that means that I had a bit of perceived authority.
It's a chat room, not really.
But it was one of those things where it was at least,
okay, this is not just some sketchy drive-by rando,
which I very much was, but I didn't present that way.
So I could strike up conversations. But with you talking about this stuff, I never needed to be that person. It was
just someone wants to pitch in on this. Great. More hands make lighter work. Sure. Yeah, for sure.
And for me, the interesting part is not even around the Logstash aspect so much. It's your
other project, FPM. Well, one of your other projects. Back in 2012, that was an interesting
year for me. Another area that got very near and
dear to my heart in open source world was the SaltStack project. I was contributor number 15,
and I didn't know how Python worked. Not that I do now, but I can fake it better now. And Tom Hatch,
the guy that ran the project before it was a company, was famous for this, where I could send
in horrifying levels of code. And every time he would merge it in,
and then 10 minutes, there'd be another patch that comes in that fixes all the bugs I just
introduced. And it was just such a warm onboarding. I'm not suggesting that approach, and I'm not
saying it's scalable, but I started contributing. And I became the first Debian and Ubuntu packager
for SaltStack, which was great. And I did a terrible job at it
because, let me explain. I don't know if it's any better now, but back in those days, there were
multiple documentation sources on the proper way to package software. They were all contradictory
with each other. There was no guidance as to when to follow each one. There was never a,
you know nothing about packaging, here's what you need to know step by step. And when you get it
wrong, they yell at you. And it turns out that the best practice then to get it formally accepted
upstream, which is what I did, is do a crap ass job. And then you'll wind up with a grownup coming
in like, this is awful, move, and then they'll fix it and yell at you and gatekeep
like hell. And then you have a package that works and gets accepted upstream because the magic
incantation has been said somewhere. And what I loved about FPM was that I could take any random
repo or any source tarball or anything I wanted, run it through with a single command, and it would
wind up building out an RPM
and a DEB file, and I don't know what else it supported. Those are the ones I cared about,
that I could then install on a system. I could put a repo and add that to a sources list on
systems and get it to automatically install. So I could use configuration management,
like SaltStack, to wind up installing custom local packages. And oh my God, did the packaging
communities for multiple different distros hate you. And specifically what you had built, because
this was not the proper way to package. How dare you solve an actual business problem someone has
instead of forcing them to go to packaging school where the address is secret and you have to learn
that. It was awful. It was the clearest example
that I can come up with of gatekeeping. And then you're coming up with FPM, which gets rid of user
pain. And I realized that in that fight between the church of orthodoxy of this is how it should
be done and the you're having a problem, here's a tool that makes it simple. I know exactly what
side of that line I wanted to be on. And I hadn't always
been previously. And that is what clarified it for me. Yeah. FPM was a, was like a really delightful
enjoyment for me to build. The origins of that was I worked at a company and they were all,
I think at the time we were RPM based. And then as folks tend to do, I bounced around between jobs
almost every year. So I went from one place that...
Hey, it's me.
Right?
And there's absolutely nothing wrong with leaving every year or staying longer.
It's just whatever progresses your career in the way that you want and keeps you safe
and your family safe.
But we were using RPM and we were building packages already not following the orthodoxy.
A lot of times, if you ask someone to build like a package for Fedora,
they'll point you at like the Maximum RPM book.
And that's a lot of pages.
And honestly, I'm not going to sit down and read it.
I just want to like take a bunch of files,
name it and install it on 30 machines with Puppet.
And that's what we were doing.
Cue one year later, I moved to a new company
and we were using Debian packages
and they're the same thing.
What struck me is they are identical.
It's a bunch of files
and don't pedant me about this.
It's a bunch of files with a name
with some other sometimes useful metadata
like other names that you might depend on.
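The claim that the two formats describe the same thing can be illustrated with a small Python sketch. It is not how FPM works internally; it only assumes one format-neutral description (the hypothetical `Package` class) and renders a few real metadata fields for each format. Actual packages need more fields than shown, and the package name, path, and dependency are invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Package:
    # Format-neutral description: a name, a version, files, and dependencies.
    name: str
    version: str
    files: list = field(default_factory=list)
    depends: list = field(default_factory=list)


def debian_control(pkg):
    # A subset of Debian control-file fields; real packages need more.
    lines = [f"Package: {pkg.name}", f"Version: {pkg.version}"]
    if pkg.depends:
        lines.append("Depends: " + ", ".join(pkg.depends))
    return "\n".join(lines)


def rpm_spec_header(pkg):
    # A subset of RPM spec preamble tags; real spec files need more.
    lines = [f"Name: {pkg.name}", f"Version: {pkg.version}"]
    lines += [f"Requires: {dep}" for dep in pkg.depends]
    return "\n".join(lines)


# Hypothetical internal application.
app = Package("internal-app", "1.0.0", ["/opt/internal-app/run.sh"], ["ruby"])
```

The same `app` object feeds both renderers, which is the sense in which the knowledge transfers: only the spelling of the metadata changes between formats.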
And I really didn't find it
enjoyable to transfer my knowledge of how to build RPMs and the tooling and the structures and the
syntaxes to building Debian packages. And this was not for greater publication. This was, I have a
bunch of internal applications I needed to package and deploy with, at the time it was Puppet.
And it wasn't fun. So I did what we did with Grok,
which was codify that knowledge
to sort of reduce the burden.
And after a few, probably a year or so of that,
it really dawned on me that a generality
is all packaging formats
are largely solving the same problem.
And I wanted to build something
that was solving problems for folks like
you and me, sysadmins who were handed a pile of code and they needed to get into production.
And I wasn't interested in formalities or appeasing any priesthoods or orthodoxies about
what really, you know, you should really shine your package
with this special wax kind of thing. And because all of the documentation for Debian packages,
Fedora packages are often dedicated to those projects. You're going to submit a package to
Fedora so that the rest of the world can use it on Fedora. That wasn't my use case.
I've built a thing and the thing that I built is awesome and I want the world to use it. So now I have to go to packaging school, not just
once but twice and possibly more. That's awful.
Or more. Yeah. And it's tough.
This episode is sponsored in part by our friends at Jellyfish. So you're sitting in your office chair, bleary-eyed, parked
in front of a PowerPoint and oh, my sweet feathery
Jesus, it's the night before the board meeting, because of course it is. As you slot that crappy
screenshot of traffic light-colored Excel tables into your deck, or sift through endless spreadsheets
looking for just the right data set, have you ever wondered why is it that sales and marketing get
all this shiny, awesome analytics and insight tools, whereas engineering basically gets left with the dregs? Well, the founders of Jellyfish certainly
did. That's why they created the Jellyfish Engineering Management Platform, but don't
you dare call it JEMP. Designed to make it simple to analyze your engineering organization,
Jellyfish ingests signals from your tech stack, including JIRA,
Git, and collaborative tools. Yes, depressing to think of those things as your tech stack,
but this is 2021. And they use that to create a model that accurately reflects just how the
breakdown of engineering work aligns with your wider business objectives. In other words,
it translates from code into spreadsheet. When you have to
explain what you're doing from an engineering perspective to people whose primary IDE is
Microsoft PowerPoint, consider Jellyfish. That's jellyfish.co and tell them Corey sent you.
Watch for the wince. That's my favorite part.
And this gets back to what I found. It was rare that I could find a way to contribute to something meaningfully. And I was using Logstash after your talk. I'd started using it and rolling it out somewhere. And I discovered that there wasn't a Debian package. And the thing is, you would never frame it
this way. But the answer was, of course, pull request welcome, which is often an invitation
to do free volunteer work for companies. But this was an open source project that was not backed by
a publicly traded company. It was some guy. And of course, I'll pitch in on that. And I
checked the commit log on this for what it is that I see. And sure enough, I have two commits. The first one
was on Sunday night in February of 2013. And my commit message was initial packaging work
for deb building. And sure enough, there's a bunch of files I put up there and it's great.
And my second and last commit was 12 minutes later saying, remove large binary because I'm foolish. Yeah. Is that you?
Oh yeah, I'm sure. Yeah, it was great. I didn't know how Git worked back then. I'm sure it's still
in the history there. I wonder how big that binary is and exactly how much I have screwed people over
in the last decade since. I've noticed this over time and every now and then you'd be, I would be,
or someone would be on a slow internet connection,
which again is something that we need to optimize for, or at least be aware of and help where
we can.
Someone would be cloning Logstash on an airplane or something like that, or in a rural
setting.
And they would, say, get stuck at 76% for like 10 minutes.
And you would go back and dust off your tome of how to use Git because it's
a very difficult piece of software to use. And you would find this one blob, and I never even
looked at who committed it or whatever, but it was like, I think it was 80 megs of a jar file
or a Debian package that was the Logstash release. That's such a small world that you're like, yep, that was me. Oh yeah, oh yeah.
Let's check this just for fun here.
To be clear, the entire repository right now is 167 megs.
So that file that I had up there for all of 13 minutes
lives indelibly in Git's history,
and it is fully half of the size
of the entirety of the Logstash project.
All right then, I didn't realize this was one of those
confess your sins episodes, but here we are.
Look, sometimes we put flags on the moon.
Sometimes we put big files in Git.
You could, just for posterity,
we could go back and edit the history and remove that,
but it never became important to do it.
It wasn't loud.
People weren't upset enough by it,
or it didn't come up enough to say,
you know what, this is a big file. So it's there. You left your mark. You know, we take what we can get.
It's an odd time. I'll have to do some digging around. I'm sure I'll tweet about this as soon
as I get a little more, a bit more data on it. But I wonder how often people have had frustration
caused by like, there's no ill intent here to be very clear, but it was instead a, I didn't know how
Git worked very well. I didn't know what I was doing in a lot of respects. And sure enough,
in the fullness of time, some condescending package people came in and actually made this
right. And there is a reasonable responsible package now because surprise, of course there is,
but I wonder how much inadvertent pain that caused people by that ridiculous commit. And it's the
idea of impact and how this stuff works. Like, I'm not happy that people on a plane with a slow connection had
to wait an extra minute or two to download that nonsense. It's one of those things that is,
oops, I feel like a bit of a heel for that. Not for not knowing something, but for causing harm
to folks. Intent doesn't outweigh impact. There is a lesson in there.
Agreed. On that example, I think one of the things,
code is not the most important thing I can contribute to a project,
even though I feel very confident in my skills
and programming in a variety of environments.
I think the number one thing I can do is listen
and look for sources of pain.
And people would come in and say,
I can't get this to work.
And we would work together and figure out how to make it work for
their use case. And that could result in a new feature, a bug fix, or some documentation
improvements, or a blog post or something like that. And I think in this case, I don't really
recall any amount of noise for someone saying, cloning the Git repository is just a pain in the
butt. And I think a lot of that is because either the people
who would be negatively impacted by that weren't doing that use case, they were downloading the
releases, which were as small as we could possibly get them, or they were editing files using the
GitHub online edit-the-file thing, which is a totally acceptable, perfectly fine way to do
things in Git.
So I don't remember anyone complaining about that particular file size issue.
The Elasticsearch repository is massive. And I don't think it even has binaries. It just has so much more. Someone accidentally committed their entire production test data set at one point,
and oopsie doozy. Yeah, it is not the most egregious harm I've ever caused. Yeah, it's
there. The thing that I guess resonates with me and still does is the lessons I learned from you,
I could sum them up as being not just empathy driven, because that's the easy answer,
but the other layers were that you didn't need to be the world's greatest expert in a thing
in order to credibly give a conference talk. To be clear, you were miles
ahead of me and still are in a lot of different areas. And that's fine, but you don't need to be
the, like, you are not the world's greatest expert on empathy, but that's what I took from the talk
and that's what it was about. It also taught me that things you can pick up from talks and other
means are, there are things you can talk about in terms of technology, and there
are things you can talk about in terms of people, and the things about people do not have expiration
dates in the same way that technology does. And if I'm going to be remembered for impact on people
versus impact on technology, for me there's no contest. And you forced me to really think about
a lot of those things, and it started my path to, I guess, becoming a public speaker and then later
all of the rest that followed, like this podcast, the nonsense on Twitter, and all the rest. So it is, I guess,
we can lay the responsibility for all that at your feet. Enjoy the hate mail.
Ah, my email address is now closed. I'm sorry.
Exactly.
Well, I appreciate the kind words.
We'll get letters on this one. It's the impact that people have. And sometimes,
I don't think you knew at the time that that's the impact you were having.
It matters.
I agree.
I think a lot of it came from, how do I want to experience this?
And it was much later that it became something that was really outside of me in the sense
that it was building communities.
One of the things I learned shortly after, or even just before joining Elastic, was how
many folks were looking to solve a problem,
found Logstash, became a participant in the community. And that participation could just
be anything, just hanging out on IRC, on the mailing list, whatever. And the next step for
them was to get a better paying job in an environment they enjoyed that helped them
take the next step in their career. Some of those people came to work with me at Elastic.
Some of them started to work on the Logstash team.
At some point, they decided,
because a lot of Logstash users were sysadmins,
and on the Logstash team, we were all developers.
We weren't sysadmins.
There was nothing to operate.
And a lot of folks would come on board,
and they were like, you know what?
I'm not enjoying writing Ruby for my job.
And they could take the next step to transition to the support team or the sales engineer team or cloud operations team at Elastic.
So it was really, like you mentioned, to me, why these projects are important has nothing to do with the technology.
They became an amplifier and a hand to pull people up
to go the next step they need to go.
And on the way, maybe they can make a positive impact
in the communities they participate in.
If those happen to be FPM or Logstash, that's great.
But I think I want folks to see that technology
doesn't have to be a grind of getting through gatekeepers,
meeting artificial barriers and things like that.
The thing that I took to is that I gave a talk in 2015 or 16, which is strangely appropriate now,
Terrible Ideas in Git. And yes, checking large binaries in is one of the terrible ideas I talk
about. It's Git through counterexample. And around that time, I also gave a talk for a while on how to handle a job interview and advance your career. Only one
of those talks has resulted in people approaching me even years later saying that what I did had
changed aspects of their life. It wasn't the Git one. And that's the impact it comes down to. That
is the change that I wanted to start having because I saw someone else do it and realized, you know, maybe I could possibly be that good someday. Well, I'd like to think I made it on
some level. I'm proud of the impact you've made. And I agree with you. It is about people. Even
with FPM, where I was very selfishly tickling my own itch, I don't want to remember all of this
stuff. And I also enjoy operating outside of the boundaries of a church or, you know, whatever the priesthoods that say this is how you must do a thing.
I knew there was a lot of folks who worked at jobs and they didn't have authority and they had to deploy something.
And they knew if they could just package it into a Debian format or an RPM format or whatever they needed to do, they could get it deployed and it would make their lives easier. Well, they didn't have the time or
the energy or the support in order to learn how to do that. And FPM brought them that success where
you can say, here's a bunch of files, here's a name, poof, you have a package for whatever format
you want. Where I found FPM really take off is when gem and Python and Node.js support were added.
The sysadmins were kind of sandwiched
in between two impossible worlds
where they are only authorized
to deploy a certain package format,
but all of their internal application developer teams
were using Node.js and newer technologies
and all of those package formats
were not permitted by whoever had the authority to permit those things at their job. But now they
had a tool that said, you know what, we can just take that thing. We'll take Django in Python and
we'll make it an RPM and we won't have to think a lot about it. And that really, I think, to me,
my hope was that it de-stresses that sort of work environment where you're not having to do three weeks of brand new work every time someone releases something internally at your company. You can just run a script that you wrote a month ago and maintain it as you go.
Wouldn't that be something?
Ideally, ideally. Jordan, I want to thank you for not only the stuff you did 10 years ago, but also the stuff
you just said now.
If people want to learn more about you, how you view the world, see what you're up to
these days, where can they find you?
I'm mostly active on Twitter, @jordansissel, all one word.
Mostly these days, I post repair stuff I do on the house.
I'm a stay-at-home full-time dad these days.
And I'm still doing maintenance on the projects that need maintenance,
like FPM or xdotool.
So if you're one of those users, I hope you're happy.
If you're not happy, please reach out,
and we'll figure out what the next steps can be.
But yeah, if you like bugs, especially spiders,
or if you don't like spiders and you want to like spiders, check me out on Twitter.
I'm often posting macro photos, close-up photos of butterflies, bees, spiders, and the like.
And we will, of course, throw links to that in the show notes.
Jordan, thank you so much for your time today. It's appreciated.
Thank you, Corey. It's good talking to you.
Jordan Sissel, founder of Logstash, and currently blissfully not working on a particular corporate
job. I envy him some days. I'm cloud economist Corey Quinn, and this is Screaming in the Cloud.
If you've enjoyed this podcast, please leave a five-star review on your podcast platform of
choice. Whereas if you've hated this podcast, please leave a five-star review on your podcast
platform of choice, along with an angry comment in which you have also embedded a large binary.
If your AWS bill keeps rising and your blood pressure is doing the same,
then you need the Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying.
The Duckbill Group works for you, not AWS.
We tailor recommendations to your business, and we get to the point.
Visit duckbillgroup.com to get started.
This has been a HumblePod production.
Stay humble.